29 May 2013

Multi-tenanting Ruby on Rails Applications on Heroku - Part IV: using the gem Milia

Milia tutorial
This is the final article in the four-part series on Multi-Tenanting RoR apps on Heroku.
Milia, which means "stripe" in Swahili, is the name of the row-based multi-tenanting gem I developed.
Aside: the project name, for the Rails 3.1 version of my app, is punda (small horse/donkey in Swahili), so the combination would be: punda milia (which is zebra in Swahili) .. cute.
My app's environment might be similar to others on Heroku: using Devise for user authentication, using DelayedJob for background task processing (emails, etc), and using Postgres. I also use the cedar stack.
So, fasten your seat belts and let's get started.

[update Jan 2014]
A newer version, milia v1.0.0, is now available. It supports Rails 4.0.x and Devise 3.1.x, and a working sample application can now be generated for trying out milia. This post contains older information, parts of which have been superseded by the README.

Basic concepts
  • All user authentication must also determine the current tenant.
  • Every user belongs to at least one tenant.
  • New account sign ups create both a new tenant and the first user within that tenant.
  • No controller actions are permitted without a current tenant and the current user must be valid within that tenant. (Except, of course, for sign up and sign in.)
  • All tenanted model DB operations (CRUD actions) must be constrained to the current tenant. This includes associations, joins, etc.
  • The tenanting enforcement for DB operations should be as transparent as possible to any individual CRUD action.
  • Background task execution must take place within the context of the tenant appropriate to the queued task itself.
  • Migrations must be able to function correctly in a multi-tenanted realm.
  • Admin tools must be present for when an admin works in console mode.
  • Rake tasks are required to function correctly in multi-tenanted realm.
  • The tenant_id field within any record must not be alterable by user input.
  • milia assumes that the current running instance of the http response process is a singleton thread; non-reentrant by any other process.
  • Milia uses Thread.current[:tenant_id] to hold the current tenant for the existing action request in the application.
  • milia enforces a default_scope for each model (Danger Will Robinson: Rails only uses the last defined default_scope! Thus an application using milia cannot also use default_scope!)

  • necessary models: user, tenant
  • necessary migrations: user, tenant, tenants_users (join table)
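The thread-local bookkeeping mentioned in the concepts above can be pictured with a small plain-Ruby sketch. This is not milia's code: the CurrentTenant module and its with_tenant helper are hypothetical names chosen for illustration, and only the Thread.current[:tenant_id] key comes from the list above.

```ruby
# Hypothetical sketch: holding the current tenant in thread-local storage.
# Only the Thread.current[:tenant_id] key comes from the article; the module
# and method names are illustrative assumptions, not milia's API.
module CurrentTenant
  def self.id
    Thread.current[:tenant_id]
  end

  def self.id=(tenant_id)
    Thread.current[:tenant_id] = tenant_id
  end

  # Run a block under a given tenant, restoring the previous one afterwards.
  def self.with_tenant(tenant_id)
    previous = id
    self.id = tenant_id
    yield
  ensure
    self.id = previous
  end
end

CurrentTenant.id = 1
inside = CurrentTenant.with_tenant(2) { CurrentTenant.id }

# A freshly started thread has no tenant until one is set for it.
other_thread = Thread.new { CurrentTenant.id }.value
```

Because the value lives in Thread.current, a new thread starts with no tenant at all, which is why the one-request-per-thread assumption above matters.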
Dependency requirements
  • Rails 3.1 or higher
  • Devise 1.4.8 or higher
In the Gemfile:
  gem 'milia'
Getting started
Rails setup
Milia expects a user session, so please set one up:
  $ rails g session_migration
      invoke  active_record
      create    db/migrate/20111012060818_add_sessions_table.rb
Devise setup
  • See https://github.com/plataformatec/devise for how to set up devise.
  • The current version of milia requires that devise use a *User* model.
Milia setup
*ALL* models require a tenanting field, whether they are to be universal or to be tenanted. So make sure the following is added to each migration
  t.references :tenant
Tenanted models will also require indexes for the tenant field:
  add_index :TABLE, :tenant_id
Also create a tenants_users join table:
  class CreateTenantsUsers < ActiveRecord::Migration
    def change
      create_table :tenants_users, :id => false  do |t|
        t.references   :tenant
        t.references   :user
      end
      add_index :tenants_users, :tenant_id
      add_index :tenants_users, :user_id
    end
  end
application controller
add the following line AFTER the devise-required filter for authentications:
  before_filter :authenticate_tenant!   # authenticate user and setup tenant

# ------------------------------------------------------------------------------
# authenticate_tenant! -- authorization & tenant setup
# -- authenticates user
# -- sets current tenant
# -- sets up app environment for this user
# ------------------------------------------------------------------------------
  def authenticate_tenant!()

    unless authenticate_user!
      email = ( params.nil? || params[:user].nil?  ?  ""  : " as: " + params[:user][:email] )

      flash[:notice] = "cannot sign you in#{email}; check email/password and try again"
      return false  # abort the before_filter chain
    end

    # user_signed_in? == true also means current_user returns valid user
    raise SecurityError,"*** invalid sign-in  ***" unless user_signed_in?

    set_current_tenant   # relies on current_user being non-nil
    # any application-specific environment set up goes here
    true  # allows before filter chain to continue
  end

catch any exceptions with the following (be sure to also add the designated methods!)
  rescue_from ::Milia::Control::MaxTenantExceeded, :with => :max_tenants
  rescue_from ::Milia::Control::InvalidTenantAccess, :with => :invalid_tenant
Add the following line into the devise_for :users block
  devise_for :users do
    post  "users" => "milia/registrations#create"
  end
Designate which model determines account
Add the following acts_as_... to designate which model will be used as the key into tenants_users to find the tenant for a given user. Only designate one model in this manner.
  class User < ActiveRecord::Base
    acts_as_universal_and_determines_account
  end  # class User
Designate which model determines tenant
Add the following acts_as_... to designate which model will be used as the tenant model. It is this id field which designates the tenant for an entire group of users which exist within a single tenanted domain. Only designate one model in this manner.
  class Tenant < ActiveRecord::Base
    acts_as_universal_and_determines_tenant
  end  # class Tenant
Designate universal models
Add the following acts_as_universal to *ALL* models which are to be universal and remove any superfluous
  belongs_to  :tenant
which the generator might have generated ( acts_as_tenant will specify that ).
Example for a model called Eula:
  class Eula < ActiveRecord::Base
    acts_as_universal
  end  # class Eula
Designate tenanted models
Add the following acts_as_tenant to *ALL* models which are to be tenanted and remove any superfluous
  belongs_to  :tenant
which the generator might have generated ( acts_as_tenant will specify that ).
Example for a tenanted model called Post:
  class Post < ActiveRecord::Base
    acts_as_tenant
  end  # class Post
Exceptions raised
Tenant pre-processing hooks
Milia expects a tenant pre-processing & setup hook within the designated Tenant model. Example of method invocation:
  Tenant.create_new_tenant(params)   # see sample code below
where the sign-up params are passed, the new tenant must be validated, created, and then returned. Any other kinds of preparatory processing are permitted here, but should be minimal, and should not involve any tenanted models. At this point in the new account sign-up chain, no tenant has been set up yet (but will be immediately after the new tenant has been created).
Example of expected minimum for create_new_tenant:
  def self.create_new_tenant(params)
    tenant = Tenant.new(:cname => params[:user][:email], :company => params[:tenant][:company])

    if new_signups_not_permitted?(params)
      raise ::Milia::Control::MaxTenantExceeded, "Sorry, new accounts not permitted at this time"
    else
      tenant.save    # create the tenant
    end
    return tenant
  end
Milia expects a tenant post-processing hook within the model Tenant:
  Tenant.tenant_signup(user,tenant,other)   # see sample code below
The purpose here is to do any tenant initialization AFTER devise has validated and created a user. Objects for the user and tenant are passed. It is recommended that only minimal processing be done here ... for example, queueing a background task to do the actual work in setting things up for a new tenant.
# ------------------------------------------------------------------------
# tenant_signup -- setup a new tenant in the system
# CALLBACK from devise RegistrationsController (milia override)
# AFTER user creation and current_tenant established
# args:
#   user  -- new user  obj
#   tenant -- new tenant obj
#   other  -- any other parameter string from initial request
# ------------------------------------------------------------------------
  def self.tenant_signup(user, tenant, other = nil)
    StartupJob.queue_startup( tenant, user, other )
  end
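Since the hook above hands the real work to a background job, the job has to carry its tenant with it (see the basic concept about background tasks). Here is a minimal plain-Ruby sketch of that idea. TenantedJob and QUEUE are hypothetical stand-ins; this is neither DelayedJob's API nor the actual StartupJob, whose code is not shown in this article.

```ruby
# Hypothetical sketch: a queued job that remembers its tenant.
# TenantedJob and QUEUE are illustrative; this is not DelayedJob's or milia's API.
TenantedJob = Struct.new(:tenant_id, :work) do
  # Re-establish the tenant recorded at enqueue time, then do the work.
  def perform
    previous = Thread.current[:tenant_id]
    Thread.current[:tenant_id] = tenant_id
    work.call
  ensure
    Thread.current[:tenant_id] = previous
  end
end

QUEUE = []

# Enqueue while a request for tenant 7 is being handled.
Thread.current[:tenant_id] = 7
QUEUE << TenantedJob.new(Thread.current[:tenant_id], -> { Thread.current[:tenant_id] })

# Later, a worker with no tenant context of its own runs the job.
Thread.current[:tenant_id] = nil
seen_by_job = QUEUE.shift.perform
```

The point of the design is that the tenant is captured when the task is queued, not looked up when it runs, so the worker never has to guess which tenant a task belongs to.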
Alternate use case: user belongs to multiple tenants
Your application might allow a user to belong to multiple tenants. You will need to provide some type of mechanism to allow the user to choose which account (thus tenant) they wish to access. Once chosen, in your controller, you will need to put:
  set_current_tenant( new_tenant_id )
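Whatever mechanism you use to let the user choose, the chosen id comes from user input, so it is worth checking it against the user's own tenants before switching. The following is a plain-Ruby sketch of that check only. SketchUser, switch_tenant and this local InvalidTenantAccess class are stand-ins invented for the example; whether the gem performs an equivalent check inside set_current_tenant is not covered here.

```ruby
# Hypothetical sketch: only switch to a tenant the user actually belongs to.
# SketchUser, switch_tenant and this InvalidTenantAccess are stand-ins, not milia code.
class InvalidTenantAccess < StandardError; end

SketchUser = Struct.new(:id, :tenant_ids)

def switch_tenant(user, new_tenant_id)
  unless user.tenant_ids.include?(new_tenant_id)
    raise InvalidTenantAccess, "user #{user.id} is not a member of tenant #{new_tenant_id}"
  end
  Thread.current[:tenant_id] = new_tenant_id
end

consultant = SketchUser.new(1, [10, 20])
switch_tenant(consultant, 20)          # allowed: 20 is one of her tenants

begin
  switch_tenant(consultant, 30)        # rejected: not a member
rescue InvalidTenantAccess => e
  rejected = e.message
end
```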
joins might require additional tenanting restrictions
Subordinate join tables will not get the Rails default scope. Theoretically, the default scope on the master table alone should be sufficient in restricting answers to the current_tenant alone .. HOWEVER, it doesn't feel right.
BUT If the master table for the join is a universal table, then you really MUST use the following workaround, otherwise the database will access data in other tenanted areas even if no records are returned. This is a potential security breach. Further details can be found in various discussions about the behavior of databases such as POSTGRES.
The milia workaround is to add an additional where clause, which invokes a milia method to generate the SQL necessary to constrain the tenants for the given classes:
     .where( where_restrict_tenants(klass1, klass2,...))
for each of the subordinate models in the join.
usage of where_restrict_tenants
    Comment.joins(stuff).where( where_restrict_tenants(Post, Author) ).all
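The SQL that where_restrict_tenants generates is not shown in this article. One plausible shape, and it is only an assumption, is a tenant_id condition per joined table, ANDed together. The sketch below builds such a fragment in plain Ruby; restrict_tenants_sql is a hypothetical helper that takes table names rather than model classes.

```ruby
# Hypothetical sketch of the kind of SQL fragment such a helper could build.
# The article does not show milia's generated SQL; the shape below is an assumption.
def restrict_tenants_sql(tenant_id, *table_names)
  id = Integer(tenant_id)   # refuse anything that is not an integer
  table_names.map { |t| "#{t}.tenant_id = #{id}" }.join(" AND ")
end

fragment = restrict_tenants_sql(3, "posts", "authors")
# fragment == "posts.tenant_id = 3 AND authors.tenant_id = 3"
```

Coercing the id with Integer() means a missing tenant raises an error instead of silently producing an unconstrained query.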
Note that even the console ($ rails console) will be run in multi-tenanting mode. You will need to establish a current_user and setup the current_tenant, otherwise most Model DB accesses will fail.
For the author's own application, I have set up a small ruby file which I load when I start the console. This does the following:
    def change_tenant(my_id,my_tenant_id)
      @me = User.find( my_id )
      @w  = Tenant.find( my_tenant_id )
      Tenant.set_current_tenant @w
    end

    change_tenant(1,1)   # or whatever is an appropriate starting user, tenant
  • Milia designates a default_scope for all models (both universal and tenanted). From Rails 3.2 onwards, the last designated default scope overrides any prior scopes and will invalidate multi-tenanting; so *DO NOT USE default_scope*
  • SQL statements executed outside the context of ActiveRecord pose a potential danger; the current milia implementation does not extend to the DB connection level and so cannot enforce tenanting at this point.
  • The tenant_id of a universal model will always be forced to nil.
  • The tenant_id of a tenanted model will be set to the current_tenant of the current_user upon creation.
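The last two rules can be illustrated with a few lines of plain Ruby. This is a sketch of the behaviour described, not milia's implementation; stamp_tenant is a hypothetical helper operating on attribute hashes rather than ActiveRecord models.

```ruby
# Hypothetical sketch of the last two rules: tenant_id is never taken from input.
# stamp_tenant is an illustrative helper, not part of milia.
def stamp_tenant(attributes, universal:)
  current = Thread.current[:tenant_id]
  raise ArgumentError, "no current tenant" if current.nil? && !universal
  # Whatever tenant_id the caller supplied is overwritten here.
  attributes.merge(tenant_id: universal ? nil : current)
end

Thread.current[:tenant_id] = 5
post = stamp_tenant({ title: "hello", tenant_id: 99 }, universal: false)
eula = stamp_tenant({ version: 1, tenant_id: 99 }, universal: true)
```

In both calls the bogus tenant_id of 99 supplied by the caller is discarded: the tenanted record gets the current tenant and the universal record gets nil.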

Multi-tenanting Ruby on Rails Applications on Heroku - Part III

Schema-based vs Row-based Methods
My last two articles have introduced the topic and outlined common multi-tenanting methods. This article will examine the pros & cons of both methods, especially in regard to apps using Postgres on Heroku. The content for most of this article came from an email exchange with Daniel (a Postgres wizard genius) at Heroku. I was migrating an existing bamboo-stack Rails 2.1 app to cedar-stack and Rails 3.1. I had already expected that I would have to change my method of multi-tenanting. I originally wrote a schema-based multi-tenanting method for my Rails 2.1 app, which involved monkey-patching ActiveRecord and embedding both the tenant number and the row id number in a record's id.
The email exchange has been edited but does leave in various tangents raised in our discussion of multi-tenanting methods. In the end, I decided to re-create my multi-tenanting using a row-based methodology, and turned it into a gem, called milia (Swahili for stripe).

Question: pg_restore is excruciatingly sloooooow. I am starting a new staging area on cedar stack for my production app. So I did a pg_restore from the production app DB (1.4MB) to the staging app; I started it 3 hours (yes, HOURS) ago, and it still hasn't completed the restore.
Thanks for looking into it. Currently, the way that I do multi-tenanting is by tablename, not rows. So my DB structure is wide and shallow not narrow & deep. Example: books__w1, books__w2, etc. I'm running in beta now, so have 9 tenants x 62 tables = almost 560 tables + 600 indexes, and 560+ schemas. I wonder if that's the reason for the slowness. I'm considering a move to do the multi-tenanting by postgres SCHEMA, and if PG is inefficient in handling a plethora of tables, that would be a great incentive to push the change. What do you think?
We have seen the scalability limits of pg_dump in particular (the postgres server was still fine) when there were tens or hundreds of thousands of database objects (tables, sequences, et al). It also tends to make the backups take a very long time, and now that you mention the sheer number of things that seems like a totally plausible contributing cause. The planner also becomes more expensive in resolving identifiers when there are so many symbols -- one can amortize this by using prepared statements to avoid re-running the planner.
The main advantages of schema-oriented multi-tenancy:
  • Each tenant can be cleanly lifted out of the database by dumping their schema and moved around
  • Unlike row-based multi-tenancy, you do not repeat tenant information all the time in every record, so overall compactness is greater
  • Locality is often better, since data for one customer that is likely to be accessed frequently is adjacent physically. Copying out a single tenant for analysis can be much, much faster.
  • More fine-grained locking: a double-edged sword, but typically for the best; lock contention is often reduced when using more schema objects that do not have contending queries. This affects indexes in particular.
The downsides:
  • The planner becomes more expensive when there are a huge number of identifiers to pick through.
  • Some tools (like \dt in psql, or pg_dump) can have a hard time coping with, say, millions of schema objects, whereas millions of rows is an extremely ubiquitous use-case.
Generally I prefer row-based multi-tenancy because it's much more common and tools are designed around handling a lot of records in a table.
Schema-based multi-tenancy is still compelling because it is in some ways very convenient (just load the application in a different schema context, use pg_dump with a schema name to back up exactly one customer, upgrade customers in a rolling-fashion independently), and really *ought* to work well, but is less well-trodden.
I think both can be made to work, but when doing schema-based multi-tenancy you will have to know a lot more about the vagaries of a database's implementation rather than thinking about the logic of the problem once you cross a certain number of schemas. However, as above, it is not without advantages, too.
Thank you for this excellent response; it is exactly the kind of information and comparison discussion for which I've been searching unsuccessfully. It is worthy of being an article in Heroku docs!
A couple of follow-up clarifying questions:
1. The postgres planner variables seem fairly complex .. but it doesn't appear that's what you're suggesting. You say "one can amortize this by using prepared statements to avoid re-running the planner." .. I am not familiar with how to do that .. are you implying that I should use a VIEW for that (which I have used for complex queries)?
2. Wouldn't use of the Postgres SCHEMA space capability be efficient? http://www.postgresql.org/docs/8.4/static/ddl-schemas.html I could set the schema_path to just be the tenant currently in session, then it shouldn't be any more expensive than using the default public schema for the 60 tables I have, rather than the current method I've been employing of putting a suffix onto the tablename. This method involved monkey-patching ActiveRecord and I don't think it will work in the Rails 3.x way of doing things. I've been worried about scaling my app after I end beta .. what happens with a 1000 tenants, and 100,000 tables/indexes/etc. It sounds like you're agreeing with that. But how will a schema_space-based tenancy compare?
3. I originally considered many of the pros/cons you mentioned. It seems to me the row-based tenancy is less secure unless it uses an underlying DBMS failsafe mechanism of some type. Whereas, the schema-space (or tablename-based method which I currently employ) both locks the tenant code into the record id, and only gives the DB a query request based on a single tenant's space, potentially limiting damage in case of an error or intrusion.
A demonstration of prepared statements: Here I prepare a statement, causing it to be planned and the plan to be bound to a name:
  fdr=# PREPARE find_post_by_name AS SELECT * FROM posts WHERE id = $1;
  PREPARE
Later, I execute it:
  fdr=# EXECUTE find_post_by_name(1);

   id | user_id |          contents
  ----+---------+-----------------------------
    1 |       4 | This is a post about movies
  (1 row)
I think support for this in an ORM layer has been added in Rails 3.1, which is super-handy. There does seem to be a bug with the way it supports schema (precisely because it does *not* re-resolve identifiers); this user on this mailing list is doing something that sounds a lot like what you are:
In general, people do not stress-test postgres as often with a truly huge number of schema objects. There have definitely been missing indexes on certain catalog tables in the past when people have tried (these usually do get added in the next release), but even then, there's only so much one can do: larger data structures just require picking through larger caches and more memory. In the case with a very large catalog, the pace of resolving an identifier is constrained by O(log(n)) to poke through the index, just like what is the case with finding tuples.
Also, dumps and restores of the entire database will be slow, in part because underlying file systems are just not that fast at dealing with hundreds of thousands of files. These weaknesses are not really helped nor hurt by the existence of schemas.
I do think schema-based multi-tenancy is probably preferable to name mangling, should you decide to use some kind of schema-object partitioning for users. The caveat here is that SQL doesn't support recursive namespacing: one level of schema is all one gets, there are no nested schemas, so if you want to use schemas for another purpose you are out of luck: one is back to name mangling.
I would seriously look at patching your A/R (again...) or using subclassing or something to always apply a WHERE clause to *every* query issued against the database, making it impossible for most of the program to even consider data that doesn't belong to the relevant tenant. A convention for tagging every row with tenant information would accompany this approach. Lots of paranoid assertions at multiple levels may also be advised.
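To make that suggestion concrete, here is a minimal plain-Ruby sketch of a data-access wrapper through which every read is filtered by the current tenant, with one "paranoid assertion" for the case where no tenant is set. It uses an in-memory array in place of a database; TenantScopedTable is a hypothetical class and is neither Daniel's code nor milia's implementation.

```ruby
# Hypothetical sketch: a table wrapper whose every read is filtered by tenant,
# with a "paranoid" assertion when no tenant is set. Not milia's implementation.
class TenantScopedTable
  def initialize(rows)
    @rows = rows
  end

  # All reads go through here, so callers cannot forget the tenant filter.
  def where(conditions = {})
    tenant_id = Thread.current[:tenant_id]
    raise SecurityError, "no current tenant" if tenant_id.nil?
    @rows.select do |row|
      row[:tenant_id] == tenant_id && conditions.all? { |k, v| row[k] == v }
    end
  end
end

posts = TenantScopedTable.new([
  { id: 1, tenant_id: 1, title: "a" },
  { id: 2, tenant_id: 2, title: "b" },
])
Thread.current[:tenant_id] = 2
visible = posts.where.map { |row| row[:id] }   # only tenant 2's rows
```

Even an explicit condition that matches another tenant's row, such as posts.where(title: "a") here, comes back empty, because the tenant filter is applied inside the wrapper rather than by the caller.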
There are numerous other tricks, but at the end of the day, I'd look at the general topic of "row level security". There may even exist attempts at solving this problem, although I am not privy to how mature they are.
Thank you so much for the informative responses; I appreciate your expertise with Postgres. I noticed in the Rails postgres connection adapter several methods dealing with schema, so I will be exploring using them. I don't think the nested schema will be an issue. I think O(log(n)) .. similar to index searches, is reasonable.
My current method uses a singleton to control the multi-tenant nature of the app; other than placing an "acts_as_tenant" in the models, every other CRUD usage looks the same as any other Rails app.
When I set up my internal environment at the start of each session, I set the tenant for the given user. acts_as_tenant essentially redefines AR::Base#table_name, to mangle the table_name based on the given tenant.
In AR 2.3.x, I intercept at the beginning of find: any specified IDs in a select query are validated against allowable tenants for the user. The IDs used are 64-bits, where the upper 32-bits are the tenant_id, and the lower 32 are the actual record id. When a table is created, the min/max & sequence ID for that table in that tenant is established.
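The 64-bit ID layout described above is simple bit arithmetic. The sketch below shows one way to compose, split and validate such IDs in plain Ruby; the helper names are illustrative and this is not the original monkey-patch.

```ruby
# Sketch of the composite-ID layout described above: upper 32 bits hold the
# tenant id, lower 32 bits the record id. Helper names are illustrative.
MASK_32 = 0xFFFF_FFFF

def compose_id(tenant_id, record_id)
  unless [tenant_id, record_id].all? { |n| (0..MASK_32).cover?(n) }
    raise ArgumentError, "out of 32-bit range"
  end
  (tenant_id << 32) | record_id
end

def split_id(composite)
  [composite >> 32, composite & MASK_32]
end

# Validate an ID against the tenants a user may access.
def id_allowed?(composite, allowed_tenant_ids)
  allowed_tenant_ids.include?(split_id(composite).first)
end

id = compose_id(2, 1234)
# id == 8_589_935_826; split_id(id) == [2, 1234]
```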
If I re-do, I think I'd like to be more low-level, in the postgres connection, itself. That would allow the IDs to be verified, etc.
My original thought, to use schema_path, won't work: it's not reentrant (the Postgres DB is a single instance .. I might have numerous dynos accessing it simultaneously). But PG does allow schema_name.table_name designations .. and the rails pg connection appears to support that as well. The trick will be to make it transparent to the app itself.
I don't like the existing multi-tenant gems as they are too intrusive in the code. What happens if a programmer forgets to wrap a DB access with the code/control blocks those gems require? that's a bug I don't want to search for.
My recollection is that search_path is a per-backend setting; when one wants to switch tenants, your code can rebind the search path, paying mind to the existing caveat (really a bug) in Rails 3.1, if that mailing list post is correct. If you have threads sharing a connection then that does get messy (but then again, so do many features, including transactions), and qualifying names is a much better idea. I'm not sure if I follow what you mean "the Postgres DB is a single instance". In many ways, outside the data itself, each backend/connection is an "instance," and can have many of their own settings.
The big risk in having huge catalogs is that if there is a case Postgres upstream did not foresee then nominally you will be stuck waiting one whole release cycle for the index addition to the catalog. Plus, you have to upgrade to get that index. Basically, you lose the ability to put out your own fires, and you will have to be on a more stringent upgrade treadmill for Postgres in event of an issue.
On the other hand, It Has Been Done for tens or hundreds of thousands of database objects. I will caution you that at this moment (on Heroku) PGBackups (and pg_dump) support is going to be an issue, and at this time we cannot support databases that simply overwhelm pg_dump.
Heroku's continuous archiving is so far able to deal with such database layouts, so the fork/follow features will still work.
There have been various approaches to row-level security, so I think that's what you want to investigate. There is this one somewhat crazy project called "Veil" that is *not* supported on Heroku, although I am intrigued by it. It's even more involved than an average extension, requiring a preload-hook at database startup.
All in all, I'd say your outcome with a row-level security is much more predictable, but that's not to say there are not nice properties of schema isolation.
You've really got me thinking of reconsidering my M/T strategy (which I first developed in 2008 back in Rails 1.8 days I think). I'm actually leaning towards a row-based method after thinking about your concerns.
Guy Naor also has a good discussion of different M/T strategies with Rails examples: http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html. He also confirmed that Postgres schema_path is CONNECTION-based; so it is feasible.
But, Postgres has nice updatable Views & Rules which make it possible to lock down all table row accesses and force a row tenant check (perhaps tie this in to a Rails scope as well?). I looked at Veil project, but I think it's overkill for what I need and implementation in a heroku environment looks prohibitive. I can force all tables to have a tenant_id column, treat tenant nil as universal (user, world, product tables), etc.
Row-based will be much easier to implement than schema (or its table_name mangling equivalent).
Current data analysis and future projections
Below is a table of some data based on my 4 most active beta clients after a year of usage, and extrapolations for each of three possible growth scenarios: PAYS RENT, LIFESTYLE, VENTURE.
scenario   tenants  users     table-A   table-B     tables  pg_dump
               n       n       rows/yr  rows/yr        n       hrs
ave            1       91      2,019      1,480        100    0.04
beta           4      363      8,076      5,922        400    0.2
pays rent    100    9,075    201,908    148,042     10,000    4
lifestyle   1000   90,750  2,019,083  1,480,417    100,000   42
venture    10000  907,500 20,190,833 14,804,167  1,000,000  417

Assumptions used for projections
I'm assuming that pg_dump time will be linear. I've included row counts for my two biggest growth tables. The app does automatic duty-roster scheduling for collaborative organizations (such as non-profits). Both Tables A & B show the approx ANNUAL row growth rate. All other tables are more or less dependent on the number of users, which tend to reach a steady state per subscribing client (the organization).
The tables column refers to the number of tables (including primary id sequence tables, but not index tables).
I rely on pg_dump for backups, refreshing the staging app, refreshing the local development app, etc. Having it grow to where it requires an hour or more in pg_dump time is not practical.
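The projections are nothing more than the per-tenant averages multiplied by the number of tenants. A short plain-Ruby sketch of that arithmetic follows. It uses the rounded figures from the "ave" row, so its results differ slightly from the table, which appears to have been computed from unrounded averages; the project helper is a name made up for this example.

```ruby
# Sketch of the linear extrapolation behind the table above.
# Per-tenant figures are the rounded values from the "ave" row, so results
# differ slightly from the table (which used unrounded averages).
PER_TENANT = {
  users: 91,
  table_a_rows: 2_019,
  table_b_rows: 1_480,
  tables: 100,
  pg_dump_hrs: 0.04,
}

def project(tenants)
  PER_TENANT.transform_values { |v| v * tenants }
end

pays_rent = project(100)
# pays_rent[:tables] == 10_000
# pays_rent[:users]  == 9_100   (the table shows 9,075)
```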
Summary of ROW-BASED Pros & Cons
  • simpler implementation
  • ability to use existing rails migration tools
  • use of Postgres Views/Rules & Rails 3 scope to enforce
  • faster pg_dump, typical use case optimization in DB parser/planner for SELECT QUERIES
  • no need to jerry-rig IDs
  • no monkey-patching of Rails A/R (maybe just the connection)
  • might break down at 20 to 50M records in a single table?
  • difficulty in partitioning?
I think row-based multi-tenancy is the more well-tested model simply because lots of software assumes there will be many millions of records in a table. Consider using CREATE FUNCTION with LANGUAGE SQL or LANGUAGE plpgsql (the procedural variant) instead of Views/Rules to handle data modification, should you opt for database-side constructs. I think it's fine to do multi-tenancy at either the application or database level; the only advantage to the latter is perhaps making those multi-tenant-aware manipulations available to other applications, or gaining one more level of defense.
The main reason to partition a table is to be able to drop and scan entire partitions in one shot (physically organizing data to be adjacent for these reasons). There are definitely (many) databases out there with 50 million records in a single table that meet requirements well, but it basically depends on workload and how big (wide) the records are. For example, indexing a text column is many times more expensive than an integer one. If you are particularly worried about not painting yourself into a corner, you will have to experiment and write simulators. Remember that Heroku databases are charged per hour or even more granular than that, so if you have a simulation you like you can provision => run => deprovision right away.
I've finally completed the migration of my app to Rails 3.1.3, including the total re-write of the multi-tenanting. I chose the row-based approach that you recommended, and thought you'd like some metrics for future reference.
  • Migrating the existing database: I had to write a combination of script (18-steps), rake tasks, and small Ruby code (about 200 lines) to completely transform the database from schema-based to row-based. I wanted it to be automatic (was able to take the existing production DB and turn it into the new DB in under 8 minutes) and I wanted it to be pristine (ALL indexes, sequences, and tables to be as though created fresh). I chose to do the ID field (64-bit down to 32-bit) transformations locally so that I could use POSTGRES functions.
  • resulting DB is 50% the size of the schema-based.
  • pg_restore is lightning fast now
  • DB accesses are also significantly faster; POSTGRES is more efficient with 32-bit ID fields rather than 64-bit.
  • row-based Multi-tenanting, using Rails default_scope was much simpler, and totally non-intrusive. 
  • I made it into a gem called "milia", which is production-ready (altho my tests are a bit behind), and in-use now in my production app with Devise & DelayedJob.

The next article will be a tutorial for using my multi-tenanting gem, milia, in a RoR app on Heroku.

Multi-tenanting Ruby on Rails Applications on Heroku - Part II

Multi-tenanting methods
There are two primary, practical ways to ensure multi-tenanting for RoR applications on Postgres:
  • schema-based
  • row-based
This part of our discussion will briefly outline these methods; the next article will go into pros & cons for both methods (specifically in the context of a RoR app using Postgres on Heroku).
Less than ideal
Before continuing, it's worthwhile to point out that strapping multi-tenanting onto an application is less than ideal (IMHO). It would be better if the capability was built into the ActiveRecord base of Rails, or even better, built into the DBMS itself. To a large degree, application code should look as though it were a normal (ie, single tenanted) Rails app. The knowledge about multi-tenanting would ideally be missing from almost all the MVC structures.
The reason for this is to ensure greater security, similar to the difference between whitelisting and blacklisting. If the given assumption for the app is that everything is automatically constrained to the current tenant, then additions and changes to the code will likewise automatically be constrained. Meaning that future changes will not put the tenant's data at risk.
Our example in Part I, where we want to allow consultants to have different tenant clients, makes this difficult, because at some point in our code, we'll have to be able to possibly do un-constrained finds across multiple tenants belonging to a given user. It is better, however, if we minimize that need. We could do that, for example, if we let the user choose which of her tenant clients she wishes to focus on; after changing the tenant, all the rest of the app is then constrained to the current tenant.
schema-based methods
In this method, we use the database meta-structure to segment the data. One approach is to append a suffix to each tablename to distinguish the tenant.
posts__w2, for example, might be the posts table for tenant 2.
Another is to use Postgres' SCHEMA capability to designate what is essentially a namespace within which all tables reside. By default, Postgres puts all data in the SCHEMA called public.
The effect of this is to multiply the meta-structure.  Suppose your application requires 5 tenanted tables: Posts, Comments, Authors, Images, Profiles. Each table has a primary key (thus in Postgres-land, another table), and has two indexes (two more tables). That means 15 meta-structures (tables) per tenant. If you have 1000 tenants, then you'll have 15,000 tables.
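The two schema-based naming approaches can be sketched in a few lines of plain Ruby. The suffix style follows the posts__w2 example above; the "tenant_N" schema name is purely an assumption for illustration, as are the helper names.

```ruby
# Two hypothetical naming schemes for per-tenant tables.
def mangled_name(table, tenant_id)
  "#{table}__w#{tenant_id}"          # suffix style used above: posts__w2
end

def schema_qualified_name(table, tenant_id)
  "tenant_#{tenant_id}.#{table}"     # the "tenant_N" schema name is an assumption
end

TABLES = %w[posts comments authors images profiles]

# Every tenant needs its own copy of every table name.
tenant_2_tables = TABLES.map { |t| mangled_name(t, 2) }
# tenant_2_tables.first == "posts__w2"
```

Either way, each new tenant adds a full set of table names (plus their keys and indexes) to the database catalog.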
row-based methods
Row-based methods ensure tenant-segmented data on the basis of each row. So all tenants access the same table, but are restricted to only viewing (or even accessing internally) data from rows which belong to them.
In this method, the number of database meta-structures is constant no matter how many tenants there are. In the example above, there would still be only 15 meta-structures for 10,000 tenants.
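The difference in growth between the two methods is easy to state as code. This sketch simply takes the figure of 15 meta-structures from the example above as given; the function names are made up for illustration.

```ruby
# Using the article's figure of 15 meta-structures for the five example tables.
STRUCTURES_PER_COPY = 15

def schema_based_count(tenants)
  STRUCTURES_PER_COPY * tenants   # one full copy of the structures per tenant
end

def row_based_count(_tenants)
  STRUCTURES_PER_COPY             # one shared copy, whatever the tenant count
end

# schema_based_count(1000)  == 15_000
# row_based_count(10_000)   == 15
```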
Our next article will discuss the pros & cons of both methods.
Glossary of terms
  • tenant -- an organizational account
  • user -- a member of an organization; thus exists within the context of a tenant
  • constrain -- limit access to a given tenant's data
  • unconstrain -- access data across tenants
  • universal table -- data which doesn't belong to any specific tenant and needs to be globally accessed: such as the tables for Tenants and Users.
  • tenanted table -- data which only belongs to a given tenant and must not be visible, whether intentionally or not (omission or commission), to anyone who is not one of the tenant's designated users.
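To make "constrain" and "unconstrain" concrete, here is a small in-memory sketch. The Post struct and the TenantedTable class are hypothetical stand-ins for a tenanted table; a real app would do this filtering in the database, and none of these names come from milia.

```ruby
# Hypothetical in-memory stand-in for a tenanted table; not milia's API.
Post = Struct.new(:id, :tenant_id, :title)

class TenantedTable
  def initialize(rows)
    @rows = rows
  end

  # Constrained access: only rows owned by the given tenant.
  def for_tenant(tenant_id)
    @rows.select { |row| row.tenant_id == tenant_id }
  end

  # Unconstrained access: every row, across all tenants.
  # Should be reserved for admin or cross-tenant work.
  def unconstrained
    @rows.dup
  end
end

posts = TenantedTable.new([
  Post.new(1, 1, "Plan A"),
  Post.new(2, 2, "Plan B"),
  Post.new(3, 1, "Plan C")
])

puts posts.for_tenant(1).map(&:title).inspect
puts posts.unconstrained.size
```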

Multi-tenanting Ruby on Rails Applications on Heroku - Part I


Multi-tenanting is a term used to describe the ability to virtually partition data within the application database so that the data of different tenants (typically client accounts) cannot be accessed by users of other tenants of the application.
Heroku is an awesome Platform as a Service (PaaS) provider which originally started out specializing in hosting Ruby on Rails Applications using Postgres. I use Heroku extensively for the Ruby on Rails applications which I develop and support.
Let's take a look at the fundamental philosophy of a multi-tenanted application. Suppose, for example, that you have a SaaS application, running on Heroku, which offers a strategic planning service for organizations. Each organization would then be a tenant and there would be multiple users of that organization accessing that organization's strategic plan data. Obviously, each tenant would want to know that rigid safeguards were in place to prevent their organizational plans from being visible outside their organization, even accidentally.
Perhaps you might also want to foster an eco-system with consultants for your application. Then it might happen that a consultant would need access to multiple tenants.
In Rails-ese, we've described two models, Tenant and User, with the following associations:

class Tenant < ActiveRecord::Base
  has_and_belongs_to_many :users
end

class User < ActiveRecord::Base
  has_and_belongs_to_many :tenants
end

This series of articles will discuss different multi-tenanting methods (specifically in the context of Postgres on Heroku for RoR apps) and end with a tutorial for how to use milia, a gem to ensure multi-tenanting for RoR apps on Heroku.
There are four parts to this discussion:
  1. Introduction
  2. Multi-tenanting methods
  3. Schema-based vs Row-based Multi-tenanting
  4. Milia tutorial 

subclassing RefineryCMS user

In the refinerycms-stores engine I am building, I wanted to have a Customer class subclassed from User:
class Customer < ::Refinery::User
  has_many  :addresses, :class_name => ::Refinery::Addresses::Address
  has_many  :orders, :class_name => ::Refinery::Orders::Order, :foreign_key => :order_customer_id
  has_one   :billing_address, :class_name => ::Refinery::Addresses::Address,
            :conditions => { :is_billing => true, :order_id => nil }

  has_one   :shipping_address, :class_name => ::Refinery::Addresses::Address,
            :conditions => { :is_billing => false, :order_id => nil }
end
But there was a problem when I used current_refinery_user: the object was a User, so if I referenced current_refinery_user.billing_address, I would get a method_missing exception. Below is the solution I used to extend the added associations into the ::Refinery::User class.
Add the following section to stores/lib/refinery/stores/engine.rb, within the initializer block, and then everything works.
      config.to_prepare do
        ::Refinery::User.class_eval do
          has_many  :addresses, :class_name => ::Refinery::Addresses::Address, :foreign_key => :customer_id
          has_many  :orders, :class_name => ::Refinery::Orders::Order, :foreign_key => :order_customer_id
          has_one   :billing_address, :class_name => ::Refinery::Addresses::Address,
                    :foreign_key => :customer_id,
                    :conditions => { :is_billing => true, :order_id => nil }
          has_one   :shipping_address, :class_name => ::Refinery::Addresses::Address,
                    :foreign_key => :customer_id,
                    :conditions => { :is_billing => false, :order_id => nil }
        end  # extend user for customers
      end  # to prepare
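Why the subclass alone wasn't enough, and why reopening the parent class fixes it, can be seen in a plain-Ruby sketch. The User and Customer classes here are bare stand-ins (no Refinery, no ActiveRecord), and ordinary methods take the place of the associations.

```ruby
# Plain-Ruby illustration; User and Customer stand in for the Refinery classes.
class User
end

class Customer < User
  def billing_address
    "customer billing address"
  end
end

user = User.new
puts user.respond_to?(:billing_address)   # method lives only on the subclass

# Reopen the parent class, as the to_prepare block above does with associations.
User.class_eval do
  def billing_address
    "user billing address"
  end
end

puts user.respond_to?(:billing_address)
puts user.billing_address
```

Anything defined only on the subclass is invisible to an object instantiated as the parent, which is exactly the situation with current_refinery_user.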

Using RefineryCMS signin but changing after_sign_in_path_for

The reason for this is so that we can make use of Refinery/Devise user authentication & roles (incl nifty modalbox login views) but have the redirect after signin/signup/signout return to the originating page.
The sample here is shown from the refinerycms-stores engine I am developing. So in this example: stores is the engine namespace, and therefore the filepaths are presumed in the "stores" path (either within app/vendor/extensions, or via a gem).
create a file:  stores/lib/refinery/stores/authenticated_system.rb
module Refinery
  module AuthenticatedSystem
    def after_sign_in_path_for(resource_or_scope)

    def after_sign_out_path_for(resource_or_scope)

    def after_update_path_for(resource)

    def after_sign_up_path_for(resource)

    def store_root
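For orientation only, here is a hypothetical sketch of how hooks like these could send the visitor back to where they started. Only the hook names come from the listing above; the session key :return_to, the FakeController class, the module name and the "/stores" path are all assumptions made for illustration, not Refinery's or Devise's actual behaviour.

```ruby
module AuthenticatedSystemSketch
  # Devise-style hook: where to go after a successful sign in.
  def after_sign_in_path_for(_resource_or_scope)
    # Return to the remembered page once, else fall back to the store root.
    session.delete(:return_to) || store_root
  end

  def after_sign_out_path_for(_resource_or_scope)
    store_root
  end

  def store_root
    "/stores"   # assumed path
  end
end

# Minimal stand-in for a controller, so the sketch runs outside Rails.
class FakeController
  include AuthenticatedSystemSketch

  def session
    @session ||= {}
  end
end

controller = FakeController.new
controller.session[:return_to] = "/stores/cart"
puts controller.after_sign_in_path_for(:user)
puts controller.after_sign_in_path_for(:user)
```

The second call falls back to the store root because the remembered location is consumed on first use.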
add a require in stores/lib/refinery/stores.rb to reference that file (it's the line following the autoload):
require 'refinerycms-core'

module Refinery
  autoload :StoresGenerator, 'generators/refinery/stores_generator'
    require 'refinery/stores/authenticated_system'

  module Stores
    require 'refinery/stores/engine'
add the route into stores/config/routes.rb (it's the root :to ... line)
  namespace :stores do
   root :to => 'stores#index'
    resources :stores, :only => [:index, :show]  do
      collection do
        post :add_to_cart
        post :empty_cart
        post :checkout
restart your server

Tailoring a RefineryCMS 2.0 project - Part 2

in our last episode...

In Part 1 we prepared basic RefineryCMS, pushed it to production on Heroku, then added all the several key refinery engine extensions we'd like to have.
In Part 2, we'll be tailoring the layout and stylesheets. Steps after that include adding outside jQuery plugins.

prepping view overrides

Here are some suggestions to get you started; you may need more, depending on how you want to change the styling. layouts/application establishes an overall Rails-type layout for all pages. You may also want changes to the header (menu bar) and footer.
$ rake refinery:override view=layouts/application
$ rake refinery:override view=refinery/header
$ rake refinery:override view=refinery/footer

prepping assets for your app

You may want to review the Rails Asset Pipeline Guide for how and where to add your stylesheet, javascript, and image assets. I'll just make a couple of references here to get you started thinking.
  app/assets/javascripts/jquery.colorbox.js, jquery.quovolver.js
  app/assets/images/  ...add in any as needed
You'll want application.sass to provide your own styling. You can look at the html for a typical page to know which divs you wish to style. You may want to add additional jQuery plugins to improve your user interfaces. And of course, if you have any styling images, they'll need to be included as well. I like to replace the empty favicon.ico with my own; a nice little touch.

prepping any refinery extensions & overrides

You may want or need some code differences. I wanted to have certain page parts to be included within their own div id="sidebar_content". You'll need to reference these modules within the initialization sequence (shown below).

configuration & initialization

config.autoload_paths += %W(#{config.root}/lib)
config.autoload_paths += Dir["#{config.root}/lib/**/"]

# add video tags to sanitize permitted tags
config.action_view.sanitized_allowed_tags = 'table', 'tr', 'td', 'iframe'
config.action_view.sanitized_allowed_attributes = 'id', 'class', 'style', 'src', 'width', 'height', 'frameborder'

config.after_initialize do
  require 'gardenia_extensions'

  ::Refinery::Pages::ContentPresenter.send(:include, ::GardeniaExtensions::ContentPresenterExtension)
  ::Refinery::Pages::ContentPagePresenter.send(:include, ::GardeniaExtensions::ContentPagePresenterExtension)
end

config.new_page_parts = true


Tailoring a RefineryCMS 2.0 project - Part 1


I do technical support for a wonderful lady who designs websites for authors. About a year ago, we started collaborating together so that she could use RefineryCMS for her projects, instead of Radiant, Redmine, and other alternatives. We started when RefineryCMS was still at version 0.9, and have in fact migrated through several version changes, with v 1.0.9 being the last migration. Over the last year, we've put up about 14 websites.
I tailored RefineryCMS to her design preferences so that it would be easy for her to clone projects and build off of a stable design platform. I've added some settings so that she can easily choose between different choices: mastheads which stretch across the page, or which just fit in a centered container, for example.
The tailoring involves packaging together the following refinery engines: blog, news, inquiries, page-images, and mailchimp. We have standard sass-based stylesheets, which amply use SASS variables and mix-ins (macros), to make it easy to style. We have several jQuery plugins as well: for multi-level menus, lightbox display of images, banner rotator, quote revolver, etc.

This cheatsheet

This cheatsheet will record my work re-tailoring a basic RefineryCMS 2.0 application to suit our needs.
It will cover my steps to integrate the various engines, stylesheets, jQuery plugins, etc and come up with a clone-able project template. By clone-able, I mean that she can just do a git clone <tailored_project> <new_project> and be off and styling in a matter of minutes.


First, build a basic RefineryCMS 2.0 app (see following blog post) and get it up on Heroku's cedar stack. Come back here when you've got that working.

Add some engines:

Repeatable Process (recipe will show using refinerycms-settings as the example; see Gist of output):
  • add engine requirement to Gemfile:
        gem 'refinerycms-settings', '~> 2.0.0'
  • bundle install
  • rails g refinery:settings
  • rake db:migrate
  • rake db:seed
  • foreman start     ... check it out locally first
  • git add .; git commit -am 'added refinerycms-settings'; git push origin
  • git push heroku
  • heroku run rake db:migrate
  • heroku run rake db:seed
  • check it out with your browser pointed to http://my_project.herokuapp.com
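Since the recipe is the same for each engine, it can be sketched as a small Ruby helper that prints the commands for a given engine name. engine_install_steps is a hypothetical helper, and it assumes the generator is named after the engine (as with refinery:settings); some engines may use a different generator name or need a different Gemfile line.

```ruby
# Build the list of shell commands for adding one Refinery engine.
# The version constraint and commit message follow the recipe's pattern;
# `engine_install_steps` itself is a hypothetical helper.
def engine_install_steps(engine, version = "~> 2.0.0")
  gem_name = "refinerycms-#{engine}"
  [
    %(echo "gem '#{gem_name}', '#{version}'" >> Gemfile),
    "bundle install",
    "rails g refinery:#{engine}",
    "rake db:migrate",
    "rake db:seed",
    "foreman start",
    "git add .; git commit -am 'added #{gem_name}'; git push origin",
    "git push heroku",
    "heroku run rake db:migrate",
    "heroku run rake db:seed"
  ]
end

puts engine_install_steps("settings")
```

The helper only prints the steps; I'd still run them by hand, one at a time, checking the app locally before pushing.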




  -- you'll need to precompile the assets after rake db:seed and before push heroku
     $ RAILS_ENV=production bundle exec rake assets:precompile


  -- as of 05-Mar-12, your Gemfile line will need to be:
gem 'refinerycms-news', :git => 'git://github.com/resolve/refinerycms-news.git', :branch => "rails-3-1"


  -- as of 07-Mar-12, your Gemfile line will need to be:
gem 'refinerycms-page-images', :git => 'git://github.com/resolve/refinerycms-page-images.git', :branch => "rails-3-1"


  -- as of 07-Mar-12, this wouldn't initialize under RefineryCMS 2.0; will look into this later


  -- you'll need to precompile the assets after rake db:seed and before push heroku
     $ RAILS_ENV=production bundle exec rake assets:precompile

RefineryCMS 2.x app on Heroku Cedar Stack

This cheatsheet will show you the steps and changes I take to get a RefineryCMS 2.x / Rails 3.2.x app started on Heroku's awesome Cedar Stack. All new Heroku projects should be started on Cedar (not Bamboo) because this is where the burgeoning power of Heroku is taking place. Cedar gives you much more flexibility and power in specifying your gems and execution stack.

How to

Follow RefineryCMS 2.0 download & install instructions.
Cedar requires you to use foreman (and thin) to run the project locally on your dev machine. That means the following addition to Gemfile (plus a couple of extras; make sure you have heroku):
gem 'thin'
gem 'fog'

group :development do
  gem 'foreman'
  gem 'erubis'
  gem 'heroku'
end
You'll need to create a Procfile in the project directory. Heroku requires this as well, so it needs to be part of the git repository. There's just one line in the file:
web: bundle exec rails server thin -p $PORT -e $RACK_ENV
And I find it helpful to have a ./.foreman config file (in your project directory) to specify the port for the server (otherwise on my machine it defaults to port=5000). Again, just one line in the file:
port: 3000
To test your app locally, instead of $ rails s do the following:
$ foreman start
Once you've verified that everything is running correctly locally proceed to the next steps.
Make sure git has been set up and you've committed all your changes.
Now you're ready to create a new cedar stack on heroku. Make sure your heroku CLI is the latest version. In this example, I'm creating a heroku app called "kikapu".
$ heroku apps:create --stack cedar kikapu
Creating kikapu... done, stack is cedar
http://kikapu.herokuapp.com/ | git@heroku.com:kikapu.git
Git remote heroku added
$ git push heroku

Compiling assets locally for heroku

If you rely on heroku to precompile assets at time of slug generation, you'll run into a sticky error:
-----> Preparing Rails asset pipeline
       Running: rake assets:precompile
       ERROR: Unable to connect to memcached
       Precompiling assets failed, enabling runtime asset compilation
       Injecting rails31_enable_runtime_asset_compilation
It appears that Rails 3.2 now automatically does an ActiveRecord cache clear, which forces a DB connection attempt during app initialization; this is the cause of the failure (see the heroku article below for further details). Fortunately, there is a work-around, which is to precompile assets manually; instructions are given below.
(The following section has been taken directly from the hard-to-find heroku troubleshooting article: http://devcenter.heroku.com/articles/rails3x-asset-pipeline-cedar)
"If a public/assets/manifest.yml is detected in your app, Heroku will assume you are handling asset compilation yourself and will not attempt to compile your assets. To compile your assets locally, run the assets:precompile task locally on your app. Make sure to use the production environment so that the production version of your assets are generated."
You'll need to slightly change your Gemfile, and move
gem 'sass-rails',   '~> 3.2.3'
out from within the group :assets do .. end section, because we have to compile the assets in production environment and sass-rails won't be there. We need it because it contains the all-important image-url() method which we use within our stylesheet.sass to get the correct url to the image assets (see Rails Asset Pipeline Guide).
$ RAILS_ENV=production bundle exec rake assets:precompile
"A public/assets directory will be created. Inside this directory you’ll find a manifest.yml which includes the md5sums of the compiled assets. Adding public/assets to your git repository will make it available to Heroku."
$ git add public/assets
$ git commit -m "vendor compiled assets"
"Now when pushing, the output should show that your locally compiled assets were detected:"
-----> Preparing Rails asset pipeline
       Detected manifest.yml, assuming assets were compiled locally
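Before pushing, it can be handy to confirm the manifest is really there. The sketch below just checks for the file named in the quoted article; assets_compiled_locally? is a hypothetical helper, and it is demonstrated against a throwaway directory rather than a real app.

```ruby
require "tmpdir"
require "fileutils"

# Mirrors the rule quoted above: a public/assets/manifest.yml in the app
# means assets are treated as compiled locally. Helper name is hypothetical.
def assets_compiled_locally?(app_root)
  File.exist?(File.join(app_root, "public", "assets", "manifest.yml"))
end

Dir.mktmpdir do |app_root|
  puts assets_compiled_locally?(app_root)

  assets_dir = File.join(app_root, "public", "assets")
  FileUtils.mkdir_p(assets_dir)
  File.write(File.join(assets_dir, "manifest.yml"), "--- {}\n")

  puts assets_compiled_locally?(app_root)
end
```

In a real project you would call assets_compiled_locally?(Dir.pwd) from the project directory after running the precompile task.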
Next on heroku, do the following:
$ heroku run rake db:create
$ heroku run rake db:migrate
$ heroku run rake db:seed
Then you can check out your site at (fill in your project name for project, below):

Enabling S3 resource storage

Since Heroku doesn't have local persistence (for images or other attached resources), you'll need to set up S3 on Amazon and then add config variables to heroku for your app. Refinery core is already set up to use these new settings.
heroku config:add S3_BUCKET=my_bucket S3_KEY=AXXXXXXXXXXXXCA S3_SECRET=8ikjhgjkhgkjgkjggjkkgh062
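A quick way to confirm the three variables are visible to the app is a check like the one below, which could be run in a console. The key names come from the config:add line above; missing_s3_settings is a hypothetical helper and says nothing about how Refinery itself reads these values.

```ruby
# Keys set by the `heroku config:add` line above.
REQUIRED_S3_KEYS = %w[S3_BUCKET S3_KEY S3_SECRET].freeze

# Returns the names of any required settings that are missing or blank.
def missing_s3_settings(env = ENV)
  REQUIRED_S3_KEYS.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_s3_settings
if missing.empty?
  puts "S3 settings present"
else
  puts "Missing S3 settings: #{missing.join(', ')}"
end
```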

Deprecation Warnings at Heroku

You'll get the following deprecation warning when running rake tasks on your app at heroku. Even if you don't have plugins, Heroku has installed a few vendor/plugins into your slug, which is the cause behind this deprecation.
DEPRECATION WARNING: You have Rails 2.3-style plugins in vendor/plugins! 
Support for these plugins will be removed in Rails 4.0. 
Move them out and bundle them in your Gemfile, or fold them in to your app 
as lib/myplugin/* and config/initializers/myplugin.rb. 
See the release notes for more on this: 

Generating multiple RefineryCMS engines in the same engine

My cheatsheets are -- in reality -- my medium-term memory banks. I'm a lazy programmer and don't like figuring out the same thing twice. so... This is an evolving/developing post .. but I use these cheatsheets too while I'm in development. Sometimes Brute Force involves lots of trial & error to get something right!
$ rails new kikapu -m http://www.refinerycms.com/t/edge
$ echo "rvm 1.9.3@kikapu" > kikapu/.rvmrc
$ vim kikapu/Gemfile

~ add:
  gem  'thin'
  gem  'rack'
~ ZZ (save & exit vim)

$ cd kikapu
$ bundle install
$ rails server
$ rails generate refinery:engine store name:string meta_keywords:string description:text
$ bundle install
$ rails g refinery:stores
$ rake db:migrate
$ rails server

$ rails g refinery:engine product store:references name:string code:string description:text date_available:datetime price:float size_width:float size_height:float size_depth:float weight:float tax_type:references digital_download:references main_pic:references inactive:boolean --engine stores

$ rails g refinery:engine order order_number:integer order_customer:references order_status:string order_notes:text shipping_type:references shipped_on:datetime product_total:float shipping_charges:float tax_charges:float cc_last4:string cc_card_type:string cc_expiry_month:integer cc_expiry_year:integer cc_token:string cc_purchased_on:datetime --engine stores

$ rails g refinery:engine line_item order:references product:references quantity:integer unit_price:float --engine stores --skip-frontend

$ rails g refinery:engine address customer:references order:references is_billing:boolean first_name last_name phone email address1 address2 city state zip country --engine stores --skip-frontend

$ rails g refinery:stores
$ rake db:migrate
$ rails server

During the course of generating engines within the existing engine, you'll need to respond "n" (no) to the following:
  Overwrite ~~/engines/stores/Rakefile? (enter "h" for help) [Ynaqdh] n
  Overwrite ~~/engines/stores/db/seeds.rb? (enter "h" for help) [Ynaqdh] n
  Overwrite ~~/engines/stores/lib/generators/refinery/stores_generator.rb? (enter "h" for help) [Ynaqdh] n
  Overwrite ~~/engines/stores/refinerycms-stores.gemspec? (enter "h" for help) [Ynaqdh] n

I don't want products to appear in dashboard tab; so:
$ vim lib/refinery/products/engine.rb
#         Refinery::Plugin.register do |plugin|
#           plugin.name = "products"
#           plugin.url = {
#             :controller => 'refinery/products/admin/products',
#             :action => 'index'
#           }
#           plugin.pathname = root

#           plugin.activity = {
#             :class_name => :'refinery/products/product',
#             :title => 'name'
#           }
#      end

add more engines
$ rails g refinery:engine order order_number:integer order_user:references order_status:string order_notes:text shipping_type:references  shipped_on:datetime product_cost:float shipping_cost:float tax_cost:float   --engine stores

When you add additional engines to an existing engine, you may or may not need to have additional lines added to db/seeds.rb. If you want to have a Page seeded for viewing by front-end users, then you'll need an entry in db/seeds.rb (see below). This needs to be manually added in since we had to skip that step during engine generation (because it would have overwritten db/seeds.rb).

RefineryCMS 2.x (edge) app generation

RefineryCMS 2.0.0 is still on the edge .. but if you want to try it out, here's how to do that.
This post was originally published 10-Feb-2012.


In your projectspace directory, you'll want it to have its own RVM gemset (mine is called "projects"). The refinerycms app is not rvm-friendly yet; it will do a bundle install at the end of setting everything up, and those gems will be installed in the projectspace gemset, not a new gemset for the new app (which is where they should be installed).

Gen new RefineryCMS 2.0.0 app

$ cd projectspace
$ gem install rails -v'3.2.1'
$ rails new <new app> -m http://www.refinerycms.com/t/edge
That's it. The -m option references an application template. You might be able to add additional refinerycms options on that line (I haven't tried it yet), for example selecting a different database than sqlite3, etc.
There's tons of output (see gist »). Good luck!

Jeweler: my gem creation cheatsheet

I use jeweler for gem creation.

setup steps

$ cd projectspace
$ gem install jeweler
$ jeweler --shoulda --testunit --summary "refinerycms shopping cart engine" --description "Complete engine for shopping cart to be used with a RefineryCMS project" --rdoc --user-name "Daudi Amani" --user-email "alphabeta@gmail.com" --github-username "alphabeta@gmail.com" --github-token <my API token> --git-remote  git@github.com:dsaronin/refinerycms-cart.git --create-repo refinerycms-cart
$ rvm gemset create cart
$ echo "rvm 1.9.3@cart" > refinerycms-cart/.rvmrc
$ cd refinerycms-cart/

RVM will ask you

Do you wish to trust this .rvmrc file? (/home/daudi/projectspace/refinerycms-cart/.rvmrc)
respond: yes
$ vim Gemfile
~ change gem "rcov" to:
gem 'simplecov', :require => false, :group => :test
~ ZZ   (to save & exit vim)
$ bundle install
$ vim Rakefile

~ remove the following section

require 'rcov/rcovtask'
Rcov::RcovTask.new do |test|
  test.libs << 'test'
  test.pattern = 'test/**/test_*.rb'
  test.verbose = true
  test.rcov_opts << '--exclude "gems/*"'
end

~ replace with

  require 'simplecov'
  SimpleCov.start  'rails'
~ ZZ   (to save & exit vim)
$ rake version:write

update jeweler-supplied .gitignore (see my blog post on my .gitignore)

gem build steps

You'll repeat this cycle (4 steps below) for each upgrade to your gem. You'll use bump:major, bump:minor, or bump:patch (shown below), depending upon the degree to which the changes impact existing usage.
$ rake version:bump:patch
$ git add .; git commit -am "<some message>"
$ rake build
$ rake release
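What the three bump levels do to a version number can be sketched in plain Ruby. bump is a hypothetical helper written for illustration, assuming the usual major.minor.patch convention where lower parts reset to zero; it is not jeweler's own code.

```ruby
# Semantic-version bump, mirroring what the three bump tasks are used for.
# `bump` is a hypothetical helper, not jeweler's implementation.
def bump(version, level)
  major, minor, patch = version.split(".").map(&:to_i)
  case level
  when :major then "#{major + 1}.0.0"                   # breaking changes
  when :minor then "#{major}.#{minor + 1}.0"            # new, compatible features
  when :patch then "#{major}.#{minor}.#{patch + 1}"     # fixes only
  else raise ArgumentError, "unknown level: #{level.inspect}"
  end
end

puts bump("1.4.2", :patch)
puts bump("1.4.2", :minor)
puts bump("1.4.2", :major)
```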


  • Jeweler tutorial
    this is a great tutorial (but ruby 1.8.7 based) with a detailed example;
    I patterned most of my cheatsheet off of this as a starting point 
  • Jeweler example
    back up information from the jeweler gem site 
  • Building a Rails 3 gem
    Handy information for Rails 3-specific stuff 

RVM - Ruby Version Manager

Always use Ruby Version Manager ».
Just do it, get used to it, make it work for you.
It is awesome and the only way to go … but it takes a different mindset and work flow to use correctly.
Everything I do is in the context of RVM, and so are the instructions in these cheatsheets. Even my projectspace has its own gemset. Be sure also to set RVM default gems for all new gemsets; it saves a lot of hassle.
Currently, I only use Ruby 1.9.3.

laziness is good, for programmers

Good programmers are essentially lazy; they hate re-creating anything that’s already working.

my .gitignore

this is always evolving ... but currently it looks like this:

# Rails
# Documentation
# Public Uploads
# Public Cache
# Vendor Cache
# Acts as Indexed
# Refinery Specific
# Mac
# Windows
# NetBeans
# Eclipse
# Redcar
# Rubinius
# Vim
# RubyMine
# Backup
# Capybara Bug
# sass
# other
# simplecov/rcov generated
# rdoc generated
# yard generated
# bundler
# jeweler generated

my development environment


  • ASUS P5E WS Pro motherboard
  • Intel Core2 Quad Q6600
  • 6GB DDR2 memory
  • GeForce 8800 GTS 320MB
  • 250 GB Seagate
  • dual Samsung B2430 monitors.


  • Ubuntu 12.04
  • Google Chrome


  • Ruby Version Manager
    - Ruby 1.9.3 as my ruby of choice
    - RVM manages separate gemsets in different projects
  • gvim is my IDE (way better, faster, troublefree than Aptana)
    - minibufexplorer
    - NERDTree
    - taglist
  • ack-grep (how did I ever live without it?)
  • bundler
  • jeweler
  • curl
  • factory girl, shoulda
  • foreman
  • thin

About this RailsCraft blog

Getting started

After 5 years of using Ruby on Rails, it's time to share what I've learned about making applications work, especially when using multifarious gems, plugins, engines, and random open source sorcery.
I have written 6 full-scale RoR applications, almost all of which run on Heroku's excellent platform. I'm technical support for about 15 RefineryCMS sites (RefineryCMS is a RoR open source CMS).
And today [08-Feb-2012], I just launched my newest SaaS offering, Majozi -- a duty-roster scheduling app for non-profits and small businesses.

Brute Force

Three years ago, as I was getting my groove going with RoR, an engineering friend rather disparagingly accused me of brute force. In my earlier years of software engineering (complex realtime multitask embedded systems), I was rather proud of my elegant and efficient designs. His remark wounded me at the time.
Now, I wear the BruteForce badge proudly. It is not easy to pull together original design, open source gems, and a constantly changing Rails platform and make everything work in a professional application built against realworld time & budget constraints.
So, this blog will be sharing what I've learned to make it all just work. Brute force is where the elegant meets the dirty road of real applications and gets there on time.

A promise

I hope you'll never read these words in this blog: "it's really easy, just do..." Every time I read those words in a tutorial blog I cringe from all the times it wasn't easy, didn't work, the instructions were insufficient, or the example was so trivial as to be worthless.

Note: I'm importing all the archived blog posts from my tumblr blog; this one was originally published 08-Feb-2012.