29 May 2013

Multi-tenanting Ruby on Rails Applications on Heroku - Part II

Multi-tenanting methods
There are two primary, practical ways to implement multi-tenanting for RoR applications on Postgres:
  • schema-based
  • row-based
This part of our discussion will briefly outline these methods; the next article will go into pros & cons for both methods (specifically in the context of a RoR app using Postgres on Heroku).
Less than ideal
Before continuing, it's worthwhile to point out that strapping multi-tenanting onto an application is less than ideal (IMHO). It would be better if the capability were built into the ActiveRecord base of Rails or, better still, into the DBMS itself. To a large degree, application code should look as though it belonged to a normal (i.e., single-tenant) Rails app. Ideally, knowledge of multi-tenanting would be absent from almost all of the MVC structures.
The reason for this is greater security; the difference is similar to that between whitelisting and blacklisting. If the app's working assumption is that everything is automatically constrained to the current tenant, then additions and changes to the code will likewise be constrained automatically, so future changes are far less likely to put a tenant's data at risk.
Our example in Part I, where we want to allow consultants to have different tenant clients, makes this difficult, because at some point in our code we will need un-constrained finds across the multiple tenants belonging to a given user. It is better, however, to minimize that need. We could do that, for example, by letting the user choose which of her tenant clients she wishes to focus on; once she has changed the tenant, all the rest of the app is constrained to the current tenant.
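As a minimal sketch of that "choose a tenant, then stay constrained" idea, the plain-Ruby module below remembers the current tenant for the running thread. The module name CurrentTenant, its methods, and the list of tenant ids passed in are all hypothetical, not part of Rails or of any gem; a real app would presumably load the permitted ids from the user's record.

```ruby
# Hypothetical helper: remembers which tenant the current request is focused on.
module CurrentTenant
  KEY = :current_tenant_id

  # The tenant id everything else should be constrained to (nil if none chosen).
  def self.id
    Thread.current[KEY] # stored per thread (strictly, per fiber)
  end

  # Switch focus, but only to a tenant the user actually belongs to.
  # allowed_tenant_ids is assumed to come from the user's memberships.
  def self.switch(allowed_tenant_ids, tenant_id)
    unless allowed_tenant_ids.include?(tenant_id)
      raise ArgumentError, "user has no access to tenant #{tenant_id}"
    end
    Thread.current[KEY] = tenant_id
  end
end

# A consultant who works for tenants 2 and 7 chooses tenant 7.
CurrentTenant.switch([2, 7], 7)
puts CurrentTenant.id # prints 7
```

The only un-constrained step here is the membership check inside switch; everything after it can rely on CurrentTenant.id alone.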
Schema-based methods
In this method, we use the database meta-structure to segment the data. One approach is to append a suffix to each table name to distinguish the tenant.
posts__w2, for example, might be the posts table for tenant 2.
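The suffix convention might be sketched in plain Ruby as follows. The helper name tenant_table_name is hypothetical, and the "__w" separator is simply copied from the posts__w2 example; the only real work is making sure the tenant id is a positive integer before it becomes part of a table name.

```ruby
# Hypothetical helper: builds the per-tenant table name, e.g. "posts__w2".
# base is assumed to be a trusted table name that comes from our own code.
def tenant_table_name(base, tenant_id)
  id = Integer(tenant_id) # raises ArgumentError for strings like "2; DROP ..."
  raise ArgumentError, "tenant id must be positive" unless id.positive?
  "#{base}__w#{id}"
end

puts tenant_table_name("posts", 2) # prints posts__w2
```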
Another is to use Postgres' SCHEMA capability to designate what is essentially a namespace, one per tenant, within which all of that tenant's tables reside. By default, Postgres puts all data in the SCHEMA called public.
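With one SCHEMA per tenant, switching tenants comes down to changing the Postgres search_path. The sketch below only builds the SQL statement; the "tenant_N" schema naming and the helper name search_path_sql are assumptions made for illustration, as is keeping public on the path so that shared tables remain reachable.

```ruby
# Hypothetical helper: SQL that points the connection at one tenant's schema.
# Assumes schemas are named "tenant_<id>" and shared tables live in public.
def search_path_sql(tenant_id)
  schema = "tenant_#{Integer(tenant_id)}" # Integer() rejects non-numeric input
  "SET search_path TO #{schema}, public"
end

puts search_path_sql(2) # prints SET search_path TO tenant_2, public
```

In a Rails app, a statement like this would typically be sent through the ActiveRecord connection (for example with ActiveRecord::Base.connection.execute) at the start of each request.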
The effect of either approach is to multiply the meta-structure. Suppose your application requires 5 tenanted tables: Posts, Comments, Authors, Images, Profiles. Each table has a primary key (in Postgres, that is backed by an index, which is stored as a separate relation) and two further indexes (two more relations). That means 15 index meta-structures per tenant, on top of the 5 tables themselves. If you have 1000 tenants, you'll have 15,000 of them, plus 5,000 tables.
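The arithmetic can be checked with a few lines of Ruby. Note that the figure of 15 counts the three index structures per table (one for the primary key, two others); if the five tables themselves are counted as well, the per-tenant total is 20. The constant and method names are illustrative only.

```ruby
# The five tenanted tables from the example.
TENANTED_TABLES = %w[posts comments authors images profiles]
# One primary-key index plus two other indexes per table.
INDEXES_PER_TABLE = 1 + 2

# Index structures only, as counted in the text.
def index_structures(tenants)
  TENANTED_TABLES.size * INDEXES_PER_TABLE * tenants
end

# Tables plus their indexes.
def all_relations(tenants)
  TENANTED_TABLES.size * (1 + INDEXES_PER_TABLE) * tenants
end

puts index_structures(1)    # prints 15
puts index_structures(1000) # prints 15000
puts all_relations(1)       # prints 20
puts all_relations(1000)    # prints 20000
```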
Row-based methods
Row-based methods segment tenant data row by row. All tenants share the same tables, but each is restricted to viewing (or even accessing internally) only the rows that belong to it.
In this method, the number of database meta-structures is constant no matter how many tenants there are. In the example above, there would still be only 15 meta-structures, even for 10,000 tenants.
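To make the row-based idea concrete, here is a small in-memory sketch in plain Ruby, with no database involved. The class TenantScopedTable and the tenant_id key are assumptions for illustration; in a real Rails app the same effect would more likely come from a tenant_id column plus a scope applied to every query.

```ruby
# Hypothetical wrapper: every read and write goes through one tenant's view
# of a table that all tenants share.
class TenantScopedTable
  def initialize(rows, tenant_id)
    @rows = rows           # the shared table: an array of row hashes
    @tenant_id = tenant_id # the tenant this view is constrained to
  end

  # Only rows belonging to the current tenant are ever returned.
  def all
    @rows.select { |row| row[:tenant_id] == @tenant_id }
  end

  # New rows are always stamped with the current tenant, overriding any
  # tenant_id supplied by the caller.
  def insert(attrs)
    row = attrs.merge(tenant_id: @tenant_id)
    @rows << row
    row
  end
end

posts = [
  { id: 1, tenant_id: 1, title: "Hello from tenant 1" },
  { id: 2, tenant_id: 2, title: "Hello from tenant 2" }
]

view = TenantScopedTable.new(posts, 2)
view.insert(id: 3, title: "Another tenant 2 post")
puts view.all.map { |row| row[:id] }.inspect # prints [2, 3]
```

However many tenants share the posts array, there is still only one "table"; what grows is the number of rows, not the number of structures.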
Our next article will discuss the pros & cons of both methods.
Glossary of terms
  • tenant -- an organizational account
  • user -- a member of an organization; thus exists within the context of a tenant
  • constrain -- limit access to a given tenant's data
  • unconstrain -- access data across tenants
  • universal table -- data which doesn't belong to any specific tenant and needs to be globally accessed: such as the tables for Tenants and Users.
  • tenanted table -- data which only belongs to a given tenant and must not be visible, whether intentionally or not (commission or omission), to anyone who is not one of the tenant's designated users.