Multi tenancy Implementation in GOG

Contributed by shafeeq on 16 Jan 2013


Content

1.1 What is Multi tenancy

1.2 Existing Multi tenancy plug-in and problems

1.3 Implementation of Multi tenancy in GOG

1.1 What is Multi tenancy

Introduction

Trust, or the lack thereof, is the number one factor blocking the adoption of software as a service (SaaS). A case could be made that data is the most important asset of any business—data about products, customers, employees, suppliers, and more. And data, of course, is at the heart of SaaS. SaaS applications provide customers with centralized, network-based access to data with less overhead than is possible when using a locally-installed application. But in order to take advantage of the benefits of SaaS, an organization must surrender a level of control over its own data, trusting the SaaS vendor to keep it safe and away from prying eyes.

To earn this trust, one of the highest priorities for a prospective SaaS architect is creating a SaaS data architecture that is both robust and secure enough to satisfy tenants or clients who are concerned about surrendering control of vital business data to a third party, while also being efficient and cost-effective to administer and maintain.

This is the second article in our series about designing multi-tenant applications. The first article, Architecture Strategies for Catching the Long Tail, introduced the SaaS model at a high level and discussed its challenges and benefits. It is available on MSDN. Other articles in the series will focus on topics such as workflow and user interface design, overall security, and others.

In this article, we’ll look at the continuum between isolated data and shared data, and identify three distinct approaches for creating data architectures that fall at different places along the continuum. Next, we’ll explore some of the technical and business factors to consider when deciding which approach to use. Finally, we’ll present design patterns for ensuring security, creating an extensible data model, and scaling the data infrastructure.

Three Approaches to Managing Multi-Tenant Data

The distinction between shared data and isolated data isn’t binary. Instead, it’s more of a continuum, with many variations that are possible between the two extremes.

Data architecture is an area in which the optimal degree of isolation for a SaaS application can vary significantly depending on technical and business considerations. Experienced data architects are used to considering a broad spectrum of choices when designing an architecture to meet a specific set of challenges, and SaaS is certainly no exception. We shall examine three broad approaches, each of which lies at a different location in the continuum between isolation and sharing.

Separate Databases

Storing tenant data in separate databases is the simplest approach to data isolation.

Figure 1. This approach uses a different database for each tenant

Computing resources and application code are generally shared between all the tenants on a server, but each tenant has its own set of data that remains logically isolated from data that belongs to all other tenants. Metadata associates each database with the correct tenant, and database security prevents any tenant from accidentally or maliciously accessing other tenants’ data.

Giving each tenant its own database makes it easy to extend the application’s data model (discussed later) to meet tenants’ individual needs, and restoring a tenant’s data from backups in the event of a failure is a relatively simple procedure. Unfortunately, this approach tends to lead to higher costs for maintaining equipment and backing up tenant data. Hardware costs are also higher than they are under alternative approaches, as the number of tenants that can be housed on a given database server is limited by the number of databases that the server can support. (Using autoclose to unload databases from memory when there are no active connections can make an application more scalable by increasing the number of databases each server can support.)

Separating tenant data into individual databases is the “premium” approach, and the relatively high hardware and maintenance requirements and costs make it appropriate for customers that are willing to pay extra for added security and customizability. For example, customers in fields such as banking or medical records management often have very strong data isolation requirements, and may not even consider an application that does not supply each tenant with its own individual database.

Shared Database, Separate Schemas

Another approach involves housing multiple tenants in the same database, with each tenant having its own set of tables that are grouped into a schema created specifically for the tenant.

Figure 2. In this approach each tenant has its own separate set of tables in a common database

When a customer first subscribes to the service, the provisioning subsystem creates a discrete set of tables for the tenant and associates it with the tenant’s own schema. You can use the SQL CREATE command to create a schema and authorize a user account to access it. For example, in Microsoft SQL Server 2005:

CREATE SCHEMA ContosoSchema AUTHORIZATION Contoso

The application can then create and access tables within the tenant’s schema using the SchemaName.TableName convention:

CREATE TABLE ContosoSchema.Resumes (EmployeeID int identity primary key,
   Resume nvarchar(MAX))

After the schema is created, it is set as the default schema for the tenant account:

ALTER USER Contoso WITH DEFAULT_SCHEMA = ContosoSchema

A tenant account can access tables within its default schema by specifying just the table name, instead of using the SchemaName.TableName convention. This way, a single set of SQL statements can be created for all tenants, which each tenant can use to access its own data:

SELECT * FROM Resumes

Like the isolated approach, the separate-schema approach is relatively easy to implement, and tenants can extend the data model as easily as with the separate-database approach. (Tables are created from a standard default set, but once they are created they no longer need to conform to the default set, and tenants may add or modify columns and even tables as desired.) This approach offers a moderate degree of logical data isolation for security-conscious tenants, though not as much as a completely isolated system would, and can support a larger number of tenants per database server.

A significant drawback of the separate-schema approach is that tenant data is harder to restore in the event of a failure. If each tenant has its own database, restoring a single tenant’s data means simply restoring the database from the most recent backup. With a separate-schema application, restoring the entire database would mean overwriting the data of every tenant on the same database with backup data, regardless of whether each one has experienced any loss or not. Therefore, to restore a single customer’s data, the database administrator may have to restore the database to a temporary server, and then import the customer’s tables into the production server—a complicated and potentially time-consuming task.

The separate schema approach is appropriate for applications that use a relatively small number of database tables, on the order of about 100 tables per tenant or fewer. This approach can typically accommodate more tenants per server than the separate-database approach can, so you can offer the application at a lower cost, as long as your customers will accept having their data co-located with that of other tenants.

Shared Database, Shared Schema

A third approach involves using the same database and the same set of tables to host multiple tenants’ data. A given table can include records from multiple tenants stored in any order; a Tenant ID column associates every record with the appropriate tenant.

Figure 3. In this approach, all tenants share the same set of tables, and a Tenant ID associates each tenant with the rows that it owns

Of the three approaches explained here, the shared schema approach has the lowest hardware and backup costs, because it allows you to serve the largest number of tenants per database server. However, because multiple tenants share the same database tables, this approach may incur additional development effort in the area of security, to ensure that tenants can never access other tenants’ data, even in the event of unexpected bugs or attacks.
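One way to picture the extra security work is a data access layer that never hands out rows without a tenant filter. The following is a minimal in-memory sketch in Java, not production code: the ResumeRow and TenantScopedResumes classes and the tenantId field are hypothetical names chosen for illustration, and the list stands in for a shared Resumes table with a Tenant ID column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: one shared table holds rows for every tenant.
final class ResumeRow {
    final int tenantId;
    final int employeeId;
    final String resume;

    ResumeRow(int tenantId, int employeeId, String resume) {
        this.tenantId = tenantId;
        this.employeeId = employeeId;
        this.resume = resume;
    }
}

// Hypothetical repository: callers can only read rows through a tenant filter.
final class TenantScopedResumes {
    private final List<ResumeRow> table = new ArrayList<>();

    void insert(int tenantId, int employeeId, String resume) {
        table.add(new ResumeRow(tenantId, employeeId, resume));
    }

    // Comparable in spirit to: SELECT * FROM Resumes WHERE TenantID = ?
    List<ResumeRow> findAllFor(int tenantId) {
        List<ResumeRow> result = new ArrayList<>();
        for (ResumeRow row : table) {
            if (row.tenantId == tenantId) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Because the unfiltered list is private, a bug in calling code cannot accidentally return another tenant's rows; the filter is applied in exactly one place.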

The procedure for restoring data for a tenant is similar to that for the separate-schema approach, with the additional complication that individual rows in the production database must be deleted and then reinserted from the temporary database. If there are a very large number of rows in the affected tables, this can cause performance to suffer noticeably for all the tenants that the database serves.

The shared-schema approach is appropriate when it is important that the application be capable of serving a large number of tenants with a small number of servers, and prospective customers are willing to surrender data isolation in exchange for the lower costs that this approach makes possible.

Choosing an Approach

Each of the three approaches described above offers its own set of benefits and tradeoffs that make it an appropriate model to follow in some cases and not in others, as determined by a number of business and technical considerations. Some of these considerations are listed below.

Economic Considerations

Applications optimized for a shared approach tend to require a larger development effort than applications designed using a more isolated approach (because of the relative complexity of developing a shared architecture), resulting in higher initial costs. Because they can support more tenants per server, however, their ongoing operational costs tend to be lower.

Figure 4. Cost over time for a hypothetical pair of SaaS applications; one uses a more isolated approach, while the other uses a more shared approach

Your development effort can be constrained by business and economic factors, which can influence your choice of approach. The shared schema approach can end up saving you money over the long run, but it does require a larger initial development effort before it can start producing revenue. If you are unable to fund a development effort of the size necessary to build a shared schema application, or if you need to bring your application to market more quickly than a large-scale development effort would allow, you may have to consider a more isolated approach.

Security Considerations

As your application will store sensitive tenant data, prospective customers will have high expectations about security, and your service level agreements (SLAs) will need to provide strong data safety guarantees. A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.

Tenant Considerations

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.

How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
How much storage space do you expect the average tenant’s data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.

Figure 5. Tenant-related factors and how they affect “isolated versus shared” data architecture decisions

Regulatory Considerations

Companies, organizations, and governments are often subject to regulatory law that can affect their security and record storage needs. Investigate the regulatory environments that your prospective customers occupy in the markets in which you expect to operate, and determine whether they present any considerations that will affect your decision.

Skill Set Considerations

Designing single-instance, multi-tenant architecture is still a very new skill, so subject matter expertise can be hard to come by. If your architects and support staff do not have a great deal of experience building SaaS applications, they will need to acquire the necessary knowledge, or you will have to hire people that already have it. In some cases, a more isolated approach may allow your staff to leverage more of its existing knowledge of traditional software development than a more shared approach would.

Realizing Multi-Tenant Data Architecture

The remainder of this article details a number of patterns that can help you plan and build your SaaS application. As we discussed in our introductory article, a well-designed SaaS application is distinguished by three qualities: scalability, configurability, and multi-tenant efficiency. The table below lists the patterns appropriate for each of the three approaches, divided into sections representing these three qualities.

Optimizing for multi-tenant efficiency in a shared environment must not compromise the level of security safeguarding data access. The security patterns listed below demonstrate how you can design an application with “virtual isolation” through mechanisms such as permissions, SQL views, and encryption.

Configurability allows SaaS tenants to alter the way the application appears and behaves without requiring a separate application instance for each individual tenant. The extensibility patterns describe possible ways you can implement a data model that tenants can extend and configure individually to meet their needs.

The approach you choose for your SaaS application’s data architecture will affect the options available to you for scaling it to accommodate more tenants or heavier usage. The scalability patterns address the different challenges posed by scaling shared databases and dedicated databases.

1.2 Existing Multi tenancy plug-in and problems

There is an existing plug-in for implementing multi tenancy in GOG: http://grails.org/plugin/multi-tenant-core

However, it was written for older versions of Grails (1.3.0 and later) and depends on plug-ins that are themselves outdated, and it does not work with the current Grails version, 2.1.1. I therefore re-implemented multi tenancy for Grails 2.1.1 and later. You can implement it by following the steps below.

1.3 Implementation of Multi tenancy in GOG

I am using the Shared Database, Separate Schemas approach for this multi tenancy implementation.

The following steps need to be implemented.

Create one master DB schema that holds all tenants in the system; in this demo it is called master_schema. Then create one or more tenant schemas, for example:

1. Tenant_schema1,

2. Tenant_schema2,

3. Tenant_schema3

Load all DB schemas into the application context.
Create the following Groovy classes for loading the schemas and for switching the DataSource at run time.
1. Environment.groovy (This class loads the master and tenant schemas.)
2. EnvironmentHolder.groovy (This is a ThreadLocal class that holds the current DataSource for a transaction/request.)
3. SwitchableDataSource.groovy (Using this class, the DataSource can be switched at run time to that of the requesting tenant.)
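To make the roles of the holder and the switchable DataSource more concrete, here is a minimal, dependency-free Java sketch of the same idea. It is not the code from this project: TenantHolder and SwitchableTarget are hypothetical names, and the generic type T stands in for javax.sql.DataSource so that the example runs without Spring or a database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder: remembers the tenant key for the current thread.
// Assumes one request is handled by one thread.
final class TenantHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String tenantKey) { CURRENT.set(tenantKey); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Hypothetical router: returns the target registered for the current tenant.
// T stands in for javax.sql.DataSource in this sketch.
final class SwitchableTarget<T> {
    private final Map<String, T> targets = new ConcurrentHashMap<>();
    private final T defaultTarget;

    SwitchableTarget(T defaultTarget) {
        this.defaultTarget = defaultTarget;
    }

    void register(String tenantKey, T target) {
        targets.put(tenantKey, target);
    }

    // Falls back to the default (master) target when no tenant is selected.
    T current() {
        String key = TenantHolder.get();
        if (key == null) {
            return defaultTarget;
        }
        T target = targets.get(key);
        if (target == null) {
            throw new IllegalStateException("No target registered for tenant: " + key);
        }
        return target;
    }
}
```

In a real application the holder should be cleared at the end of every request, because servlet containers reuse threads and a stale tenant key would otherwise leak into the next request. Spring also ships AbstractRoutingDataSource, which offers a comparable lookup-key hook (determineCurrentLookupKey) and may be worth evaluating as an alternative to a hand-written switchable DataSource.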

Create the following Java classes for dynamically creating Spring beans at run time.

At bootstrap, Grails loads the default DataSource defined in DataSource.groovy or in application.properties. Once bootstrap has loaded the default DataSource into the Grails context (that is, the Spring application context), we need to create a new DataSource for each tenant in the application. In our case, Environment.groovy loads the three schemas tenant_schema1, tenant_schema2 and tenant_schema3 and stores them in a map. With the help of the following Java classes, the new DataSources can be registered dynamically. This happens at application load time only. Once the DataSources are loaded into the application context, the application only needs to switch between them based on the current tenant and store the selection in a ThreadLocal object, in our case EnvironmentHolder.groovy. I am trying to give as much information as possible to explain how DataSources are loaded into the application context. In the Spring framework, a DataSource is a singleton Spring bean.

AppContext.java
ApplicationContextProvider.java
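The load-time step of collecting the tenant schemas into a map can be sketched roughly as follows. This is only an illustration in plain Java under stated assumptions: TenantSettings and TenantRegistry are hypothetical names, and the URL layout (base URL, a slash, then the schema name) is an assumption that depends on your database and JDBC driver. Only the dbname property mirrors the name used in the controller code in this post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical connection settings for one tenant schema.
final class TenantSettings {
    final String dbname;
    final String url;

    TenantSettings(String dbname, String url) {
        this.dbname = dbname;
        this.url = url;
    }
}

// Hypothetical registry: builds the schema-name -> settings map once at startup.
final class TenantRegistry {
    static Map<String, TenantSettings> load(String baseUrl, List<String> schemaNames) {
        Map<String, TenantSettings> byName = new LinkedHashMap<>();
        for (String name : schemaNames) {
            // Assumed URL layout; adjust for the database actually in use.
            byName.put(name, new TenantSettings(name, baseUrl + "/" + name));
        }
        return byName;
    }
}
```

Each entry in the map would then be turned into a DataSource bean and registered with the application context; that part needs Spring and is left out of the sketch.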

Overriding the Hibernate cache provider class:

Hibernate uses a cache for each DataSource that is created, so one cache is required for each tenant DataSource loaded into the application context. To achieve this, we need our own cache factory that creates a new cache for every new tenant DataSource. Create the following two classes.

MultiTenantEhCache.groovy (This class creates a new Hibernate cache with a unique tenant ID.)
MultiTenantEhCacheProvider.groovy (This class must be set as the cache provider in the DataSource.groovy file.)

hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = false
    cache.provider_class = 'grails.plugin.multitenant.ehcache.cache.MultiTenantEhCacheProvider'
}
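The reason a per-tenant cache matters is that two tenants can hold different rows under the same primary key, so a single shared cache could return one tenant's object to another. The sketch below shows the general idea of keying caches by tenant in plain Java. It is not the MultiTenantEhCache implementation: TenantCaches, regionFor and the "tenant:region" naming scheme are assumptions made for illustration, and plain maps stand in for Ehcache regions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-tenant cache registry; maps stand in for real cache regions.
final class TenantCaches {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    // Assumed naming scheme: prefix the region with the tenant key to make it unique.
    static String regionFor(String tenantKey, String baseRegion) {
        return tenantKey + ":" + baseRegion;
    }

    // Returns the cache for this tenant and region, creating it on first use.
    Map<Object, Object> cacheFor(String tenantKey, String baseRegion) {
        return caches.computeIfAbsent(
            regionFor(tenantKey, baseRegion),
            name -> new ConcurrentHashMap<>());
    }
}
```

With this layout, an entry cached under key 1 for tenant_schema1 is invisible to tenant_schema2, even though both use the same region name and the same key.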

In the controller, the following code is used to change the tenant:

import org.springframework.context.ApplicationContext

// Switches the current request to the tenant DataSource named dsToConnect.
// If the tenant is not yet registered, it is added to the application context first.
public boolean changeDataSorce(String dsToConnect, PwBusiness businessInstance) {
    def env = Environment.list().find { it.dbname == dsToConnect }
    if (!env) {
        // The tenant DataSource does not exist in the application context yet, so add it.
        def envCustomers = new Environment()
        envCustomers.addTenant(businessInstance)
        envCustomers.addTenatDataSource(businessInstance)
    }
    return setEnvironment(dsToConnect)
}

// Makes the given tenant the current environment and checks that its database is reachable.
private boolean setEnvironment(String dsToConnect) {
    def env = Environment.list().find { it.dbname == dsToConnect }
    def oldEnv = EnvironmentHolder.getEnvironment()
    EnvironmentHolder.setEnvironment env
    def ds = getDataSourceForEnv()
    log.info "DS in login controller setEnv :> " + ds
    try {
        // Test the connection before committing to the new environment.
        def con = ds.getConnection()
        con.close()
        session.environment = env
        log.info 'Environment change complete. ' + dsToConnect
    } catch (e) {
        // Restore the previous environment if the tenant database cannot be reached.
        EnvironmentHolder.setEnvironment oldEnv
        log.error 'Unable to connect to database: ' + dsToConnect + ' ' + e.message
        return false
    }
    return true
}

// Looks up the dataSource bean from the Spring application context.
private def getDataSourceForEnv() {
    ApplicationContext ctx = AppContext.getApplicationContext()
    return ctx.getBean("dataSource")
}

For the rest of the source files mentioned above, contact me at [email protected].

Visit us at Neevtech.com to know more about our offerings.