There are many things to consider when integrating with Software as a Service (SaaS) solutions, some of which are easily overlooked. In many respects, integrating with SaaS is similar to integrating with packaged products, but it is often more challenging because the system is hosted outside your network. In this blog post, I will go over some key planning considerations, including business process modeling, data modeling and mapping, integration strategy, security and data cleansing.
Key characteristics of SaaS
"Software as a Service" solutions have been around for several years, although vendors define the term differently and many such systems were only recently branded as SaaS. Let us start by establishing some key characteristics that distinguish SaaS solutions from packaged software, web applications, cloud computing and traditional application hosting:
- They are centered on specific business functionality.
- They are highly configurable, not just from a user interface standpoint; they are also extensible through APIs or custom code.
- The SaaS licensing model is pay-per-use or subscription based. This typically translates into lower up-front cost compared to packaged software.
- They are managed and hosted by a third party, typically in a multi-tenant environment.
- SaaS applications are part of cloud computing. Cloud computing is a broader term that also includes infrastructure as a service (e.g. Amazon EC2, Microsoft Windows Azure, rackspace.com), whereas SaaS refers to a specific software application and its underlying data made available through the cloud (e.g. Salesforce.com).
The following are some key planning considerations when integrating with SaaS solutions.
Business process modeling
Implementing a SaaS solution is an opportunity to modernize processes and provide users with a more integrated experience. Because of this, SaaS projects should start with business process modeling: identifying, at a high level, how business processes will change and how the SaaS solution and internal systems will interact at the business level. This should be done with key stakeholders on the business side in order to understand the longer-term vision for business integration.
Data modeling and mapping
SaaS data storage is, in almost all cases, hosted by the SaaS vendor. The underlying database is often hidden and not directly available for integration purposes. This means that integration must be done through API calls or bulk data loading. Bulk loading is usually a good option for initial loads or batch synchronizations, but a true integration can only be achieved through API calls.
Such integration requires a mapping between the SaaS data model and the existing system's data model, which in turn requires understanding the entity relationships in both models. In some cases, there will not be a one-to-one mapping between the two. This is where the customization capabilities of the SaaS solution come into play: some SaaS solutions allow their data model to be extended with custom objects and custom relationships.
In addition to mapping the data models, it is also necessary to correlate primary keys between the two systems. There are several options for accomplishing this. One option is to store external system IDs within the SaaS database. Another is to maintain a cross-reference mapping outside the SaaS system.
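As a rough illustration of the second option, the sketch below keeps a cross-reference table outside the SaaS system and uses it while translating an internal row into a SaaS payload. All names here (`FIELD_MAP`, `KeyCrossReference`, `to_saas_record`, the column names and the `Region__c` custom field) are hypothetical and do not come from any particular vendor's API.

```python
from dataclasses import dataclass, field

# Hypothetical field mapping: internal column name -> SaaS field name.
FIELD_MAP = {
    "cust_name": "Name",
    "cust_phone": "Phone",
    "region_code": "Region__c",  # stands in for a custom field added to the SaaS model
}


@dataclass
class KeyCrossReference:
    """Cross-reference of primary keys, maintained outside the SaaS system."""
    internal_to_saas: dict = field(default_factory=dict)
    saas_to_internal: dict = field(default_factory=dict)

    def link(self, internal_id, saas_id):
        # Record the correlation in both directions.
        self.internal_to_saas[internal_id] = saas_id
        self.saas_to_internal[saas_id] = internal_id

    def saas_id_for(self, internal_id):
        return self.internal_to_saas.get(internal_id)


def to_saas_record(internal_row, xref):
    """Translate an internal row into a SaaS payload using the field mapping."""
    payload = {
        saas_name: internal_row[src_name]
        for src_name, saas_name in FIELD_MAP.items()
        if src_name in internal_row
    }
    saas_id = xref.saas_id_for(internal_row["id"])
    if saas_id is not None:
        # A known key means the record already exists on the SaaS side.
        payload["Id"] = saas_id
    return payload
```

A row whose internal ID has no entry in the cross-reference produces a payload without an `Id`, which the calling code could treat as a create rather than an update.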
Even if the SaaS solution becomes the new system-of-record, it may not be the owner of all the entities. Depending on the business requirements, a subset of the data may need to be owned or maintained by another system. The simplest case is when there is only one system-of-record for each data set. However, data updates often originate from multiple sources. It is critical to identify such requirements early on, as they increase the scope and complexity of the integration.
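One way to make ownership explicit is a table that names the system-of-record for each field, consulted whenever an update arrives. The following is only a sketch under that assumption; the field names, the system labels `"saas"` and `"erp"`, and the `accept_update` helper are invented for illustration.

```python
# Hypothetical ownership table: the system-of-record for each field.
FIELD_OWNER = {
    "Name": "saas",
    "Phone": "saas",
    "CreditLimit": "erp",
}


def accept_update(source_system, changes):
    """Split incoming changes into (accepted, rejected) by field ownership."""
    accepted, rejected = {}, {}
    for field_name, value in changes.items():
        if FIELD_OWNER.get(field_name) == source_system:
            accepted[field_name] = value
        else:
            # The sender does not own this field (or the field is unknown).
            rejected[field_name] = value
    return accepted, rejected
```

Rejected changes could be logged or routed for manual review; which of these is appropriate depends on the business requirements.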
Many factors can impact the integration strategy, enough to merit a separate blog post. The integration approach will depend on business requirements and may also be influenced by constraints of the SaaS product and legacy systems. Here is a summary of the decisions that need to be made to determine the integration approach:
- Real-time vs. batch integration. Integration can be done in real time through small transactions or through batch synchronization. Batch synchronization is a feasible option only when a delay is acceptable and no concurrent updates are expected (i.e. the integration is one-way). A direct, real-time integration will in most cases be the ideal solution, as it provides the most up-to-date information to the user while minimizing the potential for duplicate or inconsistent data across systems.
- Integration layer. An integration layer is often required to provide a real-time direct integration with SaaS. This integration layer is typically implemented through the use of integration products, such as an ESB (Enterprise Service Bus). The ESB will provide capabilities to integrate real time, most likely through a Web Service API, and communicate with internal systems using different transports (e.g. message queues) as well as transforming between the different applications' message formats. The ESB can also hide or abstract details about the SaaS API from the rest of the integrated systems.
- Push vs. pull. Another factor to account for is which system initiates the calls. A live integration will require the SaaS product to initiate calls to internal systems through the firewall. The SaaS application may not provide this capability, and even if it does, it is often desirable to keep this logic in house, as it may involve complex transformation and workflow logic. Additionally, there may be security implications in allowing an external system to initiate API calls through the firewall.
- Direction (one way vs. bi-directional). As mentioned above, depending on data ownership it may be necessary to support a bi-directional integration.
- Frequency. Depending on how often data must be synchronized, a full integration layer may not be required. This may be the case if the synchronization only needs to occur once, or through infrequent bulk loads. In such scenarios, the bulk load capabilities built into the SaaS product or a simple ETL (Extract, Transform, Load) tool may be sufficient to satisfy the integration needs.
- Partial vs. full entity synchronization. In some cases, only certain fields or a subset of entities will need to be synchronized between SaaS and existing systems. This may significantly impact the scope and complexity of the integration solution. The fields and entities that need to be synchronized should be identified early on, along with the data mapping.
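To make a few of these decisions concrete, here is a minimal sketch of a one-way, pull-based batch synchronization that transfers only a subset of fields. It assumes each record carries a `modified` timestamp and that the integration remembers the time of the last successful run; `SYNC_FIELDS`, `select_changes` and the record layout are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical subset of fields that must be synchronized.
SYNC_FIELDS = ("Name", "Phone")


def select_changes(records, last_sync):
    """Return (partial_records, new_watermark) for records changed since last_sync."""
    changed = [r for r in records if r["modified"] > last_sync]
    # Keep the key plus only the fields that are in scope for synchronization.
    partial = [
        {"id": r["id"], **{f: r[f] for f in SYNC_FIELDS if f in r}}
        for r in changed
    ]
    # Advance the watermark to the newest change seen, or keep it if nothing changed.
    new_watermark = max((r["modified"] for r in changed), default=last_sync)
    return partial, new_watermark
```

For example, calling `select_changes(rows, datetime(2010, 1, 1))` returns only the rows modified after that date, stripped down to `id`, `Name` and `Phone`. This approach only fits the batch, one-way case described above; concurrent updates on both sides would need conflict handling that this sketch does not attempt.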
In terms of security, as in any application there are two main areas that must be accounted for:
- Authentication. The SaaS provider will offer its own built-in authentication capabilities. However, in many cases single sign-on will be required. Because of this, it is critical that the SaaS vendor supports single sign-on or provides an API for delegated authentication. Single sign-on is often implemented using SAML (Security Assertion Markup Language).
- Authorization. Authorization must be provided at a role-based level. In some cases, record-level authorization may also be provided by the SaaS vendor. It is also necessary to examine and understand access policies that span systems. For instance, data that is integrated from an internal system into SaaS will be secured by the SaaS authorization rules rather than the internal ones, which may unintentionally expose internal data to users of the SaaS system.
Two authorization protocols that are emerging as standards are OAuth and AuthSub. The two are similar in that they allow users to grant a third-party application restricted access to part of their data or resources without revealing their credentials to that application. The user authorizes access from the external application once, and on subsequent requests the external application acts on behalf of the user by submitting a special token. Such protocols are supported by popular websites such as Google's web applications, Netflix and Twitter, among many others.
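The "special token" idea can be sketched as follows. This example assumes the bearer-token style defined for OAuth 2.0 (RFC 6750), in which the token travels in the `Authorization` header; OAuth 1.0 and AuthSub attach and sign tokens differently. The URL and the `build_delegated_request` helper are hypothetical, and the request is only constructed, not sent.

```python
from urllib.request import Request


def build_delegated_request(url, access_token):
    """Build a request that acts on the user's behalf using a previously granted token."""
    # Only the token is sent; the user's credentials never reach this application.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


# Hypothetical endpoint and token, for illustration only.
request = build_delegated_request("https://saas.example.com/api/contacts", "token-123")
```

Passing `request` to `urllib.request.urlopen` would perform the call; how the token is obtained in the first place depends on the protocol and provider.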
Quality of data (data cleansing)
A key area that is often overlooked is the quality of existing data. In many cases, XML will be the common format for exchanging data, and XML data types may be stricter than those in existing data models. For instance, dates in XML Schema require a specific format in which year, month and day are all present, while a legacy system may store only year and month. Additionally, the SaaS vendor may impose constraints, such as required fields, that the existing data does not satisfy. All of these constraints need to be identified early on, so that the data can be cleansed prior to the integration.
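A simple pre-integration check along these lines might look like the sketch below. It flags missing required fields and dates that are not complete `YYYY-MM-DD` values, which is a simplification of the XML Schema date type (time zone suffixes are not handled). The field names and the `REQUIRED_FIELDS`, `is_full_date` and `find_problems` names are assumptions for this example.

```python
import re
from datetime import date

# Hypothetical constraints imposed by the SaaS vendor.
REQUIRED_FIELDS = ("Name", "BirthDate")
DATE_FIELDS = ("BirthDate",)

_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_full_date(text):
    """True if text is a complete, valid calendar date in YYYY-MM-DD form."""
    match = _DATE_RE.fullmatch(text)
    if not match:
        return False
    try:
        date(*map(int, match.groups()))  # rejects values such as 1975-02-30
    except ValueError:
        return False
    return True


def find_problems(record):
    """List the data quality problems that would block loading this record."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing required field: {name}")
    for name in DATE_FIELDS:
        value = record.get(name)
        if value and not is_full_date(value):
            problems.append(f"incomplete or invalid date in {name}: {value!r}")
    return problems
```

Running `find_problems` over an export of the legacy data before the initial load gives a list of records to fix, rather than discovering the failures one API call at a time.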