Audit9 Blog

Salesforce Data Integration

A key challenge to the successful implementation of enterprise software, such as CRM, is data integration. In the simplest case this may involve periodic loading of lead lists; at the extreme end of the complexity scale, near-real-time transactional data flows are required with multi-lateral orchestration. In all cases it is imperative that the integration solution address the current-state requirements with consideration for future evolution. Integration technology is expensive, not only in product terms but also in respect of implementation costs and ongoing maintenance. A flawed or incomplete integration architecture, inappropriate technology selection, inadequate analysis of the integration requirements or insufficient consideration of service management can all result in a failed integration state. This outcome will typically result in the failure, in perception or otherwise, of the overarching programme of work.

To mitigate this risk, data integration should be subject to a rigorous analysis and design process that fully defines the logical data flows as constituent components of business processes that (almost incidentally) span multiple enterprise systems. A full specification of such coordinated business processes, or integration use cases, must always precede any consideration of the physical form of integration. As a minimum requirement, each use case must define characteristics such as volume, frequency, latency of update and security concerns such as visibility rules. A common mistake is to view data integration as a lift-and-shift exercise analysed exclusively through field-to-field data mapping.
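
As a rough illustration of the minimum use case definition described above, the sketch below records each characteristic as a field and reports those left undefined. The class name, field names and sample values are hypothetical and chosen for illustration only; a real specification would carry considerably more detail.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class IntegrationUseCase:
    """One logical data flow, described before any physical design is chosen."""
    name: str
    source_system: str
    target_system: str
    volume_per_run: Optional[int] = None          # records moved per execution
    frequency: Optional[str] = None               # e.g. "weekly", "on demand"
    max_latency_seconds: Optional[int] = None     # acceptable delay of update
    visibility_rules: Optional[Tuple[str, ...]] = None  # security concerns

    def undefined_characteristics(self):
        """Return the names of characteristics that have not been specified yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Illustrative values only: a use case that is not yet fully specified.
lead_load = IntegrationUseCase(
    name="Periodic lead list load",
    source_system="Marketing database",
    target_system="Salesforce",
    volume_per_run=5000,
    frequency="weekly",
)
print(lead_load.undefined_characteristics())
```

A check of this kind makes it harder for a flow to reach physical design while its latency or visibility requirements are still unknown.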

A related concern in this context is data migration: the establishment of the optimal day-one data state. In many cases data migration and integration are viewed as separate concerns and developed in isolation. This approach ignores their inherent coupling and typically delivers a disjointed state that requires re-factoring to remove mismatches or friction. A more efficient approach is to blend the day-one state and run-state into a combined analysis and design process that produces a complete and cohesive set of data use cases.

Once the logical integration architecture requirements are sufficiently well defined in terms of integration use cases, the next consideration is the optimal style of integration to apply. The term style refers to the physical manifestation of the integration flows and should be defined initially on a per-flow basis. The combined view provides the primary input into the definition of the physical integration architecture and related technology solution options.

The remainder of this post provides a high-level view of the options to consider in respect to integration styles.

  • Virtual Data
    With the virtual data style of integration, data is not transited from source to target system en masse; instead the data remains at source and is queried by the target system on demand using standards such as OData. With this style, a real-time view is provided without the overhead of a complex data synchronisation architecture. A virtual data approach is, however, contingent on a performant connectivity layer and in some cases has limitations in regard to data modifications, volumes, performance and the ability to re-establish data relationships in the target system.

    Salesforce1 Lightning Connect is an emerging solution option in the Salesforce context for the virtual data integration style. Lightning Connect provides support for External Objects that behave in a comparable way to Custom Objects, but proxy queries out to external data sources via the OData protocol, or via custom adapters developed with the Apex Connector Framework. The current release supports read-only data views and the ability to connect directly between Salesforce orgs. This feature set will be complemented by a write capability and the ability to quickly persist external data to a Custom Object in future releases. The main data integration vendors in the Salesforce space (Mulesoft, Informatica and Jitterbit) provide OData capabilities that are fully compatible with Lightning Connect, enabling on-premise or cloud-based enterprise systems to be exposed to the Salesforce platform with minimal effort.

    In the context of Salesforce integration, given the standards-based approach and technology support in place, Lightning Connect provides a viable, secure strategy for many data integration use cases and should be ruled out before consideration is given to the physical transfer of data.
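
To make the on-demand query idea more concrete, the following minimal sketch builds the kind of OData request URL a target system might issue against data that stays at source. The service root and entity set are hypothetical; the `$select`, `$filter` and `$top` options are standard OData system query options. This is not a representation of how Lightning Connect itself constructs requests.

```python
from urllib.parse import quote, urlencode


def build_odata_query(service_root, entity_set, select=None, filter_expr=None, top=None):
    """Compose an OData query URL from optional system query options."""
    options = {}
    if select:
        options["$select"] = ",".join(select)
    if filter_expr:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = str(top)
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Keep "$", "," and "'" readable; spaces are percent-encoded.
        url += "?" + urlencode(options, quote_via=quote, safe="$,'")
    return url


# Hypothetical endpoint exposing order data held in an external system.
orders_url = build_odata_query(
    "https://erp.example.com/odata",
    "Orders",
    select=["OrderID", "Total"],
    filter_expr="Country eq 'UK'",
    top=10,
)
print(orders_url)
```

Only the rows and columns needed for the current view are requested, which is why the style depends so heavily on a performant connectivity layer.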

  • ETL – Middleware
    The extract, transform and load integration style exemplifies the process of data being batched and transited from source to target system. There is a multitude of vendor options for ETL middleware, with variation typically in the transformation capability and job management functionality. ETL middleware tools should be evaluated carefully in two primary areas: firstly the pricing model, and secondly the ability to deliver end-to-end integration solutions without reverting to code or additional tools. In relation to pricing, most ETL middleware solutions are sold on a per-endpoint basis. This makes sense, but it can become a constraint where the incremental addition of a new endpoint is either cost-prohibitive individually or, on a cumulative basis, requires licensing of the next tier of the product. The definition of what precisely constitutes an endpoint can also be ambiguous. In relation to capability fit, certain data transformation requirements can be difficult to achieve with an ETL tool and require a custom adapter or connector to be developed, or a costly additional design tool to be purchased; the vendor can help with both, but at additional cost.

    A key aspect to consider for ETL middleware solutions is their ability to expose enterprise data securely to Salesforce via proprietary agents that are deployed inside the firewall and facilitate bi-directionally invoked data connectivity on an outbound connection basis. This approach avoids the requirement to provide infrastructure to secure inbound connections. An additional advantage of cloud-based middleware solutions is the ongoing maintenance and upgrades applied to the service, which ensure the integration technology remains aligned with the evolution of the Salesforce platform.

    It should also be noted that ETL vendors typically offer a free-tier product which, although constrained as one would expect, can support a surprising number of use cases and should always be evaluated, particularly in the context of one-off data migrations.
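
The batch-oriented nature of the ETL style can be sketched in a few lines. The example below is an in-memory illustration only and does not reflect any vendor product: the source column names (`surname`, `email`) and the batch size are assumptions, and the `load` callable stands in for whatever mechanism writes to the target system.

```python
from itertools import islice


def extract(rows):
    """Yield source rows one at a time (here, from an in-memory list)."""
    yield from rows


def transform(row):
    """Map a hypothetical source row onto target field names, tidying the values."""
    return {
        "LastName": row["surname"].strip().title(),
        "Email": row["email"].strip().lower(),
    }


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_etl(source_rows, load, batch_size=200):
    """Extract, transform and load in batches; return the number of rows loaded."""
    loaded = 0
    for batch in batched((transform(r) for r in extract(source_rows)), batch_size):
        load(batch)
        loaded += len(batch)
    return loaded


# Illustrative data; `target.append` stands in for a real load step.
source = [
    {"surname": " smith ", "email": "A.Smith@Example.COM "},
    {"surname": "JONES", "email": "jones@example.com"},
    {"surname": "patel", "email": " Patel@example.com"},
]
target = []
row_count = run_etl(source, target.append, batch_size=2)
print(row_count, [len(batch) for batch in target])
```

Middleware tools wrap the same three stages in job scheduling, error handling and monitoring, which is where much of the variation between vendors lies.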

  • ETL – Technical
    In addition to ETL middleware solutions, which provide an efficient delivery model and the possibility to maintain and support the integration solution with non-technical resource, there is also the option to implement a technical approach to ETL. In such an architecture, low-level technical components are used to assemble the solution within the context of an integration platform such as SQL Server Integration Services (SSIS). This approach can be a good fit where integration requirements are complex or involve proprietary systems or formats, where the resourcing model has a technical orientation, or where the technology is already in use within the enterprise and a support structure exists. A technical ETL approach can also be significantly less expensive in product (or software) terms. Vendors such as CData/RSSBus, Progress DataDirect and CozyRoc provide standards-compliant drivers (ODBC, OData) or application adapters that can be employed to build out technical integration solutions.

    The number of required endpoints (over the service life) can be a key qualifying factor for a technical ETL solution; the incremental endpoint cost incurred with the middleware approach is avoided, which can remove an otherwise artificial constraint on the future extensibility of the architecture.

  • Message-oriented / SOA
    In cases where integration flows are transactional in type and require orchestration or near-real-time response, a message-oriented integration style is the optimal approach. A common problem area for data integration is where ETL solutions are applied inappropriately to message-oriented integration use cases. ETL tools can be configured to deliver low-latency synchronisation; however, this can be challenging to achieve in a robust and scalable architecture. An integration solution designed specifically to address the nuances of a near-real-time transactional interface with multilateral message orchestration should be preferred. The enterprise service bus (ESB) is the common exemplar of such a solution; in the Salesforce context vendors such as Mulesoft, NeuronESB and Tibco should be considered in addition to open source options such as JBoss Fuse and Apache ServiceMix.

    The implementation of an ESB is a significant investment; the tools can be expensive and require specialist expertise, and the implementation time frame will be protracted. As such the ESB approach should always be 100% substantiated by the integration requirements. An over-specified integration architecture is a common mistake to be avoided.
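
The difference from the batch style can be illustrated with a toy, in-process message bus in which one published message is delivered immediately to every interested system. The class and topic names are hypothetical, and the sketch omits everything that makes a real ESB valuable (durable queues, retries, transformation, monitoring); it only shows the shape of the interaction.

```python
from collections import defaultdict


class MessageBus:
    """Minimal publish/subscribe router keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive every message published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to each subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two hypothetical downstream systems react to the same transactional event.
billing_inbox, fulfilment_inbox = [], []
bus = MessageBus()
bus.subscribe("order.created", billing_inbox.append)
bus.subscribe("order.created", fulfilment_inbox.append)

delivered = bus.publish("order.created", {"order_id": "A-100", "total": 250})
print(delivered, billing_inbox, fulfilment_inbox)
```

Each transaction is propagated as it occurs rather than waiting for the next scheduled batch, which is the property ETL tools struggle to provide robustly.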

  • Salesforce Org-to-Org
    Data integration between Salesforce instances (orgs) is an increasingly common scenario, due in part to a rise in the implementation of multi-org architectures. In this context there are two native solution options to consider. The Salesforce Adapter for Lightning Connect (Summer ’15) enables seamless data sharing across orgs via the Lightning Connect framework. The second option is Salesforce-to-Salesforce, an on-platform, proprietary integration solution that was initially intended to support partner record sharing scenarios such as Lead/Opportunity distribution but is commonly used for general-purpose record synchronisation across orgs.

    The native solution options for cross-org data sharing should always be ruled out before third-party solutions are considered.

  • Point-to-Point
    The final integration style is point-to-point (or just point) solutions, where code is written to consume the Salesforce APIs (SOAP, REST or Bulk), or scripts are developed to automate the Apex Data Loader tool via its native batch-mode command-line interface.

    Programmatic point solutions can be advantageous for transactional use cases where the requirements are limited in terms of flow count and complexity and where the cost of an ESB is unsubstantiated or unrealistic. Scripted solutions offer utility in terms of automating administrative tasks and low-volume non-critical integration tasks. Point solutions should be applied judiciously and avoided wherever possible; the approach isn’t maintainable or scalable.
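
As a small example of a programmatic point solution, the sketch below prepares a SOQL query request against the Salesforce REST API query resource. The instance URL, access token and API version are placeholder assumptions; authentication (obtaining the token) is out of scope, and the request is constructed but deliberately not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_soql_request(instance_url, access_token, soql, api_version="34.0"):
    """Build (but do not send) a GET request for the REST API query resource."""
    url = (
        f"{instance_url.rstrip('/')}/services/data/v{api_version}"
        f"/query?q={quote(soql)}"
    )
    return Request(url, headers={"Authorization": f"Bearer {access_token}"}, method="GET")


# Placeholder values; a real token would come from an OAuth flow.
request = build_soql_request(
    "https://example.my.salesforce.com",
    "PLACEHOLDER_TOKEN",
    "SELECT Id, Name FROM Account LIMIT 5",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query. Even at this scale, concerns such as token refresh, error handling and API limits fall to the custom code, which is part of why point solutions are hard to maintain as flow count grows.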


About the Author

Mark Cane / Salesforce Certified Technical Architect