Until APIlation, deploying advanced analytics that scale to large enterprise data loads was a long, complex, and costly undertaking. The problem is that implementing analytics within an enterprise requires many moving parts to be successfully integrated – starting with the data itself.
In large companies, the most critical data is not held in a single place; instead, it is spread across many separate applications – each with its own database structure or schema. Using traditional relational databases, all of that disparate data must be extracted, transformed, and loaded (ETL) into a single location so that analytics applications can reach it.
The problem was always the "Transform" part of ETL. Because the data arrives in different formats, it must be normalized, cleansed, and audited for every source system, then linked by file transfer, EDI, API, or other mechanisms. Establishing Master Data catalogs (Master Data Management) and Master Data repositories (i.e., data lakes) was tried, but it didn't always work: keeping relational databases synchronized became unmanageable once too many systems were involved. And that was before you even tried to do Analytics.
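To make the "Transform" pain concrete, here is a minimal, purely illustrative sketch of the hand-written mappings involved; the source systems, record formats, and field names are all hypothetical:

```python
# Illustrative only: two hypothetical source systems exporting the same
# business fact (an order) in completely different shapes.

from datetime import datetime

def transform_crm_record(rec: dict) -> dict:
    """Normalize a record from a hypothetical CRM export."""
    return {
        "customer_id": rec["CustID"].strip().upper(),
        "name": rec["FullName"].title(),
        "order_date": datetime.strptime(rec["OrdDt"], "%m/%d/%Y").date().isoformat(),
        "amount_usd": float(rec["Amt"]),
    }

def transform_erp_record(rec: dict) -> dict:
    """Normalize a record from a hypothetical ERP export with a different schema."""
    return {
        "customer_id": rec["customer"]["id"].strip().upper(),
        "name": rec["customer"]["display_name"].title(),
        "order_date": rec["posted_at"][:10],   # already ISO-8601
        "amount_usd": rec["total_cents"] / 100.0,
    }

# Every source system needs its own mapping like the two above, and each
# mapping must be audited and maintained as the source schema evolves.
crm_rows = [{"CustID": " ac-101 ", "FullName": "jane doe",
             "OrdDt": "03/15/2019", "Amt": "249.99"}]
erp_rows = [{"customer": {"id": "AC-101", "display_name": "JANE DOE"},
             "posted_at": "2019-03-16T09:30:00Z", "total_cents": 9950}]

unified = [transform_crm_record(r) for r in crm_rows] + \
          [transform_erp_record(r) for r in erp_rows]
print(unified)
```

Multiply this by dozens or hundreds of systems, and the scale of the traditional "Transform" burden becomes clear.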
From there you need an Analytics warehouse and processing engine: typically either an open-source Hadoop-based platform (for example, Cloudera or Hortonworks) or a Massively Parallel Processing (MPP) platform (such as Amazon Redshift). The Hadoop variants are less costly but lower in performance, while MPP-based systems deliver higher performance at much higher cost. Analytics modeling tools such as R, Python, and SAS, and visualization tools such as Cognos, Power BI, and Tableau, need to be brought in next.
Finally, if it all went well, you could start developing the individual use cases that would hopefully justify the effort. All that complexity in the ecosystem added up: typical enterprise Analytics initiatives run into tens of millions of dollars and take several years to complete. Most companies have simply not been up to the task and end up deeply disappointed with the results. The complex Analytics ecosystem is shown below:
While the technical challenges may seem daunting, an arguably bigger reason most companies struggle to implement Analytics is that they simply don't have enough skilled resources. According to Gartner, by 2020 there will be 100,000 unfilled data scientist positions in the US alone. McKinsey pegs its estimate closer to 200,000, adding that demand will outstrip supply by more than 60%.
A few years ago, Accenture estimated that more than 80% of the new data scientist roles would go unfilled. By all accounts, the shortage of data scientists is a long-term problem with no easy answers. Unfortunately, finding enough data scientists is only part of the issue.
With all the complexity involved in the Analytics ecosystem, it also takes a small army of solution integrators, database engineers, data architects, UI developers, consultants, and the like to get the platform fully implemented. Finally, after the technologists have implemented the technology and the data scientists have written the models, business managers and analysts must interpret the insights and translate them into actions that can be carried out on the ground.
According to McKinsey, the shortage of such business resources with the appropriate know-how was estimated at about 1.5 million jobs by 2018. With all these factors conspiring to make Big Data Analytics an extremely difficult problem to solve, is it any surprise that only 4% of companies have managed to get it right?
The APIlation Advantage
To solve the data integration problem, we pioneered a patent-pending, object-oriented NoSQL integration technique that enables any file format or API to be dynamically mapped using advanced machine learning and artificial intelligence (AI) algorithms.
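The details of the technique are proprietary, but the general idea of inferring field mappings automatically can be illustrated with a deliberately simple sketch. Everything here is hypothetical – the canonical field names, the similarity heuristic (plain string matching standing in for actual machine learning), and the threshold:

```python
# Toy illustration of dynamic schema mapping, NOT the patented technique:
# guess which incoming field corresponds to which canonical field by name
# similarity, so no hand-written per-source mapping is needed.

from difflib import SequenceMatcher

CANONICAL_FIELDS = ["customer_id", "order_date", "amount_usd"]

def best_match(field: str) -> tuple[str, float]:
    """Return the canonical field most similar to an incoming field name."""
    scores = [(c, SequenceMatcher(None, field.lower(), c).ratio())
              for c in CANONICAL_FIELDS]
    return max(scores, key=lambda s: s[1])

def auto_map(record: dict, threshold: float = 0.5) -> dict:
    """Map an arbitrarily named record onto the canonical schema."""
    mapped = {}
    for field, value in record.items():
        canonical, score = best_match(field)
        if score >= threshold:
            mapped[canonical] = value
    return mapped

# A record arriving with unfamiliar field names still lands in the right slots.
print(auto_map({"CustomerID": "AC-101", "OrderDate": "2019-03-15",
                "AmountUSD": 249.99}))
# -> {'customer_id': 'AC-101', 'order_date': '2019-03-15', 'amount_usd': 249.99}
```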
NoSQL technologies are uniquely suited to providing the interoperability that applications require because their document-based architectures and flexible data schemas accept both structured and unstructured data without complex normalization/transformation, cleansing, or point-to-point integration. As a result, we don't need to spend huge amounts of money and time cleansing, transforming, and normalizing data into a big "lake" before we can work with all that data from all those places.
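As a minimal sketch of that flexibility – using MongoDB via pymongo purely as a stand-in document store, and assuming a local MongoDB instance is running – records from different systems can be stored side by side, each keeping its own shape:

```python
# Minimal document-store sketch; MongoDB here is only an example of the
# NoSQL pattern, not necessarily the store used by APIlation.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["analytics_demo"]["orders"]

# Two records with completely different schemas coexist in one collection.
orders.insert_many([
    {"source": "crm", "CustID": "AC-101", "Amt": "249.99", "OrdDt": "03/15/2019"},
    {"source": "erp", "customer": {"id": "AC-101"}, "total_cents": 9950,
     "posted_at": "2019-03-16T09:30:00Z"},
])

# Queries can still reach into either shape on demand.
for doc in orders.find({"$or": [{"CustID": "AC-101"}, {"customer.id": "AC-101"}]}):
    print(doc)
```

Note that no schema migration or up-front normalization pass was needed before either record could be stored or queried.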
In addition, our turnkey, on-demand delivery model includes all the technical and data science resources needed to deliver insights end to end. Combined with our machine learning and advanced automation, we simply make the data integration and data science happen – in a small fraction of the time and at a small fraction of the cost of doing it all yourself.