• Rather than try to predict wide-scale market trends, we share five very specific 2020 “wishes” for enterprise data and analytics practitioners.
• Our list centers on cross-cloud data portability, the benefits of structured query language, AI software development, and the death of business intelligence.
The deepening absence of daylight and the seemingly constant sprinkle of snow now falling gently across the northern hemisphere herald many annual customs: imbibing eggnog and other odd drinks, lighting holiday lights, visiting family, and, of course, forecasting the future.
For us industry analysts, anticipating next year’s trends is an enjoyable but in no way precise endeavor. Unfortunately, the current breakneck pace of innovation we enjoy, as well as the persistent, erroneous bias that AI makes everything predictable, take the shine off this annual venture. In other words, it’s hard to be both interesting and right over any kind of period, let alone 12 months. In response to these existential ramblings, and in place of my traditional year-end predictions, please allow me to share with you my top five 2020 “wishes” for enterprise data and analytics practitioners.
- Cross-Cloud Data Portability: When it comes to storing large volumes of dynamic data at scale, cloud object storage services like Amazon Simple Storage Service (S3) have a lot to offer and have quickly taken up the reins as the data lake of choice for many customers. Such data repositories, however, are anything but open, locking customers into the host platform provider. I would like, therefore, to see more effort from these cloud providers to open up their important data stores. Interoperability efforts like Microsoft’s AzCopy (which copies S3 buckets to Azure Blob Storage) are important, but what the industry needs are more third-party options like MinIO Object Storage to serve as a true layer of compatibility spanning multiple object stores.
- SQL, Here to Stay! When it comes to data science, the Python language reigns supreme, topping pretty much every independent study (e.g., TIOBE Index and PYPL PopularitY Index). This is due in no small part to the extreme likeability of the language as well as its even more popular data-friendly libraries, NumPy and pandas. But we cannot forget the original data-wrangling language for tabular data stores, SQL. Sure, SQL databases have gotten quite a bad rap next to NoSQL databases, but they have their place. While Python does an excellent job working with data that has already been loaded into Python, SQL can work on data where it belongs, in a database. For extremely large-scale transformations, this can mean the difference between waiting a few seconds for the results and having to restart your Python environment. So, let’s keep SQL (as well as R) going strong. The more tools in the toolkit, the better.
- AI and DevOps Convergence: The accelerated use of AI across the technology landscape has reached a point where AI can no longer be considered an isolated enterprise endeavor. AI technologies, be those data pipelines, AI frameworks, development tools, platforms, or even AI-accelerated hardware, are all readily available and largely operationalized for widespread and rapid enterprise adoption. Unfortunately, this embarrassment of riches brings with it a host of AI-specific operational complexities that get in the way of the DevOps ideal of continuous integration, delivery, and deployment. What we need are more tools like Amazon SageMaker, which has been using AWS Lambda, AWS Step Functions, and container management to shuffle machine learning (ML) models between development and production for some time. It should be noted that recently (December 3, 2019), AWS took another important step toward what can be called MLOps with the introduction of a fully integrated development environment, Amazon SageMaker Studio.
- Data Intelligence Over Business Intelligence (BI): BI has served as a core enterprise competency for more than 40 years, but lately it has shown its age in failing to keep up with the speed and variety of data that must be analyzed in order to drive meaningful business decisions. Lightweight data visualization tools have done much to modernize BI, putting analytics into the hands of a wide array of decision makers. But even those have failed to keep pace and have often created more problems than they solve by encouraging the free use of personal, ungoverned data sources. This has led to an intense focus on data management and governance. Vendor communities (BI/data visualization, data integration, and cloud platform) will need to bring to market a better way for enterprise customers to prioritize the front end of the data and analytics pipeline — specifically the ingestion and registration of data. In short, we need a better, more flexible means of ingesting, registering, validating, and distributing data sources. For that we need more tools like Microsoft Azure Data Catalog and Tableau Data Catalog, which can bring the focus back to the front end of the pipeline without enforcing any draconian data warehousing requirements.
- AI-Specific Development Languages: AI has reached a point of ubiquity where developers and end users alike expect all software to employ some facet of AI augmentation or automation. For enterprise software developers this has historically necessitated working with what amounts to an impenetrable black box, wherein AI-driven decisions are made that ultimately influence how their software runs. Thanks in part to regulatory pressures, these black boxes are becoming less opaque to both users and developers. But they remain boxes nonetheless, separate from the programmatic outcomes they drive. What we need, therefore, are technologies that let AI play a much more active and direct role within enterprise software development itself. A good example of this can be found in probabilistic software development, as with Google Edward and Uber Pyro. These new languages incorporate AI inference capabilities into the logic of an application, using advanced capabilities such as deep learning (DL) to predict a probabilistic outcome for a given program state using the current state of the program itself.
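To make the probabilistic programming wish a little more concrete, here is a minimal, library-free sketch of the underlying idea: application logic that branches on an inferred probability rather than on a hard-coded rule. It does not use the Edward or Pyro APIs, and it substitutes exact Bayesian updating over a small grid for the deep learning-based inference those tools can perform. The names `RATES`, `posterior`, and `should_roll_out`, along with the rollout scenario itself, are hypothetical and chosen purely for illustration.

```python
# Hypothetical candidate values for an unknown success rate
# (for example, the conversion rate of a new feature).
RATES = [0.1, 0.3, 0.5, 0.7, 0.9]


def posterior(successes, trials, rates=RATES):
    """Exact Bayesian update over a discrete grid, starting from a uniform prior.

    The binomial coefficient is omitted because it is the same for every
    candidate rate and cancels out when the weights are normalized.
    """
    failures = trials - successes
    weights = [r ** successes * (1 - r) ** failures for r in rates]
    total = sum(weights)
    return {r: w / total for r, w in zip(rates, weights)}


def should_roll_out(successes, trials, threshold=0.8):
    """Branch on the inferred probability that the true rate exceeds 50%."""
    post = posterior(successes, trials)
    p_better_than_half = sum(p for r, p in post.items() if r > 0.5)
    return p_better_than_half >= threshold


post = posterior(successes=8, trials=10)
decision = should_roll_out(successes=8, trials=10)
print(post)
print("roll out:", decision)
```

The point of the sketch is the shape of the program, not the arithmetic: the observed data updates a belief, and the decision is driven by that belief. A real probabilistic programming language would let the model be far richer than a five-point grid.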
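Returning to the SQL wish above, the claim that SQL can work on data "where it belongs" can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module and a hypothetical in-memory `sales` table (the table, its columns, and the function names are assumptions made for illustration) to contrast aggregating inside the database with pulling every row into Python first.

```python
import sqlite3

# Hypothetical sales table, built in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 25.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def totals_in_database(connection):
    """Aggregate inside the database; only one row per region reaches Python."""
    query = (
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return dict(connection.execute(query).fetchall())


def totals_in_python(connection):
    """Pull every row into Python first, then aggregate in memory."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


db_totals = totals_in_database(conn)
py_totals = totals_in_python(conn)
print(db_totals)
print(py_totals)
```

Both functions return the same totals. With four rows the difference is invisible, but with a very large table the first approach transfers one row per region while the second transfers the entire table, which is where the memory pressure on the Python side comes from.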