The ability to extract value from data is more urgent than ever for all major businesses. However, according to Gartner, over 85% of data science projects fail. Bodo aims to solve this problem by eliminating key hurdles in the application development process.
Billions of dollars of capital investments in AI/ML platforms aim to simplify analytics for data teams, yet enterprises struggle to gain the insights they need from their data assets. The fundamental problem? Existing technologies do not empower most data scientists to develop applications at scale. Keeping pace with rapidly changing business environments requires one to quickly:
Analytics applications are data-hungry and need to scale to large datasets. This requires learning and applying some form of parallel programming for compute clusters. Usually, data scientists are experts in their business domains but not in high-performance computing, which results in a significant skill mismatch. It is simply too much to ask of a data scientist.
The ability to parallelize programs automatically is a key component and driver of data science productivity at scale. Bodo is the first platform that provides automatic parallelization and High-Performance Computing (HPC) capabilities for analytics applications. This democratization of HPC allows data scientists to focus on solving the problem instead of rewriting their Python code in other languages and parallel libraries such as Apache Spark and SQL. Previous attempts have focused on building new parallel libraries with APIs similar to existing data science ones. However, they still require managing parallelism at the application level, even when they claim to be "drop-in replacements". In contrast, Bodo provides the first practical compiler auto-parallelization algorithm, which had been an elusive holy grail in computer science for decades.
Auto-parallelization means that data scientists can design, develop, and test their code as though it were serial, and Bodo incorporates parallelization transparently at runtime. We have achieved this by 1) focusing on standard data science APIs in Python and treating them as first-class programming languages, and 2) building on available LLVM, Python, and HPC technologies.
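To make the contrast concrete, here is a minimal sketch of the same aggregation written twice: once as plain serial code, and once parallelized by hand. It does not use Bodo or any Bodo API. The function names, the sample data, and the use of a thread pool as a stand-in for cluster workers are all assumptions made for illustration. The point is only to show the chunk, map, and merge bookkeeping that a data scientist would otherwise have to write and maintain, and that auto-parallelization is meant to take off their hands.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def total_by_key_serial(rows):
    """Serial version: the code an analyst would naturally write."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def total_by_key_manual(rows, n_workers=4):
    """Hand-parallelized version (hypothetical): chunk, map, then merge.

    A thread pool stands in for cluster workers here; a real deployment
    would distribute chunks across processes or machines instead.
    """
    # Split the input into roughly equal chunks, one per worker.
    chunk_size = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

    # Map: compute a partial result on each chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(total_by_key_serial, chunks))

    # Merge: combine the partial results into one answer.
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


# Illustrative data only.
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
serial_result = total_by_key_serial(sales)
manual_result = total_by_key_manual(sales, n_workers=2)
```

Both functions return the same totals, but only the first reads like the business logic it implements. The second adds partitioning and merging concerns that grow quickly once joins, sorts, or multi-stage pipelines are involved.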
In addition to the slowdown of Moore's law [1], this parallelism technology gap has led to complex solutions and processes in the enterprise. Most such solutions and processes tend to "glue" together packages to create applications. An unintended consequence of this is described in a paper by Google [2], which argues that even mature analytics systems might end up being (at most) 5% machine learning code and (at least) 95% glue code. This glue code, these pipeline jungles, and the rewriting of native Python code are at the root of the increasing complexity and cost for businesses. Enterprises need solutions that deliver simplicity, agility, performance, efficiency, and lower aggregate cost at the same time. We believe that analytics will become pervasive in all enterprises, and that even those with modest programming backgrounds should be able to mine and extract meaning from data, as they would from any natural resource.
Bodo offers a universal analytics data optimization engine, which is different from a library. Bodo combines the simplicity of the lingua franca of data science (Python) with the scalability and efficiency of HPC architecture (with MPI). This closes the "Productivity-Performance" gap by giving data science applications the same architecture that is used in the most powerful supercomputers today. The results are unprecedented productivity, performance, and scalability with lower infrastructure costs at every level of the enterprise stack.
While our vision is to be the Data Optimization Engine for all workloads and personas, our singular focus today is enabling a performant, efficient, and productive engine for Pythonic workloads. We have developed our initial solution on CPUs, and we plan to port it to and optimize it for GPUs and FPGAs soon. Technology has always evolved along the path of least resistance: simplicity, performance, and cost-effectiveness. Bodo intends to be a force in that evolution, but we cannot do it alone. We are continuously engaging with and learning from customers and partners. We will actively participate in, contribute to, and listen to the data science and open source communities to guide our direction in this evolution. Follow us at bodo.ai for future plans and rollout.
[1] Moore, Gordon. "The Future of Integrated Electronics." Fairchild Semiconductor internal publication (1964).
[2] D. Sculley et al. "Hidden Technical Debt in Machine Learning Systems." In Proceedings of NIPS'15.