Simplify and Accelerate Data Science at Scale with Bodo

Zhuchang Zhan

Today, the field of data science has seen tremendous growth and Python has become the de facto language for connecting data science. But due to Python’s limitations, data scientists moving from tech demos and toy examples to production level models will often face a steep learning curve of scaling their code. To be production-ready often requires rewriting a significant portion of the native Python (NumPy, Pandas, Scikit-Learn, etc) source code in another architecture. The consequence of this bottleneck not only hinders the development of the production cycle but also can lead to companies being outcompeted by the rapidly changing data science landscape. But what if you can scale linearly and reliably, in vanilla Python?

Modern data science and the new power of AI are prevalent in the business landscape. However, many firms and institutions are quickly discovering the challenge inherent in scaling the infrastructure to handle an exponentially increasing amount of data. Existing solutions are cumbersome, costly, slow, and error-prone. A fundamental problem is the segregation of environments between development (data science) and production (IT) teams. Data scientists usually write Python code on local workstations or small development clusters for productivity reasons, while IT teams rewrite the code in Java, Scala or C++, using technologies such as Apache Spark, to achieve performance, scalability, and reliability on production clusters.

chart The hurdles of transiting from existing development workflows to production workflows.

The Bodo architecture simplifies the infrastructure by unifying the production and development environments through the use of a single source code, software stack, and dataset. At the core of this solution is the Bodo analytics engine, which optimizes, parallelizes, and scales Python analytics automatically. The engine uses open-source Numba* Just-In-Time (JIT) compiler technology (based on LLVM*), and integrates with high-performance computing (HPC) technologies. Hence, the data scientists can take full advantage of Python flexibility and capabilities, while exploiting advanced high-performance features and the IT teams can avoid code rewrites and redundant setups, and can focus on serving more high-value AI projects seamlessly.

arch The Bodo architecture allows for the deployment of native Python frameworks to scale seamlessly on cloud infrastructures without the need to rewrite development code.

The Bodo architecture solution enables productive data science, more accurate insights, and cost-effective infrastructure through unparalleled efficiency, and is optimized to achieve real-time insights from Cloud to Edge. In addition, the Bodo architecture offers the following three key features: (i) the capacity of running different workloads on different cloud and on-premises environments, (ii) the liquidity of being fully portable and not tied to any particular setting, and (iii) the flexibility of adapting to the emerging complexity of future workloads, and the requirements of the analytics economy.

With Bodo, enterprises IT would be able to simplify their infrastructure by unifying the production and development environments through the use of a single source code and software stack. For more information on the technicality of Bodo, see here.

To see a demo of Bodo in action, see here. You can also reach out to me with any questions at zhuchang@bodo.ai.