2023 Review and Looking Ahead to 2024

Date: February 13, 2024
Author: Ehsan Totoni

2023 was a pivotal year for Bodo’s mission to bring High Performance Computing levels of efficiency, performance, and scalability to data workloads. From building a vectorized SQL engine to a new cloud service architecture, there are several milestones to celebrate. Let’s review some of the highlights of 2023 and look ahead to possibilities in 2024.

Bringing vectorized SQL execution to Bodo’s high performance compute engine

Vectorized execution is a key SQL engine optimization where small batches of rows are read from storage incrementally and streamed through operators. Vectorized execution is necessary for efficient CPU resource use as well as effective memory management to avoid out-of-memory errors. Bodo is the first engine to integrate vectorized execution and high performance computing (MPI parallelism and advanced compiler optimizations), bringing exceptional efficiency to large-scale data workloads.

This new SQL engine architecture has three key components:

  1. Streaming execution: SQL operators, including join and group-by, now take batches of rows as input instead of whole tables. In addition to lowering memory consumption and improving performance, streaming execution is critical to effective memory management and spilling to disk.
  2. Spill to disk: The Bodo engine can now avoid out-of-memory errors by spilling necessary state to disk without prohibitive performance penalties. We developed an advanced memory manager to enable high reliability and performance.
  3. SQL plan optimizer: High-level SQL execution plan optimization is a key component of efficiency, and it is also required for effective vectorized execution. Bodo’s SQL planner now incorporates advanced cost-based optimizations to minimize runtime and memory consumption.
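
To make the streaming idea in the list above concrete, here is a minimal sketch in plain Python. It is not Bodo's implementation: the two-column row schema and the names `read_batches`, `filter_positive`, and `streaming_group_sum` are hypothetical, chosen only to show how operators can pass small batches to each other so that no operator ever holds the whole table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, value); a hypothetical two-column schema


def read_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield the input in fixed-size batches instead of as one whole table."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_positive(batches: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Streaming filter: consumes one batch and emits one (possibly smaller) batch."""
    for batch in batches:
        yield [(key, value) for key, value in batch if value > 0]


def streaming_group_sum(batches: Iterable[List[Row]]) -> Dict[str, int]:
    """Streaming group-by: only the running totals per key stay in memory."""
    totals: Dict[str, int] = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            totals[key] += value
    return dict(totals)


rows = [("a", 1), ("b", 2), ("a", -5), ("b", 3), ("c", 4), ("a", 6)]
result = streaming_group_sum(filter_positive(read_batches(rows, batch_size=2)))
print(result)  # {'a': 7, 'b': 5, 'c': 4}
```

Because each stage is a generator, a batch flows through the filter and into the group-by before the next batch is read, so peak memory depends on the batch size and the number of distinct keys rather than on the table size.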

With this new architecture, Bodo’s SQL engine incorporates capabilities of leading data warehouse engines in addition to its unique high performance computing technologies.
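
The spill-to-disk component can be illustrated with a toy partitioned aggregation. This is a sketch under stated assumptions, not Bodo's memory manager: the function name `group_sum_with_spill`, the key-count budget `max_keys_in_memory`, and the "spill the largest partition" policy are all hypothetical simplifications. Real engines track bytes rather than key counts and use far more careful policies.

```python
import pickle
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, value); a hypothetical schema


def group_sum_with_spill(
    batches: Iterable[List[Row]],
    num_partitions: int = 4,
    max_keys_in_memory: int = 3,
) -> Iterator[Tuple[str, int]]:
    """Sum values per key, spilling partial state to temp files when over budget."""
    # One in-memory partial-aggregate table and one spill file per hash partition.
    in_memory: List[Dict[str, int]] = [{} for _ in range(num_partitions)]
    spill_files = [tempfile.TemporaryFile() for _ in range(num_partitions)]

    for batch in batches:
        for key, value in batch:
            part = in_memory[hash(key) % num_partitions]
            part[key] = part.get(key, 0) + value
        # Over budget: write the largest partition's state to disk and free it.
        while sum(len(p) for p in in_memory) > max_keys_in_memory:
            victim = max(range(num_partitions), key=lambda i: len(in_memory[i]))
            pickle.dump(in_memory[victim], spill_files[victim])
            in_memory[victim] = {}

    # Finalize one partition at a time, so only that partition's keys are resident.
    for i in range(num_partitions):
        totals = in_memory[i]
        spill_files[i].seek(0)
        while True:
            try:
                spilled = pickle.load(spill_files[i])
            except EOFError:  # no more spilled chunks for this partition
                break
            for key, value in spilled.items():
                totals[key] = totals.get(key, 0) + value
        spill_files[i].close()
        yield from totals.items()


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("a", 10), ("e", 50)]
batches = [rows[i:i + 2] for i in range(0, len(rows), 2)]
spilled_result = dict(
    group_sum_with_spill(batches, num_partitions=2, max_keys_in_memory=2)
)
print(sorted(spilled_result.items()))
```

Partitioning by key hash is what keeps the final merge bounded: every occurrence of a key lands in the same partition, so each partition can be finalized independently of the others.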

Snowflake SQL compatibility

Compatibility with Snowflake’s SQL dialect is critical for customers who want to plug the Bodo engine into their Snowflake-based stacks and create a multi-engine environment without lengthy and expensive code migration. We made massive strides towards Snowflake SQL compatibility in 2023, and a large portion of customer queries now work on both Snowflake and Bodo without any code changes! This includes compute functions, date and time operations, cast syntax and semantics, semi-structured operations, and more.

Bodo’s integration with Snowflake was also significantly enhanced. For example, Bodo can now automatically inline Snowflake views in the query, leading to easier migration and much better performance. In addition, Bodo’s Snowflake connector is much more robust in performing read and write operations on Snowflake native tables.
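
As a rough illustration of what view inlining means, the sketch below substitutes a view's definition for its name in a query string. It is a deliberately naive, text-based toy: the function `inline_views`, the `views` dictionary, and the example view and table names are hypothetical, and a real engine would do this on the parsed query plan rather than with a regular expression.

```python
import re
from typing import Dict


def inline_views(query: str, views: Dict[str, str]) -> str:
    """Replace `FROM view` / `JOIN view` with the view's definition as a subquery."""

    def replace(match: "re.Match[str]") -> str:
        keyword, name = match.group(1), match.group(2)
        definition = views.get(name.lower())
        if definition is None:
            return match.group(0)  # a base table: leave it untouched
        # Views may reference other views, so inline the definition recursively.
        return f"{keyword} ({inline_views(definition, views)}) AS {name}"

    pattern = r"\b(FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)"
    return re.sub(pattern, replace, query, flags=re.IGNORECASE)


views = {
    "recent_orders": "SELECT id, amount FROM orders WHERE order_date >= '2023-01-01'",
}
inlined = inline_views("SELECT SUM(amount) FROM recent_orders", views)
print(inlined)
```

Once the definition is part of the query, an optimizer sees the underlying table and filter directly and can plan the whole statement as one unit, which is one plausible reason inlining helps performance.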

Semi-structured data support

A popular feature of Snowflake is support for semi-structured data, where each column element can be an array or a multi-attribute object instead of a scalar value. These features are widely used, but they make Snowflake queries slow and expensive. Bodo now supports many semi-structured operations without requiring any changes to existing code, while delivering better performance and efficiency.

A fundamental challenge is Snowflake’s VARIANT data type, where the type of the stored values is not recorded in the schema. This makes it difficult for engines outside Snowflake to handle this type. In many cases, Bodo can now infer the data types automatically and run expensive semi-structured queries efficiently. In addition, Snowflake has recently added strong typing support for semi-structured data, which is necessary for Iceberg and will avoid the inefficiencies of VARIANT columns.
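
To show what inferring a type for an untyped column can look like, here is a small sketch that scans JSON-encoded values and derives one common type. It is not Bodo's inference algorithm: the functions `infer_type`, `unify`, and `infer_column_type`, and the type labels they return, are hypothetical names used only for illustration.

```python
import json
from typing import Any, Iterable


def unify(types: Iterable[str]) -> str:
    """Combine the types seen so far into a single column type."""
    distinct = set(types) - {"null"}  # nulls are compatible with any type
    if not distinct:
        return "null"
    if len(distinct) == 1:
        return distinct.pop()
    if distinct == {"integer", "double"}:
        return "double"  # widen integers to doubles
    return "variant"  # no common type: fall back to dynamic typing


def infer_type(value: Any) -> str:
    """Infer a type label for one decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return f"array<{unify(infer_type(item) for item in value)}>"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {infer_type(v)}" for k, v in sorted(value.items()))
        return f"object<{fields}>"
    raise TypeError(f"unsupported value: {value!r}")


def infer_column_type(json_values: Iterable[str]) -> str:
    """Infer one type for a column of JSON-encoded strings."""
    return unify(infer_type(json.loads(text)) for text in json_values)


print(infer_column_type(["1", "2.5", "null"]))
print(infer_column_type(['{"a": 1}', '{"a": 2}']))
print(infer_column_type(["1", '"x"']))
```

When every value agrees on a type, an engine can use a typed, columnar representation; the `variant` fallback marks the cases where it cannot, which is where strong typing in the source data helps.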

Reliable and secure cloud architecture

We also revamped our cloud platform’s architecture to enhance reliability and security. The Bodo platform now uses cluster agents to manage and monitor reliable job execution. In addition, the connections between the Bodo control plane and the data plane in the customer account are now minimal and more secure. We also revamped our Jupyter notebook architecture to be much faster to load and much more reliable.

The Bodo platform also underwent significant stress testing and reliability improvements. It gained many features, such as spot instance support and automatic availability zone selection. Spot instances provide up to 90% cloud cost savings compared to on-demand instances.

Solution engineering and onboarding

We also made large strides in our solution engineering and customer onboarding process. We automated and enhanced many onboarding tasks, such as analyzing customers’ Snowflake query footprint and estimating potential savings. We also streamlined the query deployment process so that Bodo can be plugged into a customer’s environment with little effort.

Don’t forget about Python!

Although much of our effort was focused on SQL, we still believe Python is very important for advanced data applications. We expanded Bodo’s coverage of NumPy operations and also added support for efficient parallel FFTs.

Looking ahead to 2024

2024 is a very exciting year for Bodo where we think our multi-engine vision will finally realize its potential! In particular, customers will be able to use the Bodo and Snowflake engines interchangeably to run the same queries on the same data by just flipping a switch.

Apache Iceberg’s rapid adoption by customers and vendors like Snowflake is critical to enabling multi-engine environments. Bodo already supports Iceberg in Python, and we also have a robust roadmap for Iceberg support in SQL. This includes efficient read, write, update, delete, and merge operations—and features like semi-structured data. We will provide more details as we make more progress in our Iceberg journey.

In addition to Iceberg, we have major plans to advance in other areas such as connectors (e.g., JDBC and dbt), Snowflake SQL support, performance enhancements, reliability, and more.
