Real-time analytical database slinger Rockset has introduced SQL transforms on streaming data, along with a method for rolling up that data, which it claims will cut users' storage and query costs.
Database experts have said cost reductions would depend entirely on the use case but welcomed the introduction of the omnipresent query language to the streaming database world.
Rockset is a commercial database built on RocksDB. It employs the document model with secondary indexing and calls itself a "real-time indexing database." That means the database indexes all the data coming into the system in real time, with a one-to-two-second lag, after which all of that data is visible to queries, applications, and dashboards.
First on the list of new features for the document-store database is support for SQL transforms on streaming data as it is ingested, which would “eliminate time and effort required to maintain complex real-time data pipelines,” the company said.
“Sometimes the metrics cannot be directly calculated, and you have to transform them – timestamps coming in as strings, for example. Because you’re simply using SQL, you can do that transformation of your data as it comes in, not just before or after aggregation,” Rockset co-founder and CEO Venkat Venkataramani told The Register.
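A transformation of the kind Venkataramani describes might look like the following sketch. The `_input` source name, the `PARSE_TIMESTAMP_ISO8601` function, and the field names are illustrative assumptions, not details taken from the company's statements:

```sql
-- Hypothetical ingest transformation: as events arrive, parse
-- string timestamps into proper TIMESTAMP values so downstream
-- queries can do time arithmetic directly.
SELECT
    e.user_id,
    e.action,
    PARSE_TIMESTAMP_ISO8601(e.event_time) AS event_ts  -- "timestamps coming in as strings"
FROM _input e
```

Because the transform is plain SQL applied at ingestion time, every document that lands in the collection already carries the cleaned-up field; no separate pipeline job runs afterwards.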
Meanwhile, Rockset is offering the ability to use SQL to pre-aggregate streaming data as it is ingested, which it describes as "rollups" and claims will reduce the cost of storing and querying data by 10 to 100 times.
“Instead of storing all the raw data and doing all sorts of very expensive batch analytics, rollups allow you, at data streaming time, to set up all of your dimensions and metrics using a SQL query,” Venkataramani said.
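A rollup along those lines could be sketched as the aggregation below. Again, `_input` and the column names are hypothetical stand-ins; the point is that dimensions and metrics are declared in ordinary SQL, and only the aggregates are stored and indexed, which is where the claimed 10-to-100x saving would come from:

```sql
-- Hypothetical rollup: collapse raw page-view events into
-- per-minute counts per page as the stream is ingested.
SELECT
    page_id,                                        -- dimension
    DATE_TRUNC('MINUTE', event_ts) AS view_minute,  -- dimension
    COUNT(*) AS views                               -- metric
FROM _input
GROUP BY page_id, view_minute
```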
He added that Rockset had built its SQL query engine from scratch in C++, but instead of making it work on tables, “we had to teach the SQL engine to make it work on streams.”
Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, said that transforming data in-stream means users only need to query a subset of the data in the stream: transformations allow them to filter out data before it hits Rockset’s storage and indexing engine.
“This will then improve query performance on this data because the DBMS has to look at less data,” said Pavlo, who is also founder and CEO of OtterTune, a database tuning system spun out of a university project.
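The filtering Pavlo describes is the simplest form of ingest transformation. A minimal sketch, with hypothetical source and field names as before:

```sql
-- Hypothetical filter: drop heartbeat events before they reach
-- the storage and indexing engine, so every later query scans
-- less data.
SELECT *
FROM _input
WHERE event_type != 'heartbeat'
```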
“Allowing their users to define these pipelines via SQL makes sense. It’s 2021 and I don’t think a company should waste people’s time with forcing them to learn another query language to use their product when everyone already knows SQL.”
Cost savings resulting from “roll-ups” were impossible to predict without specific use cases, he said, as it would depend on the workloads and queries. “But there obviously should be a reduction in user costs if they end up storing less unnecessary data in Rockset.”
IDC research manager Amy Machado said transforming data in-stream could make implementations easier for data and software engineers. “We are seeing a shortage of developers coupled with a long list of application development requests. Removing complexity with standard SQL queries off of Kafka — without having to touch Kafka — could help the enterprise boost deployment time on streaming use cases. Someone who knows SQL can now enable continuous queries instead of batch or one-time queries, as they would in a data-at-rest relational database.”
Still, Rockset does not have the market to itself. The product would compete with Confluent’s ksqlDB platform, which began as KSQL back in 2017, Machado said.
“The streaming data market is relatively immature where tech investments have been greatly skewed to batch data processing. As the demand for streaming data use cases grows, so will the numbers of vendors in this space, and Rockset is jumping in by adding the stream processing layer to its cloud-based analytics platform.”
Use cases exist in every vertical, spanning database triggers, change data capture, clickstreams, and IoT sensor data. “Companies that focus on low-code entry points and helping companies avoid the intense labour that comes from writing data pipelines will benefit in this emerging market,” Machado said. ®