Analysis Tech stocks sailed through an era-defining moment last week as recently IPO’d cloud data warehouser Snowflake surpassed IBM in market capitalisation.
The comparison, at least in investors’ collective imagination, is decidedly unflattering for Big Blue. Leaving aside its undeniable role in the development of modern computing, IBM has around 350,000 employees (a declining number) to Snowflake’s 2,000.
The staggering $120bn Snowflake valuation gave pause for thought about this once-niche corner of the enterprise software market. The money flooding in has encouraged others to make a grab for it, claiming there is headroom for technical improvement.
Enter Firebolt, an Israeli firm co-founded by Eldad Farkash, the entrepreneur behind BI and analytics software company Sisense, which approached a $1bn valuation in its last funding round.
Firebolt has just secured $37m in funding itself. Farkash, also CTO, told The Register the company has developed its own secret-sauce triple-F files and sees the key to its technical advantage in generating a primary index as it creates tables.
“You do not need to cluster the data, we don’t do post-ingest clustering, we actually order the data while we ingest it,” he said. “Triple-F files are being automatically and transparently merged and compacted. The fact that they’re ordered allows us to apply sparse indexing to get extreme speed and coverage over the data that sits in S3, applying new sparse indexing in RAM and applying data pruning on S3 at a range level.”
The idea is a more granular scan of S3 data allowing more efficient pruning of data for analysis. “Your typical cloud-native data warehouse on S3 will consume terabytes or gigabytes of data. You start using Firebolt you’ll immediately see that the amount of data you download is much smaller,” Farkash said.
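Firebolt has not published how its index works, so the following is only a minimal sketch of the general technique Farkash describes: a sparse index over data that is already sorted, used to decide which ranges of a file need to be fetched at all. The function names and the notion of a fixed-size "granule" are assumptions made for illustration, not Firebolt's design.

```python
from bisect import bisect_left, bisect_right


def build_sparse_index(sorted_keys, granule_size):
    """Keep only the first key of each granule of `granule_size` rows.

    Assumes `sorted_keys` is already in ascending order, as it would be
    if the data were ordered at ingest time.
    """
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_read(sparse_index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi].

    Every other granule can be skipped without being downloaded.
    """
    # Start one granule before the first entry >= lo: that granule may
    # still end with keys equal to or greater than lo.
    first = max(bisect_left(sparse_index, lo) - 1, 0)
    # Any granule whose first key is greater than hi cannot match.
    last = bisect_right(sparse_index, hi)
    return range(first, last)


if __name__ == "__main__":
    keys = list(range(1000))                    # stand-in for a sorted column
    index = build_sparse_index(keys, 100)       # 10 entries instead of 1,000
    print(list(granules_to_read(index, 250, 260)))
```

Because the index holds one entry per granule rather than one per row, it is small enough to keep in RAM, and a range query touches only the granules it returns. That is the sense in which less data would need to be pulled from object storage.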
He claimed performance in use cases was “mind-blowing” and promised TPC-DS and TPC-H benchmarks next year.
But the minds of impartial observers were not blown by the proposition.
Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, said Firebolt’s approach was not entirely novel. It bore similarities to the zone-mapping approach Snowflake uses to track minimum and maximum boundaries, but with a more granular form of indexing akin to the model taken by real-time analytics database Rockset, which indexes everything – overkill in Pavlo’s book.
With the caveat that Firebolt has not released its documentation, Pavlo said Firebolt’s combination of just-in-time compilation and vectorised processing was potentially new from a technical standpoint. Most data warehouses do one or the other.
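For readers unfamiliar with zone maps, a minimal sketch of the general idea follows. It assumes data is stored in blocks and that the minimum and maximum of a column are recorded for each block; the names are hypothetical and this is not a description of Snowflake's actual implementation.

```python
def build_zone_map(blocks):
    """Record the (min, max) of the column values held in each block."""
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, lo, hi):
    """Return indexes of blocks whose [min, max] range overlaps [lo, hi].

    Blocks that do not overlap the query range are pruned unread.
    """
    return [
        i
        for i, (block_min, block_max) in enumerate(zone_map)
        if block_max >= lo and block_min <= hi
    ]


if __name__ == "__main__":
    # Three unsorted blocks of a numeric column (illustrative data only).
    blocks = [[5, 1, 9], [20, 12, 15], [8, 30, 25]]
    zone_map = build_zone_map(blocks)
    print(blocks_to_scan(zone_map, 13, 14))
```

Note that the third block in the example has to be scanned for almost any query because its values span a wide range. Min-max boundaries prune well only when each block covers a narrow range, which is why ordering the data first makes a finer-grained index more effective.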
“We’ve done both in an academic system: it’s hard to do. But it is not clear how much they’re doing there,” he said.
Despite investors’ relatively recent obsession with Snowflake – it floated in September, when it was valued at $33bn – the cloud-native data warehouse market is technically mature, Pavlo said. That gives established players considerable momentum over startups, whatever the newcomers’ apparent technical advantages.
“You’ve got Microsoft, Amazon and Google cloud data warehouse services. They have about a seven years’ head start. Then there is Snowflake. That can make it really hard to make inroads. That said, because analytical queries are not cheap, if price difference is a huge factor for people then I can see Firebolt making some inroads, but it will take time. And a lot of times, it’s not performance or price that people care about. Paying Snowflake $200,000 versus paying Firebolt $100,000, that can be a drop in the bucket.”
Meanwhile, the data warehouses from the cloud platform providers have not been standing still. Google, home of BigQuery, is strongly rumoured to have bought UK startup Dataform to help build tools to manage data flows in enterprise analytics. Google is yet to comment to The Register.
Not to be outdone, Microsoft touted the development of Azure Synapse Link, a “cloud-native implementation of hybrid transactional analytical processing” earlier this year.
But with its massive muscle in the cloud infrastructure market, AWS is the one to watch. Its Redshift data warehouse also had a facelift earlier this month.
Among the highlights is AQUA, which the Seattle cloud giant says offers a new hardware-accelerated cache. Rahul Pathak, AWS veep for analytics, told The Register that the service was transparent to customers and free of charge for those using S3 in Redshift.
The service dedicates processors to compression and encryption, while field-programmable gate arrays, in line with NVMe drives, are “working on operations like filtering and aggregation so they can do those at line rates, which are way faster than what CPUs can handle,” he said. “Essentially we’re pushing the compute, filter and aggregation operations down to the storage layer.”
Pathak claimed this offers a 10x performance increase over other cloud data warehouses on scan-heavy queries.
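The pushdown idea Pathak describes can be illustrated in software, leaving the FPGA hardware aside. In the sketch below, each storage node filters and partially aggregates its own rows and returns only a small summary, so the compute layer never sees the raw data. The function names and row layout are assumptions for illustration and do not reflect AQUA's internals.

```python
def storage_scan(rows, predicate, value_of):
    """Runs beside the data: filter rows and return a partial (sum, count)."""
    total, count = 0, 0
    for row in rows:
        if predicate(row):
            total += value_of(row)
            count += 1
    return total, count


def pushed_down_sum(storage_nodes, predicate, value_of):
    """Runs in the compute layer: combine the small partial results."""
    partials = [storage_scan(rows, predicate, value_of) for rows in storage_nodes]
    return sum(t for t, _ in partials), sum(c for _, c in partials)


if __name__ == "__main__":
    # Two hypothetical storage nodes, each holding a slice of a sales table.
    nodes = [
        [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
        [{"region": "EU", "amount": 7}],
    ]
    print(pushed_down_sum(nodes, lambda r: r["region"] == "EU", lambda r: r["amount"]))
```

Only one pair of numbers per node crosses the network, however many rows each node holds, which is why the benefit would be expected to show up mainly on scan-heavy queries.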
Also new to Redshift is Glue Elastic Views, designed to help developers build applications that use data from multiple data stores, with materialised views that combine and replicate data across storage, data warehouses, and databases. Meanwhile, AWS said users can create queries and dashboards in natural language using its new QuickSight Q feature.
Other data warehouse vendors, such as Yellowbrick, have made a play in hardware optimisation. But while Yellowbrick also optimises performance using NVMe modules that contain flash memory, it does not use FPGAs.
Carl Olofson, IDC research vice president for data management software, said that as AWS is a cloud provider, it can control the hardware. “So, AQUA’s level of hardware assist is a level above [Yellowbrick’s].”
But there could be disadvantages to users tying their data warehouse strategy to their cloud provider, he added.
“If you are committing 100 per cent to one cloud platform, then cloud agnosticism does not matter. If, however, you are managing data, including operational data, on multiple clouds, as many larger enterprises may be doing due to departmental IT decisions, then having a single data warehouse technology for all that data regardless of cloud platform may be seen as advantageous.”
Although Firebolt might have some technical advantages, and plays in the cloud-agnostic market-within-a-market, it may still struggle to gain wider traction, the IDC man said.
“Firebolt is most similar to Snowflake, and though they claim some advantages, may have a hard time establishing a superior marketing position since their architecture is very similar to Snowflake’s in that they both use hybrid columnar compressed data with single instruction, multiple data-powered vector processing,” Olofson said.
Although the money may be flooding into the cloud data warehouse and analytics market, incumbents are going to have a strong advantage at least for the time being. Technical advantages, even if they can be demonstrated, are not likely to open the door to the smaller players. ®