Rockset: Fast SQL Querying on Raw Data

SQL Power without Pain

SQL is the query language of choice for a majority of Big Data applications, but querying unstructured data with SQL remains painful, Peter Bailis, assistant professor of computer science at Stanford University, wrote in a blog post.

“Querying an unstructured data source using SQL for use in analytics, data science, and application development requires a sequence of tedious steps: figure out how the data is currently formatted, determine a desired schema, input this schema into a SQL engine, and finally load the data and issue queries,” he wrote. “This setup is a major overhead, and this isn’t a one-time tax: users must repeat these steps as data sources and formats evolve.”

The Rockset answer to this is to develop its own storage and indexing technology built atop RocksDB. Founded in 2016, the San Mateo, Calif.-based company recently came out of stealth and announced an $18.5 million Series A, led by Sequoia and Greylock, on top of $3 million in seed money raised earlier.

Offered as a SaaS product, Rockset is a serverless search and analytics engine that combines the power of search engines with that of columnar databases, providing fast SQL on diverse data. It relies on strong dynamic typing and indexing to make that happen, and it takes advantage of cloud auto-scaling to provide cost efficiency.

Rockset does not require upfront schema definition or data denormalization. According to a white paper outlining its architecture, it handles semi-structured data formats such as JSON, Parquet, XML, CSV and TSV by indexing and storing them in a way that can support relational queries using SQL.
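To make the idea of strong dynamic typing concrete, here is a minimal Python sketch. It is not Rockset's implementation: the helper names (`type_name`, `observed_types`, `select_where_type`) and the sample documents are hypothetical. It only illustrates the general principle that each value carries its own type, so the same field can hold different types in different documents without a declared schema.

```python
from collections import defaultdict


def type_name(value):
    """Return a simple type tag for a JSON-like value (hypothetical tag names)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def observed_types(documents):
    """Map each top-level field to the set of type tags seen for it."""
    types = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            types[field].add(type_name(value))
    return dict(types)


def select_where_type(documents, field, wanted_type):
    """Return the documents whose `field` exists and has the wanted type tag."""
    return [
        doc for doc in documents
        if field in doc and type_name(doc[field]) == wanted_type
    ]


# Illustrative data: "zip" is an int in one document and a string in another.
docs = [
    {"user": "ana", "zip": 94401},
    {"user": "bo", "zip": "94401-1234"},
    {"user": "cy", "tags": ["iot"]},
]

zip_types = observed_types(docs)["zip"]
int_zip_docs = select_where_type(docs, "zip", "int")
```

Because the type travels with the value rather than with the column, a query can filter or cast per value instead of rejecting the whole data set at load time.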

Data from Anywhere

It can ingest data from real-time streams, data lakes, databases, and data warehouses without building pipelines. Rockset continuously syncs new data as it comes in without the need for a fixed schema.
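As a rough illustration of ingesting records without a fixed schema, the sketch below uses Python's built-in `sqlite3` module as a stand-in SQL engine. It does not use Rockset or any Rockset API. The `ingest` function, the `events` table and the sample messages are assumptions made for the example, and the list of JSON strings merely stands in for a real-time stream.

```python
import json
import sqlite3


def ingest(conn, table, messages):
    """Insert JSON messages, adding a column whenever a new field appears."""
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (_id INTEGER PRIMARY KEY)')
    # SQLite column names are case-insensitive, so track them in lower case.
    known = {row[1].lower() for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for raw in messages:
        record = json.loads(raw)
        if not record:
            continue  # nothing to insert for an empty object
        for field in record:
            if not field.isidentifier():
                raise ValueError(f"unsupported field name: {field!r}")
            if field.lower() not in known:
                # Untyped column: SQLite stores whatever type each value has.
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{field}"')
                known.add(field.lower())
        columns = ", ".join(f'"{field}"' for field in record)
        placeholders = ", ".join("?" for _ in record)
        values = [
            # Nested arrays and objects are stored as JSON text in this sketch.
            v if isinstance(v, (int, float, str, type(None))) else json.dumps(v)
            for v in record.values()
        ]
        conn.execute(
            f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', values
        )
    conn.commit()


# A list of JSON strings stands in for a stream; the second message adds a field.
stream = [
    '{"device": "a1", "temp": 21.5}',
    '{"device": "b2", "temp": 19.0, "humidity": 40}',
]

conn = sqlite3.connect(":memory:")
ingest(conn, "events", stream)
rows = conn.execute(
    "SELECT device, humidity FROM events ORDER BY device"
).fetchall()
```

Records that arrived before a field existed simply return NULL for it, so the table stays queryable with ordinary SQL while the shape of the incoming data changes.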

It is optimized for key-value, time-series, document, search, aggregation and graph-type queries. The Rockset query optimizer uses a hybrid of rule-based and cost-based optimizations, employing machine learning to learn a customer’s query patterns and make them more efficient.

“We store the data in our own proprietary format, our own way of sorting the data,” Rockset co-founder and CEO Venkat Venkataramani said. “We take a complex data set and shred it into a whole bunch of little pieces and organize that in our back end in a way that we can power very fast SQL processing on top of that. … Right now, a lot of the processing is happening at write time … where you need to handle all these edge cases of data preparation before the data is loaded into a database. We move that to the query processing without sacrificing performance or scale.”
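Rockset's storage format is proprietary, so the following Python sketch is only a generic illustration of what shredding a document into small pieces can look like. The `shred` function and `ShreddedIndex` class are hypothetical names. The sketch flattens each nested document into (path, value) pairs, then keeps both a search-style inverted index and a column-style store over those pairs.

```python
from collections import defaultdict


def shred(doc, prefix=""):
    """Yield (path, value) pairs for every scalar leaf of a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from shred(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            yield from shred(item, f"{prefix}[]")
    else:
        yield prefix, doc


class ShreddedIndex:
    """Toy index over shredded documents (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)   # (path, value) -> ids of matching docs
        self.columns = defaultdict(list)   # path -> [(doc id, value), ...]

    def add(self, doc_id, doc):
        for path, value in shred(doc):
            self.postings[(path, value)].add(doc_id)
            self.columns[path].append((doc_id, value))

    def lookup(self, path, value):
        """Search-style query: which documents have this value at this path?"""
        return set(self.postings.get((path, value), ()))

    def column(self, path):
        """Column-style scan: every value stored at this path."""
        return [value for _, value in self.columns.get(path, [])]


index = ShreddedIndex()
index.add(1, {"user": {"name": "ana", "city": "Oslo"}, "tags": ["iot", "edge"], "spend": 10})
index.add(2, {"user": {"name": "bo", "city": "Oslo"}, "tags": ["iot"], "spend": 5})

oslo_docs = index.lookup("user.city", "Oslo")
edge_docs = index.lookup("tags[]", "edge")
total_spend = sum(index.column("spend"))
```

Point lookups go through the postings, while aggregations scan a single path's values, which is one way to read the article's description of pairing search-engine and columnar techniques.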

Elasticsearch and other search-based processing systems use similar approaches, he said.

“You can turn single, semi-structured data streams in Elasticsearch and build applications on top of that. But at Rockset, we take it to a whole ’nother level. We are built for the cloud so there’s a lot of elasticity, not just in indexing. We give you full-feature SQL so you can build complex applications that need joins and aggregations and the much more sophisticated processing that SQL systems can do.”

Rockset uses a microservices architecture using containers and Kubernetes with a cloud-agnostic approach. It employs RocksDB-Cloud as an embedded storage engine, along with a custom resource scheduler and custom C++ query processing engine. Ingestion and querying are auto-scaled separately based on limits set by the user.

Though Rockset is designed to be cloud-agnostic and can run on any cloud, so far all of its services are run and hosted on AWS and follow AWS security practices.

Venkataramani sees uses for Rockset in personalization engines, IoT, security analytics and other real-time applications.

“You could easily point Rockset at a Kafka topic, and you would get a very fast SQL table on the other end to query and build applications on top of,” he said. “Data scientists really like this because they can run a lot of experiments, test a lot of hypotheses and then go into production with it because the SQL processing part of Rockset is at production speed. You don’t need to stand up more downstream serving engines to build your application on top of Rockset.”

Rockset does not support OLTP and the company has no plans to address transaction processing anytime soon, Venkataramani said.

“[We] want to focus on scenarios where the data is being produced in one application but being consumed by somebody else,” he said. “That’s where OLTP applications fall short. They’re very good at serving the data stored in them, but they’re not optimized to serve data generated elsewhere. That’s where we shine. We can build operational applications on any data set, and it does not have to be fully managed.”

Feature Image: “1a_09_2045UrukÉcriturePréCunéiforme” by Claude Valette. Licensed under CC BY-SA 2.0.

Source: InApps.net

Phu Nguyen
