PrestoDB: An open source distributed SQL query engine

January 04, 2017

Presto DB: A Powerful Query Engine

Presto DB is an open source distributed query engine for running interactive SQL (analytics queries) on big data, from gigabytes to terabytes or petabytes. Presto was designed for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

Presto lets you query data from Hadoop HDFS, Hive, Cassandra, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources.
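To make the multi-source idea concrete, here is a minimal sketch in Python that builds one SQL statement joining tables from two sources. Presto addresses tables as catalog.schema.table, which is what lets one query span several stores. The catalog, schema, table and column names below (hive.web.page_views, mysql.crm.customers, customer_id, country) are hypothetical and only for illustration; your cluster would use whatever catalogs are configured on it.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical tables: one stored in Hive on HDFS, one in a MySQL database.
page_views = qualified("hive", "web", "page_views")
customers = qualified("mysql", "crm", "customers")

# A single statement that joins data from both sources.
federated_query = f"""
SELECT c.country, count(*) AS views
FROM {page_views} AS v
JOIN {customers} AS c ON v.customer_id = c.id
GROUP BY c.country
ORDER BY views DESC
""".strip()

print(federated_query)
```

The script only prints the statement; you would submit it through any Presto client.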

The main goal of Presto is to deliver analytics query results in sub-seconds to minutes on inexpensive hardware such as a Hadoop cluster. It is completely free.

Facebook uses Presto for interactive queries against several internal data stores, including its 300 PB data warehouse. I personally tried Presto DB on a 3-node cluster with 1 TB to 3 TB of data residing on Hadoop HDFS and got awesome performance: sub-second responses for calculations and microseconds to list sample data. From my hands-on experience, Presto DB is really fast.

I had a challenge to make Apache Hive or Apache Spark comparable with Oracle query execution time. I tried many things but had no success. Finally I came to Presto DB and Apache Kylin, and found that Presto DB installation and management are simple and that the Presto query engine executes queries in seconds or even microseconds.

It's very easy to run thousands of queries on Presto DB over huge data with multiple sessions.

It supports multiple ways to connect applications to Presto DB, such as:

PHP connector : To connect with a PHP UI
JDBC connector : To connect with JDBC applications
REST API : To connect with BI tools and RESTful services
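
As a rough illustration of the REST route, here is a minimal Python sketch of a client. It assumes the usual Presto client protocol: the SQL text is POSTed to /v1/statement on the coordinator with an X-Presto-User header, and the client then follows each nextUri in the JSON responses until none is returned. The function names run_query and http_send and the coordinator address are my own placeholders, not part of Presto, and a real client would also handle retries and timeouts.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def http_send(method: str, url: str, body: Optional[bytes], headers: dict) -> dict:
    """Send one HTTP request and decode the JSON response."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_query(base_url: str, sql: str, user: str,
              send: Callable[..., dict] = http_send) -> Iterator[list]:
    """Submit a query and yield result rows page by page."""
    headers = {"X-Presto-User": user}
    # Step 1: submit the statement to the coordinator.
    page = send("POST", base_url.rstrip("/") + "/v1/statement",
                sql.encode("utf-8"), headers)
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Step 2: a page may or may not carry rows yet.
        for row in page.get("data", []):
            yield row
        # Step 3: keep following nextUri until the server stops sending one.
        next_uri = page.get("nextUri")
        if next_uri is None:
            break
        page = send("GET", next_uri, None, headers)
```

Against a running cluster, usage would look like `for row in run_query("http://localhost:8080", "SELECT 1", "analyst"): print(row)`, where the address and user name are placeholders. The send parameter exists so the paging logic can be exercised without a live server.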
