Interactive querying on Hadoop: an introduction to Impala
The Cloudera Impala project brings scalable parallel database technology, the foundation of Google's Dremel and of commercial analytic DBMSs, to the Hadoop community for the first time. With Impala, the Hadoop community now has an open-source codebase that lets users issue low-latency queries against data stored in HDFS and Apache HBase using familiar SQL operators.
This talk will start out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation, and will conclude with a comparison of Impala with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.
Marcel Kornacker is a tech lead at Cloudera for new products and the creator of the Cloudera Impala project. He received a PhD in databases from UC Berkeley in 2000 and then held engineering jobs at a few database-related startup companies. Marcel joined Google in 2003, where he worked on several ad-serving and storage infrastructure projects. His last engagement there was as the tech lead for the distributed query engine component of Google's F1 project.
This event is brought to you by BeJUG in collaboration with bigdata.be and NGDATA.
Welcome reception with food and drinks (offered by NGDATA)
A brief introduction to the Hadoop ecosystem (bigdata.be)
Impala introduction with Marcel Kornacker (+ Q&A)
- February 25th from 18h00 till 21h00