Interactive querying on Hadoop: an introduction to Impala

Abstract

The Cloudera Impala project makes scalable parallel database technology, the underpinning of Google's Dremel and of commercial analytic DBMSs, available to the Hadoop community for the first time. With Impala, the Hadoop community now has an open-source codebase that lets users issue low-latency queries against data stored in HDFS and Apache HBase using familiar SQL operators.

This talk will start out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation, and will conclude with a comparison of Impala with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.

Speaker's Bio

Marcel Kornacker is a tech lead at Cloudera for new products and the creator of the Cloudera Impala project. He graduated from UC Berkeley in 2000 with a PhD in databases and then held engineering jobs at a few database-related startup companies. Marcel joined Google in 2003, where he worked on several ads-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google's F1 project.

This event is brought to you by BeJUG in collaboration with bigdata.be and NGDATA

Agenda

6:00-6:45 PM welcome reception with food and drinks (offered by NGDATA)
6:45-7:30 PM a brief introduction to the Hadoop ecosystem (bigdata.be)
7:30-7:45 PM break
7:45-9:00 PM Impala introduction with Marcel Kornacker (+ Q&A)
9:00-10:00 PM open bar

Date

  • February 25th from 18h00 till 21h00

Location
