Invited Lecture - Tutorial: Mark C. Lewis

by admin last modified Jul 04, 2017 11:35 AM


Big Data Analytics with Spark
Mark C. Lewis
Trinity University, San Antonio, TX, USA
Date & Time: July 18 (Tuesday), 2017; 05:20pm - 06:20pm
Location: Ballroom 8


We live in an age of data, which presents us with the challenge of finding meaning in all of that data. Google's MapReduce, as embodied in the Hadoop implementation, ushered in the era of big data analytics by providing a standard system that allowed data to be analyzed across a cluster with good fault tolerance. Hadoop does this by storing results to disk after each reduce step. This provides fault tolerance, but at a high cost to speed. The Spark framework sits in the Hadoop ecosystem as an alternative to straight MapReduce that performs more operations in memory, and thus can run much faster. Standard benchmarks have shown it performing as much as 100x faster than Hadoop. Attendees of this tutorial will be introduced to the Spark framework and its primary abstraction, the Resilient Distributed Dataset (RDD). We will run through a number of example problems, showing how they can be solved using the operations provided by RDDs and the key concerns that need to be kept in mind to maintain performance. These examples will also show the use of the MLlib machine learning library.
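
As a rough illustration of the map and reduce style of computation described above, the following is a minimal sketch in plain Python. It does not use Spark or Hadoop, and it is not taken from the tutorial. The input lines, the two-partition layout, and the helper names `map_phase` and `reduce_by_key` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition of lines."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def reduce_by_key(pairs, combine):
    """Group pairs by key and fold each group's values with `combine`."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}


# Hypothetical input, split into two partitions as a cluster might hold it.
partitions = [
    ["spark keeps data in memory", "hadoop writes data to disk"],
    ["spark and hadoop both process data"],
]

# Each partition is mapped independently; on a real cluster this step
# would run in parallel on different machines.
mapped = [pair for part in partitions for pair in map_phase(part)]

# The per-word counts are then combined across all partitions.
word_counts = reduce_by_key(mapped, lambda a, b: a + b)
```

Here everything stays in local memory and runs in a single process. In Spark, the comparable steps would typically be expressed with RDD operations such as `flatMap`, `map`, and `reduceByKey`, with the framework handling distribution and fault tolerance.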


Mark Lewis has been in the Department of Computer Science at Trinity University since 2001. His courses tend to focus on programming and programming languages, including web development, and on simulation and scientific computing. He has been the lead author on over 30 papers spanning a range of topics, from planetary ring dynamics in the journal Icarus to the SIGCSE annual conference proceedings. He is also the author of several textbooks using Scala published by CRC Press.
