Invited Talk/Tutorial: Big Data Analytics with Spark

by admin last modified May 05, 2019 07:45 PM

Big Data Analytics with Spark

Prof. Mark C. Lewis

Trinity University, San Antonio, Texas, USA

Date & Time: TBA
Location: TBA
 

Abstract

We live in an age of data, which presents us with the challenge of finding meaning in all of it. Google's MapReduce, as embodied in the Hadoop implementation, ushered in the era of big data analytics by providing a standard system for analyzing data across a cluster with good fault tolerance. Hadoop achieves this by writing results to disk after each reduce step, which provides fault tolerance but at a high cost in speed. The Spark framework sits in the Hadoop ecosystem as an alternative to plain MapReduce that performs more operations in memory and can therefore run much faster. Standard benchmarks have shown it running as much as 100x faster than Hadoop. Attendees of this tutorial will be introduced to the Spark framework and to MLlib, the machine learning library that is part of it. We will work through a number of example problems, showing how they can be solved with Spark and which key concerns must be kept in mind to maintain performance. These examples will also demonstrate the use of MLlib.
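
As a rough illustration of the map and reduce steps mentioned in the abstract, here is a minimal single-machine sketch in plain Python. It uses neither Spark nor Hadoop; the function names (map_phase, shuffle, reduce_phase) and the sample input are hypothetical and chosen only for this example. The point it tries to show is that an intermediate result kept in memory can feed more than one later computation without a round trip to disk.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


# Hypothetical sample input standing in for a distributed data set.
lines = ["spark keeps data in memory", "hadoop writes data to disk"]

# The mapped pairs stay in memory and are reused by two computations below.
pairs = map_phase(lines)
counts = reduce_phase(shuffle(pairs))  # per-word counts
total_words = len(pairs)               # second use of the same in-memory data
```

In a real cluster framework these steps run in parallel across many machines; this sketch only mirrors the shape of the computation.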

 

Biography

Mark Lewis has been in the Department of Computer Science at Trinity University since 2001. His courses tend to focus on programming and programming languages, including web development, and on simulation and scientific computing. He has been the lead author on more than 30 papers, ranging from work on planetary ring dynamics in the journal Icarus to papers in the SIGCSE annual conference proceedings. He is also the author of several textbooks using Scala published by CRC Press.
