Keynote Presentation - Jimmy Lin
Location: Western Infirmary Lecture Theatre (map: B9)
Time: 10:40-11:40, day 1
Lessons Learned From Building Big Data Systems at Twitter
I spent an extended sabbatical at Twitter from 2010 to 2012, working on analytics infrastructure to support data science and on services designed to surface relevant content to users. During this time, Twitter’s Hadoop data warehouse grew from dozens of nodes to tens of thousands of nodes across multiple datacenters, ingesting tens of terabytes daily from dozens of heterogeneous sources. I also had the opportunity to contribute to a variety of data products, including real-time search and Twitter’s WTF ("Who to Follow") recommendation service.
In this talk, I will attempt to distill my experiences working on “big data” systems into a series of high-level “lessons learned”, and then connect these lessons to academic research. I’ll present a few interesting research directions and discuss my views on the evolving roles of academic and industrial research.
Prof Jimmy Lin
Prof Jimmy Lin is an associate professor in the iSchool at the University of Maryland, with appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science. He joined the faculty in August 2004, shortly after completing a Ph.D. in Electrical Engineering and Computer Science at MIT, and was promoted to associate professor in March 2009.
Prof Lin works on “big data”, with a particular focus on large-scale distributed algorithms for text processing. His research lies at the intersection of natural language processing (NLP) and information retrieval (IR). He is a member of both the Computational Linguistics and Information Processing Lab (CLIP) and the Human-Computer Interaction Lab (HCIL).
From 2010 to 2012, Prof Lin spent an extended sabbatical at Twitter, where he worked on services designed to surface relevant content to users and on analytics infrastructure to support data science. Prof Lin has also consulted for Cloudera, an enterprise Hadoop company, where he was largely responsible for starting Cloudera’s Hadoop training and certification programs.
Prof Lin’s popular textbook “Data-Intensive Text Processing with MapReduce” introduced the notion of “design patterns” for MapReduce algorithms and serves as a starting point for many subsequent innovations in the field.