Eugene Wu

Ph.D. Student, MIT

"Closing the Loop on Data Analysis"

Abstract: Although data processing systems now execute queries faster than ever before, they address only the first half of the data analysis cycle. The latter half (presenting and interpreting results in order to clean the data, formulating new queries, generating hypotheses, and summarizing and presenting findings) is currently ill-served by existing systems. In this talk, I will describe two examples of systems that "close the loop" by letting users query the results of their data analysis.

The first, Scorpion, answers the question "why are these results outliers?" for aggregation queries. Aggregation is commonly used to reduce large data sets to a manageable size, but it also obscures which input records are correlated with an outlier result and which are not. Scorpion identifies the input records that contributed most to an outlier value and generates predicates that describe their common properties.
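To make the idea concrete, here is a toy Python sketch. It is not Scorpion itself. It scores a few hand-written candidate predicates by how much deleting the records they match lowers an outlier AVG(temp), divided by the number of records deleted. The sensor data, the candidate predicates, and this scoring rule are all illustrative assumptions. Scorpion generates its predicates automatically and uses its own influence measure and search strategy.

```python
from statistics import mean
from typing import Callable, NamedTuple


class Reading(NamedTuple):
    sensor: int
    voltage: float
    temp: float


# Hypothetical input records feeding one outlier AVG(temp) result.
READINGS = [
    Reading(1, 2.7, 20.0), Reading(1, 2.6, 21.0),
    Reading(2, 2.7, 19.0), Reading(2, 2.6, 20.0),
    Reading(3, 2.1, 95.0), Reading(3, 2.0, 100.0),
]

Predicate = Callable[[Reading], bool]


def influence(rows: list[Reading], pred: Predicate) -> float:
    """Drop in AVG(temp) per removed record when rows matching `pred` are deleted.

    Simplified scoring rule chosen for illustration (an assumption,
    not Scorpion's exact definition).
    """
    kept = [r.temp for r in rows if not pred(r)]
    removed = len(rows) - len(kept)
    if removed == 0 or not kept:
        return 0.0  # matches nothing or everything: not a useful explanation
    return (mean([r.temp for r in rows]) - mean(kept)) / removed


# Hypothetical hand-written candidates; Scorpion searches this space itself.
CANDIDATES: dict[str, Predicate] = {
    "sensor = 1": lambda r: r.sensor == 1,
    "sensor = 2": lambda r: r.sensor == 2,
    "sensor = 3": lambda r: r.sensor == 3,
    "voltage < 2.05": lambda r: r.voltage < 2.05,
}


def best_explanation(
    rows: list[Reading], candidates: dict[str, Predicate]
) -> tuple[str, dict[str, float]]:
    """Return the highest-scoring predicate description and all scores."""
    scores = {desc: influence(rows, p) for desc, p in candidates.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = best_explanation(READINGS, CANDIDATES)
    for desc, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{desc:>15}: {score:+.2f}")
    print("best:", best)
```

Running it ranks `sensor = 3` first (an influence of about +12.92), ahead of `voltage < 2.05` (about +10.83). Dividing by the number of removed records favors predicates that account for the outlier with few records, rather than ones that simply delete most of the input.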

Bio: Eugene Wu is a Ph.D. student in the database group at MIT, advised by Samuel Madden and Michael Stonebraker. He is broadly interested in building systems for data management and has contributed to research in a wide variety of areas, including data cleaning, core database performance, human computation, and complex event processing.


Hosted by: EECS Prof. Fabian Bustamante

Tuesday, March 11, 2014 at 11:00am
