[Syscussion] Next syscussion: Titian: Data Provenance Support in Spark

Evie Kassela evie at cslab.ece.ntua.gr
Thu Feb 15 16:51:46 EET 2018


Good afternoon,

The paper to be presented at tomorrow's syscussion is:

*Title:* Titian: Data Provenance Support in Spark
*Authors:* Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad 
Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, and Tyson Condie
*Abstract:* Debugging data processing logic in Data-Intensive Scalable 
Computing (DISC) systems is a difficult and time consuming effort. 
Today’s DISC systems offer very little tooling for debugging programs, 
and as a result programmers spend countless hours collecting evidence 
(e.g., from log files) and performing trial and error debugging. To aid 
this effort, we built Titian, a library that enables data 
provenance—tracking data through transformations—in Apache Spark. Data 
scientists using the Titian Spark extension will be able to quickly 
identify the input data at the root cause of a potential bug or outlier 
result. Titian is built directly into the Spark platform and offers data 
provenance support at interactive speeds—orders-of-magnitude faster than 
alternative solutions—while minimally impacting Spark job performance; 
observed overheads for capturing data lineage rarely exceed 30% above 
the baseline job execution time.

It was presented at VLDB 2016.
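
For anyone who wants some intuition before the talk, here is a toy sketch of the idea behind data provenance: alongside each output, remember which input records fed it, so a suspicious result can be traced back to its inputs. This is plain Python, not Spark and not Titian's API; the function names (word_count_with_lineage, trace_back) and the sample data are made up for illustration only.

```python
from collections import defaultdict


def word_count_with_lineage(lines):
    """Count words and record which input line indices fed each output key."""
    counts = defaultdict(int)
    lineage = defaultdict(set)  # output word -> indices of contributing input lines
    for idx, line in enumerate(lines):
        for word in line.split():
            counts[word] += 1
            lineage[word].add(idx)
    return dict(counts), dict(lineage)


def trace_back(lineage, lines, key):
    """Return the input records that contributed to the given output key."""
    return [lines[i] for i in sorted(lineage.get(key, ()))]


# Hypothetical input: three log lines.
lines = ["spark job ok", "error in stage", "spark error again"]
counts, lineage = word_count_with_lineage(lines)

# Trace the output key "error" back to the input lines that produced it.
suspects = trace_back(lineage, lines, "error")
```

A real system has to do this across distributed stages and shuffles without storing lineage naively for every record, which is where the overhead figures in the abstract come in.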

Evie

-- 
Evdokia Kassela
PhD Candidate
Computing Systems Laboratory, School of ECE
National Technical University of Athens
