[Syscussion] Next syscussion: Titian: Data Provenance Support in Spark
Evie Kassela
evie at cslab.ece.ntua.gr
Thu Feb 15 16:51:46 EET 2018
Good afternoon,
The paper to be presented at tomorrow's syscussion is:
*Title:* Titian: Data Provenance Support in Spark
*Authors:* Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad
Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, and Tyson Condie
*Abstract:* Debugging data processing logic in Data-Intensive Scalable
Computing (DISC) systems is a difficult and time consuming effort.
Today’s DISC systems offer very little tooling for debugging programs,
and as a result programmers spend countless hours collecting evidence
(e.g., from log files) and performing trial and error debugging. To aid
this effort, we built Titian, a library that enables data
provenance—tracking data through transformations—in Apache Spark. Data
scientists using the Titian Spark extension will be able to quickly
identify the input data at the root cause of a potential bug or outlier
result. Titian is built directly into the Spark platform and offers data
provenance support at interactive speeds—orders-of-magnitude faster than
alternative solutions—while minimally impacting Spark job performance;
observed overheads for capturing data lineage rarely exceed 30% above
the baseline job execution time.
It was presented at VLDB 2016.
Evie
--
Evdokia Kassela
PhD Candidate
Computing Systems Laboratory, School of ECE
National Technical University of Athens