Titian: Data Provenance Support in Spark - VLDB Endowment

5 pages - 2.4 MB
By M. Interlandi · 2015 · Cited by 118: Titian integrates with the Spark programming interface, which is based on a Resilient Distributed Dataset (RDD) abstraction defining a set of ...
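The RDD abstraction mentioned in the snippet can be illustrated with a minimal single-machine sketch. `ToyRDD` and its methods are hypothetical names, not Spark's or Titian's API. The sketch relies only on the well-known RDD property that each dataset remembers the parent and transformation that produced it, so an action such as `collect()` can recompute a result by replaying that chain.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical, single-machine stand-in for an RDD (not Spark's API):
    an immutable dataset that records its parent and the transformation
    that produced it."""

    def __init__(self, source: Optional[Iterable[Any]] = None,
                 parent: Optional["ToyRDD"] = None,
                 op_name: str = "source",
                 fn: Optional[Callable[[List[Any]], List[Any]]] = None):
        self._source = list(source) if source is not None else None
        self._parent = parent
        self._op_name = op_name
        self._fn = fn

    # Transformations are lazy: they only add a node to the lineage chain.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(parent=self, op_name="map",
                      fn=lambda xs: [f(x) for x in xs])

    def filter(self, p: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, op_name="filter",
                      fn=lambda xs: [x for x in xs if p(x)])

    # An action walks the chain back to the source and recomputes the data.
    def collect(self) -> List[Any]:
        if self._parent is None:
            return list(self._source)
        return self._fn(self._parent.collect())

    def lineage(self) -> List[str]:
        """Operation names from the source up to this dataset."""
        chain = [] if self._parent is None else self._parent.lineage()
        return chain + [self._op_name]


if __name__ == "__main__":
    rdd = ToyRDD([1, 2, 3, 4, 5])
    evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(evens_squared.collect())  # [4, 16]
    print(evens_squared.lineage())  # ['source', 'filter', 'map']
```

This chain is coarse-grained. It records which operations produced a dataset, but not which input records contributed to a particular output record. That record-level question is the one data provenance answers.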
Titian: Data Provenance Support in Spark
Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, Tyson Condie
University of California, Los Angeles

ABSTRACT
Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data provenance (tracking data through transformations) in Apache Spark. Data scientists using the Titian Spark extension will be able to quickly identify the input data at the root cause of a potential bug or outlier result. Titian is built directly into the Spark platform and offers data provenance support at interactive speeds, orders of magnitude faster than alternative solutions, while minimally impacting Spark job performance; observed overheads for capturing data lineage rarely exceed 30% above the baseline job execution time.

1. INTRODUCTION
Data-Intensive...
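The abstract's notion of data provenance can be sketched in plain Python: data is tracked through transformations so that an outlier output can be traced back to the input records that produced it. This is an illustrative single-process model under stated assumptions, not Titian's implementation. `LineageTracker`, `source`, `map`, `reduce_by_key` and `trace_back` are hypothetical names. The pipeline is assumed to be one linear chain, and lineage links are kept in memory as dictionaries from output record ids to input record ids.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Set, Tuple

Record = Tuple[int, Any]  # (record id, value)


class LineageTracker:
    """Hypothetical record-level lineage capture (not Titian's API).
    Each transformation stores, for every output record id, the ids of
    the input records it was derived from. Assumes one linear pipeline."""

    def __init__(self) -> None:
        self._next_id = 0
        # One dict per executed stage: output id -> set of input ids.
        self.stages: List[Dict[int, Set[int]]] = []

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id

    def source(self, values: List[Any]) -> List[Record]:
        return [(self._new_id(), v) for v in values]

    def map(self, records: List[Record], f: Callable[[Any], Any]) -> List[Record]:
        out, links = [], {}
        for rid, v in records:
            oid = self._new_id()
            out.append((oid, f(v)))
            links[oid] = {rid}  # one-to-one link
        self.stages.append(links)
        return out

    def reduce_by_key(self, records: List[Record],
                      f: Callable[[Any, Any], Any]) -> List[Record]:
        """Values must be (key, value) pairs; many inputs feed one output."""
        groups: Dict[Any, List[Tuple[int, Any]]] = defaultdict(list)
        for rid, (k, v) in records:
            groups[k].append((rid, v))
        out, links = [], {}
        for k, members in groups.items():
            acc = members[0][1]
            for _, v in members[1:]:
                acc = f(acc, v)
            oid = self._new_id()
            out.append((oid, (k, acc)))
            links[oid] = {rid for rid, _ in members}  # many-to-one links
        self.stages.append(links)
        return out

    def trace_back(self, output_id: int) -> Set[int]:
        """Follow stored links backwards, stage by stage, to source ids.
        Assumes output_id was produced by the last stage."""
        frontier = {output_id}
        for links in reversed(self.stages):
            frontier = set().union(*(links[i] for i in frontier))
        return frontier


if __name__ == "__main__":
    t = LineageTracker()
    logs = t.source(["a 1", "b 2", "a 300", "b 4"])
    pairs = t.map(logs, lambda line: (line.split()[0], int(line.split()[1])))
    totals = t.reduce_by_key(pairs, lambda x, y: x + y)
    # Pick the largest total as the "outlier" and find its input records.
    suspicious_id, (key, total) = max(totals, key=lambda r: r[1][1])
    print(key, total)  # a 301
    culprits = t.trace_back(suspicious_id)
    print([v for rid, v in logs if rid in culprits])  # ['a 1', 'a 300']
```

Recording the links at every stage while the job runs makes the later backward query cheap. The price is extra bookkeeping during execution, which is the kind of capture overhead the abstract quantifies for Titian.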