Metadata Management on Data Processing in Data Lakes
Abstract
A data lake (DL) is known as a Big Data analysis solution. It stores not only data but also the processes that were carried out on these data. It is commonly agreed that data preparation and transformation take most of a data analyst's time. To improve the efficiency of data processing in a DL, we propose a framework that includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability and reusability of data processes, as well as the data lineage of processes. Moreover, each process is described through a set of coarse-grained data transformation operations that can be applied to different types of datasets. We illustrate and validate our proposal with an implementation of a real medical use case.
Domains
Databases [cs.DB]
Main file
Data_Lake__Metadata_Management_on_Data_Processing___short_paper_SOFSEM.pdf (989.64 KB)
Origin: Files produced by the author(s)