Amalur
The Convergence of Data Integration and Machine Learning
About me
I'm currently a postdoctoral researcher at the Web Information Systems group at Delft University of Technology, actively seeking employment opportunities.
I am a PhD student at the Web Information Systems group, Department of Software Technology, Faculty of EEMCS, Delft University of Technology. The PhD project belongs to the HyperEdge project with Cognizant. I’m supervised by Alessandro Bozzon and Asterios Katsifodimos.
My research lies at the intersection of machine learning and data analysis. In particular, my research investigates how to apply metadata of different artifacts (e.g., models, data, hardware settings) to improve the effectiveness and efficiency of machine learning workflows.
Ziyu Li, Hilco van der Wilk, Danning Zhan, Megha Khosla, Alessandro Bozzon, Rihan Hai
In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Ziyu Li, Wenbo Sun, Danning Zhan, Yan Kang, Lydia Chen, Alessandro Bozzon, Rihan Hai
IEEE Transactions on Knowledge and Data Engineering (2024).
Rihan Hai, Christos Koutras, Andra Ionescu, Ziyu Li, Wenbo Sun, Jessie van Schijndel, Yan Kang, Asterios Katsifodimos
In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023 p.3729-3739, IEEE.
Ziyu Li, Henk Kant, Rihan Hai, Asterios Katsifodimos, Marco Brambilla, Alessandro Bozzon
In IEEE Access Volume 11 p.125616-125630 (2023).
Ziyu Li, Rihan Hai, Asterios Katsifodimos, Alessandro Bozzon
In International Conference on Web Engineering p.376-380. (2023)
Ziyu Li, Wenbo Sun, Rihan Hai, Alessandro Bozzon, Asterios Katsifodimos
In International Conference on Web Engineering p.51-66 (2023).
Ziyu Li, Mariette Schonfeld, Rihan Hai, Alessandro Bozzon, Asterios Katsifodimos
In Proceedings - 2023 IEEE 39th International Conference on Data Engineering Workshops, ICDEW 2023 p.74-78, Institute of Electrical and Electronics Engineers (IEEE) (2023)
Xiu Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang
In EPJ Data Science Volume 9 (2020).
Through comprehensive experiments across 16 real datasets, covering both images and text, we demonstrate TransferGraph’s effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and actual fine-tuning results compared to state-of-the-art methods.
We propose a method for optimizing ML inference queries that selects the most suitable ML models to use, as well as the order in which those models are executed. We formally define the constraint-based ML inference query optimization problem and formulate it as a Mixed Integer Programming (MIP) problem.
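To make the problem setting concrete, here is a minimal sketch of model selection and ordering for a conjunctive inference query. It is not the MIP formulation from the paper: exhaustive enumeration stands in for a solver, and the cost model (each model only processes items that survived earlier predicates), the accuracy constraint (product of per-model accuracies), and all names such as `Model`, `plan_query`, and the example models are assumptions made for illustration.

```python
from itertools import permutations, product
from typing import NamedTuple


class Model(NamedTuple):
    name: str        # hypothetical model identifier
    task: str        # the query predicate this model can answer
    cost: float      # assumed cost per input item
    accuracy: float  # assumed accuracy in [0, 1]


def plan_query(models, selectivity, min_accuracy):
    """Pick one model per predicate and an execution order.

    Minimizes expected cost subject to a lower bound on the product of
    model accuracies. Returns (expected_cost, ordered_models), or None
    when no assignment satisfies the constraint.
    """
    tasks = sorted(selectivity)
    candidates = [[m for m in models if m.task == t] for t in tasks]
    best = None
    for choice in product(*candidates):
        accuracy = 1.0
        for m in choice:
            accuracy *= m.accuracy
        if accuracy < min_accuracy:
            continue  # violates the accuracy constraint
        for order in permutations(choice):
            cost, surviving = 0.0, 1.0
            for m in order:
                cost += surviving * m.cost           # only surviving items are processed
                surviving *= selectivity[m.task]     # fraction passing this predicate
            if best is None or cost < best[0]:
                best = (cost, order)
    return best


# Hypothetical query: "red cars", with made-up costs and selectivities.
models = [
    Model("det_small", "car", 1.0, 0.80),
    Model("det_large", "car", 4.0, 0.95),
    Model("color_clf", "red", 0.5, 0.90),
]
plan = plan_query(models, {"car": 0.2, "red": 0.5}, min_accuracy=0.85)
```

In this toy instance the cheaper detector is ruled out by the accuracy bound, and running the color classifier first is cheaper because it filters out half of the inputs before the expensive detector runs. Enumeration grows factorially with the number of predicates, which is one reason a solver-based formulation is attractive for realistic queries.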
Metadata plays a crucial role in reporting, auditing, ensuring reproducibility, and enhancing interpretability. Despite the growing adoption of descriptive formats like datasheets and model cards, the metadata available in existing model zoos remains notably limited. Moreover, existing formats have limited expressiveness, which keeps model repositories from serving purposes beyond mere storage of pre-trained models.
This paper proposes a unified metadata representation format for model zoos. We illustrate that comprehensive metadata enables a diverse range of applications, encompassing model search, reuse, comparison, and composition of ML models. We also detail the design and highlight the implementation of an advanced model zoo system built on top of our proposed metadata representation.
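As a rough illustration of how structured metadata can support model search, the sketch below defines a small metadata record and a filter over it. The field names (`task`, `architecture`, `training_dataset`, `metrics`, `hardware`), the `search` helper, and the example entries are hypothetical and do not reproduce the format proposed in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    # All fields are illustrative assumptions, not the paper's schema.
    name: str
    task: str
    architecture: str
    training_dataset: str
    metrics: dict = field(default_factory=dict)  # e.g. {"accuracy": 0.91}
    hardware: str = "unknown"


def search(zoo, task, metric, minimum):
    """Return records for `task` whose `metric` is at least `minimum`, best first."""
    hits = [
        r for r in zoo
        if r.task == task and r.metrics.get(metric, float("-inf")) >= minimum
    ]
    return sorted(hits, key=lambda r: r.metrics[metric], reverse=True)


# A made-up model zoo with three entries.
zoo = [
    ModelRecord("model_a", "image-classification", "resnet", "dataset_x", {"accuracy": 0.91}),
    ModelRecord("model_b", "image-classification", "vit", "dataset_x", {"accuracy": 0.94}),
    ModelRecord("model_c", "text-classification", "bert", "dataset_y", {"accuracy": 0.88}),
]
results = search(zoo, "image-classification", "accuracy", 0.90)
```

Because every record exposes the same fields, comparison and composition can be expressed as ordinary queries over the collection rather than by reading free-text model cards.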