Doctoral research project number: 8075


Submission date: 11 March 2021
Titre: Designing Big Data Frameworks for Quality-of-Data Controlling in Large-Scale Knowledge Graphs
Thesis supervisor: Carlos Faouzi BADER (LISITE)
Advisor: Rafael ANGARITA AROCHA (Inria-Paris (ED-130))
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined

Abstract: This project is a collaboration with Dr. Hassan HARB and Dr. Hussein HAZIMEH from the Lebanese University.

CONTEXT
Knowledge Graphs (KGs) such as Wikidata and DBpedia allow users to explore knowledge facts about real-world entities (nodes) and the interrelations between them (edges), stored in the form of RDF triples. They incorporate knowledge from structured repositories such as DBpedia, or extract it from semi-structured web resources such as Wikipedia. The term "knowledge graph" gained prominence in 2012, when Google introduced its so-called knowledge panel in search results, which lets users view consolidated knowledge from heterogeneous data sources, such as personal websites and social media channels, in a unified and categorized panel. Similarly, other knowledge graphs such as Wikidata and YAGO are constructed from external resources such as Wikipedia.

EXPECTED WORK
The first part of the work is a state of the art of data quality in knowledge graphs, in order to become familiar with the field. It has four main purposes: 1) study the issues and challenges related to data quality in knowledge graphs, as well as existing solutions, with a focus on the big data collection challenge; 2) study the available tools for data collection, processing, and matching; 3) compare these tools in terms of efficiency; 4) propose new efficient and robust platforms dedicated to governments, businesses, and industries.

In the second part, we are interested in matching-based and machine learning methods combined with big data to handle data-quality and missing-data problems in knowledge graphs. This consists of studying, using, and adapting machine learning, artificial intelligence, and matching-based methods in order to provide high-quality knowledge graphs for industry and research.
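To make the RDF triple model mentioned in the context concrete, the following is a minimal sketch in plain Python: facts are (subject, predicate, object) tuples linking entities through relations, queried by pattern matching. The entity and property names (dbr:Paris, dbo:country, etc.) are illustrative placeholders in DBpedia-like notation, not guaranteed real identifiers.

```python
# Minimal illustration of the RDF triple model behind knowledge graphs:
# each fact is a (subject, predicate, object) tuple; subjects and objects
# are entities (nodes), predicates are relations (edges).
# All names below are illustrative, not verified DBpedia IRIs.

triples = {
    ("dbr:Paris", "dbo:country", "dbr:France"),
    ("dbr:Paris", "dbo:populationTotal", "2165423"),
    ("dbr:France", "dbo:capital", "dbr:Paris"),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return {
        (ts, tp, to) for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }

# All facts whose subject is dbr:Paris:
print(match(triples, s="dbr:Paris"))
```

A production system would use an RDF library and a SPARQL endpoint rather than in-memory tuples, but the data model is the same.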
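One simple instance of the matching-based methods mentioned for the second part is duplicate-entity detection via string similarity. The sketch below uses Jaccard similarity over label tokens; the 0.5 threshold and the sample labels are illustrative assumptions, not values from the project.

```python
# Hedged sketch: matching-based duplicate detection over entity labels,
# one ingredient of a KG data-quality pipeline. Jaccard similarity on
# token sets; threshold 0.5 is an arbitrary illustrative choice.

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def likely_duplicates(labels, threshold=0.5):
    """Return label pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if jaccard(labels[i], labels[j]) >= threshold:
                pairs.append((labels[i], labels[j]))
    return pairs

labels = ["knowledge graph quality", "quality of knowledge graph", "hadoop cluster"]
print(likely_duplicates(labels))
# → [('knowledge graph quality', 'quality of knowledge graph')]
```

The machine learning methods the project targets (e.g., embedding-based link prediction) would replace this lexical similarity with learned representations, but the overall match-then-decide structure is similar.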
In the third part, we are interested in the big data collected in KGs: we aim to propose new efficient and robust platforms for data storage and processing based on the Hadoop ecosystem. This work also includes a practical part, which consists in applying the proposed platforms to real-world scenarios from government, research, and industry.
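The Hadoop ecosystem mentioned above is built around the MapReduce programming model. As a minimal sketch, the following simulates the map, shuffle, and reduce phases in plain Python on a typical KG profiling job: counting how often each predicate occurs. The input triples are illustrative; a real job would run over HDFS-stored data on a cluster.

```python
# Sketch of the MapReduce model (map -> shuffle -> reduce) simulated
# locally: computing a predicate-frequency profile of a tiny triple set.
from collections import defaultdict

triples = [
    ("Paris", "country", "France"),
    ("Berlin", "country", "Germany"),
    ("Paris", "population", "2165423"),
]

def mapper(triple):
    s, p, o = triple
    yield (p, 1)  # emit one count per predicate occurrence

def reducer(key, values):
    return (key, sum(values))

# Shuffle phase: group mapper output by key, as Hadoop does between stages
groups = defaultdict(list)
for t in triples:
    for k, v in mapper(t):
        groups[k].append(v)

result = dict(reducer(k, vs) for k, vs in groups.items())
print(result)  # → {'country': 2, 'population': 1}
```

Such frequency profiles are a common first step in assessing KG quality, e.g., spotting rarely used or misspelled predicates.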

PhD student: Baalbaki Hussein