Doctoral research project number: 8325


Submission date: 12 April 2022
Title: Temporal Reasoning for Web Data Cleaning in a World Evolving
Thesis director: Salima BENBERNOU (LIPADE)
Advisor: Mourad OUZIRI (LIPADE)
Scientific field: Information and communication sciences and technologies
CNRS theme: Artificial intelligence

Abstract: Problem statement and context. With the increasing availability of web data, we are witnessing a proliferation of businesses engaged in automatic data extraction from thousands of web sources, with the goal of gleaning useful information and intelligence about people, companies, countries, products, and organizations [1]. However, the data cannot be used as-is because it contains errors, inconsistencies, and uncertainty, some present in the sources themselves (knowledge bases, databases, etc.) and some introduced by the automatic extraction. For example, in the DBpedia knowledge graph, which is extracted (manually and automatically) from Wikipedia, [8] reports 1761 disjointness inconsistencies between just the two classes Place and Person; that is, the same resource is typed as both Person and Place at the same time. Another example is represented as a knowledge graph and depicted in the following figure: at time t0 Bob is considered an adult, and at time t0+2 he is considered a minor. Since a person cannot evolve from adult to minor, this is a time-based inconsistency (red edge) of the kind we aim to detect and repair. Other inconsistencies may be hidden and cannot be detected with traditional query languages such as SPARQL; they need to be handled by a more intelligent system that combines symbolic AI (logic) with statistical AI (machine learning).
Time is crucial in information processing because events occur at specific points in time and relationships among objects hold over time. Many data sets contain temporal records that span a long period; each record has semantic attributes associated (explicitly or implicitly) with a timestamp and describes some aspect of a real-world entity at a particular time (e.g., author information in DBLP) [2][3].
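The two inconsistency examples described above (a resource typed as both Person and Place, and Bob going from adult to minor) can be illustrated with a minimal Python sketch. It is only a toy illustration under stated assumptions, not the method the thesis proposes: the facts, the resource "Paris", the "status" predicate, the disjointness axiom and the ordering of status values are all hypothetical and invented for this example.

```python
from collections import defaultdict

# Hypothetical toy facts: (subject, predicate, object, timestamp).
FACTS = [
    ("Bob", "type", "Person", 0),
    ("Bob", "status", "adult", 0),
    ("Bob", "status", "minor", 2),
    ("Paris", "type", "Place", 0),
    ("Paris", "type", "Person", 0),
]

# Assumed domain knowledge (not taken from any real ontology).
DISJOINT = {frozenset({"Person", "Place"})}
# Assumed evolution rule: the rank of a status may never decrease over time.
STATUS_RANK = {"minor": 0, "adult": 1}


def disjointness_violations(facts):
    """Find resources that carry two disjoint types at the same timestamp."""
    types = defaultdict(set)
    for s, p, o, t in facts:
        if p == "type":
            types[(s, t)].add(o)
    found = []
    for (s, _t), classes in types.items():
        for pair in DISJOINT:
            if pair <= classes:  # both disjoint classes are present
                a, b = sorted(pair)
                found.append((s, a, b))
    return found


def temporal_violations(facts):
    """Find status changes that go backwards, e.g. adult followed by minor."""
    history = defaultdict(list)
    for s, p, o, t in facts:
        if p == "status":
            history[s].append((t, o))
    found = []
    for s, events in history.items():
        events.sort()  # chronological order
        for (t1, v1), (t2, v2) in zip(events, events[1:]):
            if STATUS_RANK[v2] < STATUS_RANK[v1]:
                found.append((s, t1, v1, t2, v2))
    return found


print(disjointness_violations(FACTS))
print(temporal_violations(FACTS))
```

Hand-written rules like these only catch inconsistencies that are stated explicitly in the data; the hidden inconsistencies mentioned above are exactly the cases such simple checks would miss.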
The ability to model this temporal dimension is therefore necessary in real-world applications such as banking, medical records, geographical information systems, and sentiment analysis in the political domain. Traditional ontology languages generally capture static information and do not fully support access to temporal data or the associated reasoning tasks, such as satisfiability checking and query answering. A major challenge in dealing with temporal information comes from the combined need to model it and to handle deficiencies in the data, especially when the data is inconsistent, i.e., in contradiction with the domain of interest, which may itself be dynamic [4][5]. The aim of the thesis is to study the automation of temporal reasoning over temporal ontologies and knowledge bases in order to identify errors and inconsistencies and to repair [6] the data, so that predictions used in decision making rest on a clean view of the world.