Doctoral research project number: 8181

Description

Submission date: 7 July 2021
Titre: Dependency Analysis and Evolution in Software Ecosystems
Thesis director: Pascal POIZAT (LIP6)
Co-advisor: Joyce EL HADDAD (LAMSADE)
Scientific field: Information and communication sciences and technologies
CNRS theme: Programming and software architecture

Abstract: In modern software, small changes can have large consequences, as the left-pad incident showed. Removing this 10-line function from the npm package registry broke thousands of projects that depended on it, directly or indirectly, during development or at deployment. The entanglement between software pieces is a well-known source of software complexity. This holds at the lower levels, with dependencies between functions, methods, or classes, but higher levels such as software ecosystems add further complexity. The objective of the thesis is to propose new solutions, possibly based on machine learning, that support software organisations in analysing their software ecosystems and in carrying out evolution at the dependency level. To this end, the thesis will address the following research questions:

RQ1. What quality attributes are relevant at the dependency level for the evolution of software ecosystems?
RQ2. What quality metrics can be associated with dependency graphs, and how do they relate to the quality attributes of RQ1?
RQ3. Is it possible to define smells / anti-patterns based on the metrics of RQ2?
RQ4. Do the solutions to RQ1 to RQ3 support the heterogeneous nature of software ecosystems (e.g., multiple languages, presence of both source and DevOps files), and, if not, can they be extended?
RQ5. Do the solutions to RQ1 to RQ3 support the dynamic nature of new software architectures (e.g., those based on micro-services), and, if not, can they be extended?
RQ6. Is it possible to apply the solutions to industrial-scale ecosystems?
RQ7. How do practitioners perceive the solutions?
RQ8. Is it possible to take human cost (e.g., in relation to developers' expertise) into account in evolution?
RQ9. Can existing machine learning solutions for code smell detection, technical debt analysis, and quality prediction be lifted to the analysis of dependency evolution in software ecosystems?
RQ10. How can training data be retrieved for the machine learning applications?
RQ11. Is it possible to automate dependency evolution?

This thesis will be carried out in the context of a collaboration between LIP6 at Sorbonne Université, LAMSADE at PSL Université Paris Dauphine, and SAP. It is associated with a CIFRE thesis proposal. The co-advisors will be P. Poizat (director, co-advisor) at LIP6 and J. El Haddad (co-advisor) at LAMSADE.

The proposed timeline for the thesis is as follows.

Year 1: state of the art and initial model. The first year corresponds to RQ1, RQ2, RQ3, and a first iteration over RQ9 and RQ10. It will first be devoted to the literature survey and to gathering and analysing data (including data on the SAP ecosystems of interest). A first graph model will then be defined, together with quality functions (basic and aggregating ones) for this model, and the retrieval of such graphs and quality information from the gathered data will be addressed. Finally, experiments will assess how machine learning can be used to detect and predict dependency-related quality and smells over the graphs.

Year 2: complex ecosystems. The second year corresponds to RQ4 and RQ5, with corresponding iterations over RQ9 and RQ10. At this stage, the relation to the enterprise will also be studied more closely, in terms of large-scale ecosystems (RQ6) and practitioners' perspective (RQ7).

Year 3: automated evolution. The third and final year will be devoted to automating evolution (RQ11). This will take the form of refactoring / repair solutions, but also of support in choosing (i.e., prioritising) evolution. As evolution involves humans in the loop, a model of human cost (RQ8) will be developed and integrated into the choice algorithm. Application to industrial case studies (RQ6) and practitioners' perspective (RQ7) will be studied again.

Year 3 (last 6 months): manuscript. The PhD thesis manuscript will be written during the last 6 months of the thesis.
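To make the dependency-graph model and the left-pad scenario more concrete, here is a minimal Python sketch under stated assumptions, not the model the thesis will define. It assumes an ecosystem can be represented as a directed graph in which each package points to the packages it depends on. Every package name except left-pad, and the functions `transitive_dependents`, `fan_in` and `max_impact`, are hypothetical illustrations. Fan-in stands in for a "basic" quality function on one node and the largest removal impact for an "aggregating" one over the whole graph; defining the actual quality functions is precisely what RQ2 sets out to do.

```python
from collections import deque

# Hypothetical toy ecosystem: each package maps to the packages it depends on.
# All names except "left-pad" are invented for illustration.
DEPENDS_ON: dict[str, set[str]] = {
    "app-a": {"build-tool"},
    "app-b": {"build-tool", "logger"},
    "build-tool": {"string-utils"},
    "logger": set(),
    "string-utils": {"left-pad"},
    "left-pad": set(),
}


def reverse_edges(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each package to the packages that depend on it directly."""
    dependents: dict[str, set[str]] = {pkg: set() for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)
    return dependents


def transitive_dependents(graph: dict[str, set[str]], removed: str) -> set[str]:
    """Breadth-first search over reversed edges: every package that depends on
    `removed` directly or indirectly, i.e. that would break if it disappeared."""
    dependents = reverse_edges(graph)
    seen: set[str] = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def fan_in(graph: dict[str, set[str]], pkg: str) -> int:
    """Hypothetical basic quality function: number of direct dependents."""
    return len(reverse_edges(graph).get(pkg, set()))


def max_impact(graph: dict[str, set[str]]) -> tuple[str, int]:
    """Hypothetical aggregating quality function: the package whose removal
    breaks the largest number of other packages, with that number."""
    return max(
        ((pkg, len(transitive_dependents(graph, pkg))) for pkg in graph),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    print(sorted(transitive_dependents(DEPENDS_ON, "left-pad")))
    print(fan_in(DEPENDS_ON, "left-pad"), max_impact(DEPENDS_ON))
```

In this toy graph, left-pad has only one direct dependent, yet removing it breaks four packages, which mirrors how a tiny package can have an outsized transitive impact. Reversing the edges once per query and running a breadth-first search keeps each impact computation linear in the size of the graph.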



PhD candidate: Jaime Damien