Interpretation, Verification and Privacy Techniques for Improving the Trustworthiness of Neural Networks

  • Arnaud Dethise

Student thesis: Doctoral Thesis


Neural Networks are powerful tools used in Machine Learning to solve complex problems across many domains, including biological classification, self-driving cars, and automated management of distributed systems. However, practitioners' trust in Neural Network models is limited by their inability to answer important questions about their behavior, such as whether they will perform correctly or if they can be entrusted with private data. One major issue with Neural Networks is their "black-box" nature, which makes it challenging to inspect the trained parameters or to understand the learned function. To address this issue, this thesis proposes several new ways to increase the trustworthiness of Neural Network models. The first approach focuses specifically on Piecewise Linear Neural Networks, a popular flavor of Neural Networks used to tackle many practical problems. The thesis explores several different techniques to extract the weights of trained networks efficiently and use them to verify and understand the behavior of the models. The second approach shows how strengthening the training algorithms can provide guarantees that are theoretically proven to hold even for the black-box model. The first part of the thesis identifies errors that can exist in trained Neural Networks, highlighting the importance of domain knowledge and the pitfalls to avoid with trained models. The second part aims to verify the outputs and decisions of the model by adapting the technique of Mixed Integer Linear Programming to efficiently explore the possible states of the Neural Network and verify properties of its outputs. The third part extends the Linear Programming technique to explain the behavior of a Piecewise Linear Neural Network by breaking it down into its linear components, generating model explanations that are both continuous on the input features and without approximations. Finally, the thesis addresses privacy concerns by using Trusted Execution and Differential Privacy during the training process. The techniques proposed in this thesis provide strong, theoretically provable guarantees about Neural Networks, despite their black-box nature, and enable practitioners to verify, extend, and protect the privacy of expert domain knowledge. By improving the trustworthiness of models, these techniques make Neural Networks more likely to be deployed in real-world applications.
Date of AwardMar 22 2023
Original languageEnglish (US)
Awarding Institution
  • Computer, Electrical and Mathematical Sciences and Engineering
SupervisorMarco Canini (Supervisor)


  • machine learning
  • neural network
  • verification
  • explainable ai
  • machine learning privacy
  • trustworthy machine learning
  • formal methods

Cite this