Abstract:
This dissertation comprises a series of studies that collectively elucidate the nature and modeling of halogen-mediated noncovalent interactions through a combination of quantum mechanical (QM) and data-driven approaches. The first part presents a QM investigation of halogen···water interactions, with emphasis on their energetic and structural characteristics in biologically relevant environments such as protein binding sites. The second, more extensive part focuses on halogen···π interactions, integrating high-level QM calculations with machine learning (ML) approaches.
The first study investigated halogen···water interactions. Starting from a distinct iodine···water contact in a solved protein crystal structure, a comprehensive QM and database analysis revealed that these interactions, though moderate in strength, are structurally well-defined and follow systematic trends across the halogen series. Chlorine was found to form flexible, mixed halogen-hydrogen-bonding arrangements, while iodine engaged in highly directional σ-hole interactions with water oxygen lone pairs.
The second study provided a quantitative benchmark of halogen···π interactions and established MP2/TZVPP as the most balanced QM method for their description. This level of theory achieved near-reference accuracy, with a root-mean-square deviation of approximately 1 kJ/mol relative to CCSD(T)/CBS data, ensuring consistency for subsequent modeling studies.
The third study introduced neural network models trained on high-level QM data to predict halogen···π interaction energies. The models reproduced MP2-level energies with excellent agreement (R² ≈ 0.998) and reduced computational cost by roughly eight orders of magnitude (a factor of about 10^8). Validation against both random and protein-derived geometries confirmed robust generalization within the σ-hole interaction domain.
The fourth study extended this QM-AI framework to include halogen···π interactions with phenol, imidazole, and indole systems, representing the aromatic side chains of tyrosine, histidine, and tryptophan. The extended models maintained near-MP2 accuracy across all systems (R² ≈ 0.99) and transferred successfully to protein-derived geometries, confirming their scalability to chemically diverse environments. Furthermore, incorporating additional data and retraining improved performance, demonstrating that the models can be adapted to new chemical environments.
Together, these findings establish a coherent and transferable framework for accurate and efficient modeling of halogen-mediated noncovalent interactions across varied molecular contexts.