Mining the Cambridge Structural Database for ... - ACS Publications



Article pubs.acs.org/crystal

Mining the Cambridge Structural Database for Matched Molecular Crystal Structures: A Systematic Exploration of Isostructurality

Ilenia Giangreco,* Jason C. Cole, and Elizabeth Thomas

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, U.K.

Supporting Information

ABSTRACT: The Cambridge Structural Database (CSD) is the world-leading collection of small-molecule crystal structures and an invaluable resource for crystal engineers. It allows structures to be compared readily and new insights to be gained from the comparison. To search the database for pairs of structures related by the same chemical transformation, and to investigate systematically the effect of that transformation on crystal packing, we have derived a repository of matched molecular crystal structures from the CSD. The repository makes it easy to find all pairs of structures differing by the same chemical change or, alternatively, all available chemical modifications of a given CSD entry. Our analysis illustrates one of the many possible applications of these data. An extensive, though not exhaustive, exploration of isostructurality across the entire CSD has been carried out with the aim of identifying packing features within crystals that maintain isostructurality. Focusing on terminal chemical modifications observed between single-component structures with Z′ equal to 1, we calculated packing similarity with an enhanced version of existing software. Across the entire data set of approximately 125 000 matched molecular pairs, 4% of the pairs were isostructural. Several transformations showed an enrichment with respect to this baseline value, and examples are discussed to illustrate some of the questions that can be asked and how they can be answered using the data set. This will open up avenues for future research and increase our understanding of the impact of functional groups on crystal packing.



Crystal Growth & Design, DOI: 10.1021/acs.cgd.7b00155
Received: January 31, 2017; Revised: May 11, 2017

INTRODUCTION

Matched molecular pair (MMP) analysis is a technique widely used in drug discovery to predict how a particular structural modification will change a physical property of interest or the biological activity of a molecule.1,2 The Cambridge Structural Database (CSD) provides a unique and extensive source of three-dimensional (3D) structural information.3 Identifying all MMPs in the CSD would help us understand the impact of specific structural changes on the crystal packing and morphology of materials. Here we describe the creation of a repository of matched molecular structures from the CSD, which provides a powerful platform for further research in solid form informatics. A systematic exploration of the packing features that maintain isostructurality would benefit both crystal engineering and crystal structure prediction.

In 2005, a specialized program called GRX (group exchange) was developed to search the CSD for pairs of entries that differ by a single functional group.4 The program easily reproduced a study of how often the replacement of a chlorine atom with a methyl group maintained the same packing arrangement in the related crystal structures. However, the authors of the original publication identified a few limitations of the algorithm. For example, GRX cannot apply filters (e.g., excluding organometallics or restricting the search to entries with an R factor lower than 10%), and it can identify only pairs differing by a single structural modification. In 2010, a computationally efficient algorithm capable of systematically generating all MMPs in large chemical data sets was developed.5 An MMP is a pair of molecules differing by a single localized structural change. Two matched molecules A and B can be interconverted by the so-called molecular "transformation" of substructure A into substructure B (i.e., the substructures that change from molecule A to molecule B). By applying this methodology to the CSD, we have overcome the limitations highlighted above. An extensive analysis has been carried out with the aim of identifying common transformations that are most likely to preserve isostructurality.

Understanding isostructurality is challenging because, when it is observed, it may arise from geometrical factors (shape and size) or from chemical factors (intermolecular interactions). Any given functional group may play a crucial role in determining the crystal structure of a small organic molecule. In the context of this work, we use the term isostructural to describe the situation in which the packing of the two structures under comparison is essentially the same. We allow small molecular differences, but for two structures to be declared isostructural, both packing arrangements must place all molecules in equivalent lattice sites, and the same intermolecular arrangements must be observed. We assess this criterion from the deviation of the coordinates in one structure from the matched coordinates in the other, measured across large clusters of molecules when the two structures are superimposed. This definition allows for changes in unit cell parameters and symmetry, as well as anisotropic changes in cell parameters caused by differences in the sizes of the chemical species. Our definition falls within, but is more specific than, the International Union of Crystallography (IUCr) description of isostructurality. It also differs from the definitions of isomorphous given by the IUCr and by Megaw,6 in that the requirement for identical symmetry is removed. Previous studies have investigated existing structures in the CSD and performed novel crystallizations, discussing degrees of isostructurality.7−9 Here we automatically compared almost 125 000 pairs of structures without any prefiltering and then looked at the number of isostructural pairs for any given transformation.

Figure 1. Example showing the fragments formed when enumerating all single, double, and triple cuts for the first entry in the CSD (i.e., AABHTZ).

Figure 2. Schematic workflow of the MMP identification algorithm as described by the original authors.5 (A) Steps followed when a single cut is applied. (B) Equivalent steps when double cuts are applied.


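The coordinate-deviation criterion described above can be illustrated with a generic root-mean-square deviation (RMSD) computed after optimal rigid-body superposition, using the Kabsch algorithm. This is a minimal sketch and not the CSD packing similarity implementation, which matches molecules using interatomic distances and angles. It assumes that the atoms of the two clusters have already been matched and listed in the same order, and the function name is ours.

```python
import numpy as np


def superposed_rmsd(p, q):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition of p onto q (Kabsch algorithm).

    Assumes row i of p corresponds to row i of q; matching is done elsewhere.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix between the two centered sets.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against a reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and measure the residual per-atom deviation.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a cluster rotated by 90 degrees about z and then translated
# superimposes perfectly, so its RMSD is zero to within rounding.
cluster_a = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
cluster_b = [[-y + 5, x - 2, z + 1] for x, y, z in cluster_a]
print(superposed_rmsd(cluster_a, cluster_b))
```

Because rotation and translation are factored out before the deviation is measured, only genuine differences in relative arrangement contribute to the RMSD.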

METHODS

CSD Python API. The CSD Python API was released in June 2015 as an integral component of the CSD System. It was developed to make CSD data and CSD System functionality accessible programmatically, with the aim of facilitating integration with in-house workflows and third-party applications. All CSD searches and packing similarity analyses described here were carried out with the CSD Python API. The scripts are available as Supporting Information.

Data Set. A filtered subset of the CSD was used as input for the MMP identification algorithm. The structures used for comparison in this work were taken from a predefined list of "best" representative polymorphs based on R factor.10 Structures were excluded if they contained crystallographic disorder, had an R factor greater than or equal to 0.075, contained ions, lacked 3D information, or came from a powder diffraction study. In addition, the search was restricted to organic structures containing only the elements C, H, N, O, F, P, S, Se, Cl, Br, and I. This subset comprises 155 543 hits, which were exported as SMILES strings.11 Multicomponent CSD entries were split into their components, each of which was treated as an individual entity. A set of 137 778 unique components was identified and stored in a comma-separated file annotated with the list of associated CSD refcodes. For example, JAXQEF and WIPFEH share the same unique component identifier because they are a dimethylformamide solvate and a monohydrate, respectively, of N′-(4-hydroxybenzylidene)benzohydrazide.

MMP Identification. Matched molecular crystal structures were identified using the algorithm published in 2010,5 which is freely available through RDKit.12 The program reads a set of input molecules as SMILES and returns the MMPs as valid SMIRKS13 describing the identified molecular transformations. SMIRKS are widely used in chemoinformatics to describe chemical transformations with varying degrees of specificity and generality. The first step of the algorithm is the fragmentation of every molecule in the input set at each acyclic single bond between two non-hydrogen atoms. Acyclic single bonds are cleaved only if they occur between functional groups, not within them (for example, the C−OH bond in carboxylic acids is not cleaved). With default settings, the structural moiety that changes can contain at most 10 non-hydrogen atoms. Figure 1 shows the enumeration of all possible fragments for the first entry in the CSD when the molecule is cleaved once, twice, or three times. The resulting fragments are then canonicalized, so that the same change is always represented by the same SMIRKS, and indexed. The canonicalization routine allows symmetrically equivalent transformations to be described by the same string.

Figure 2 shows a schematic of the algorithm. For example, when a single "cut" is made on an input molecule, two fragments are formed (i.e., X and Y); both are canonicalized and added to the index. The index is a collection of key-value pairs: fragment X is added as a key with fragment Y as its value, and fragment Y is added as a key with fragment X as its value. The identifier of the molecule is also stored with the value. Because of the canonicalization, fragments can be grouped by their key, and all valid MMPs are identified by enumerating compounds that share the same key but have different values. The same procedure is repeated when two or three cuts are performed. In the multiple-cut scenario, the fragmentation produces a core and two fragments; the core is added to the index as a value, with the two terminal fragments as its key. In total, 107 377 434 MMPs were identified and stored in a relational database. For the analysis presented herein, a filtered subset of the MMP repository was used. To ensure that packing differences observed between MMPs result from the molecular transformation, and not from the presence of solvent or water, analyses were restricted to CSD entries with single-component structures and Z′ equal to 1. Nevertheless, the entire data collection remains available and can be valuable for other studies. The 15 most frequently occurring terminal transformations for the MMPs found in the CSD are shown in Figure 3.

Figure 3. Most frequently occurring terminal transformations in the CSD. The counts include only single-component structures with Z′ equal to 1.

Packing Similarity Analysis. Analysis was undertaken using the Crystal Packing Similarity tool in the CSD Python API, which provides programmatic access to the same functionality available through the graphical user interface of the program Mercury.14 It is therefore possible to automate the comparison of the roughly 125 000 pairs of crystal structures required by this study. The method, first described in 2005, determines whether two crystal structures are the same to within specified tolerances.15 The relative positions and orientations of molecules are captured using interatomic distances. The calculation compares the central molecule of each structure plus a number of neighboring molecules; the default size of this molecular cluster is 15 molecules (the central molecule plus 14 others). Two structures are considered isostructural if the algorithm returns 15 out of 15 molecules in common. Here we used an enhanced version of this program that includes three key improvements. First, unlike the original version, in which only interatomic distances between molecules were considered when assessing whether a pair of molecules could be aligned, the current implementation considers both distances and subsets of internal angles. Second, in the original version, the molecular cluster was defined by generating a target cluster containing the n (by default 15) symmetry-related molecules whose centroids were closest to the central molecule. This was found to favor layers for long, flat molecules and so was not always representative of full isostructurality. Consequently, the target clusters used here were generated from the set of n symmetry-related molecules with the closest van der Waals corrected interatomic contacts to the central molecule. Finally, the method was enhanced to allow comparison of structures that do not match exactly but have structural features in common. This was achieved by reducing both structures being compared to the set of functional groups they have in common, as defined by the maximum common subgraph16 of the two molecules. This last improvement is what made the present study possible.

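The Data Set filtering and the splitting of multicomponent entries into unique components can be sketched in plain Python. This is a minimal sketch under stated assumptions. `EntryRecord` is a hypothetical stand-in for CSD entry metadata and is not a ccdc class. The refcodes and R factors in the usage example are placeholders. The component SMILES are assumed to be canonical already, so that identical components compare equal as strings; in practice a toolkit such as RDKit would canonicalize them.

```python
from collections import defaultdict
from dataclasses import dataclass

ALLOWED_ELEMENTS = {"C", "H", "N", "O", "F", "P", "S", "Se", "Cl", "Br", "I"}


@dataclass
class EntryRecord:
    """Hypothetical, simplified stand-in for a CSD entry (not a ccdc class)."""
    refcode: str
    smiles: str          # components separated by '.', assumed canonical
    elements: set
    r_factor: float      # as a fraction, e.g. 0.045
    has_disorder: bool = False
    has_3d: bool = True
    has_ions: bool = False
    powder_study: bool = False


def passes_filters(e):
    """Apply the exclusion criteria listed in the Data Set paragraph."""
    return (
        not e.has_disorder
        and e.r_factor < 0.075
        and not e.has_ions
        and e.has_3d
        and not e.powder_study
        and e.elements <= ALLOWED_ELEMENTS
    )


def unique_components(entries):
    """Split each retained entry into components and map every unique
    component SMILES to the sorted list of refcodes that contain it."""
    index = defaultdict(set)
    for e in entries:
        if passes_filters(e):
            for component in e.smiles.split("."):
                index[component].add(e.refcode)
    return {smi: sorted(refs) for smi, refs in index.items()}


# Usage mirroring the JAXQEF/WIPFEH example with placeholder refcodes.
host = "O=C(NN=Cc1ccc(O)cc1)c1ccccc1"  # hand-written SMILES of the shared molecule
entries = [
    EntryRecord("HYPO01", host + ".CN(C)C=O", {"C", "H", "N", "O"}, 0.050),  # DMF solvate
    EntryRecord("HYPO02", host + ".O", {"C", "H", "N", "O"}, 0.040),         # monohydrate
    EntryRecord("HYPO03", "c1ccccc1", {"C", "H"}, 0.090),                    # fails R-factor cut
]
for smiles, refcodes in unique_components(entries).items():
    print(smiles, refcodes)
```

In this example the host molecule is annotated with both placeholder refcodes, which is the situation the text describes for JAXQEF and WIPFEH.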

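The single-cut indexing step of Figure 2(A) can be sketched with ordinary dictionaries. This is a simplified illustration and not the RDKit implementation. It assumes that fragmentation and canonicalization have already been carried out; the fragment SMILES below are written by hand. It also omits the 10-heavy-atom size limit and the double- and triple-cut cases. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_single_cut_index(fragmentations):
    """Index single-cut fragment pairs.

    fragmentations: dict mol_id -> list of (x, y) canonical fragment SMILES,
    one tuple per single cut (assumed precomputed and canonicalized).
    Each cut is stored twice: x as key with (y, mol_id) as value, and y as
    key with (x, mol_id) as value.
    """
    index = defaultdict(list)
    for mol_id, cuts in fragmentations.items():
        for x, y in cuts:
            index[x].append((y, mol_id))
            index[y].append((x, mol_id))
    return index


def enumerate_mmps(index):
    """Yield (key, transformation, mol_a, mol_b) for molecules that share a
    key but differ in value; the transformation is written 'value_a>>value_b'."""
    for key, values in index.items():
        for (va, ida), (vb, idb) in combinations(values, 2):
            if va != vb and ida != idb:
                yield key, f"{va}>>{vb}", ida, idb


# Usage: toluene and chlorobenzene share the phenyl key, giving one MMP.
frags = {
    "toluene": [("[*:1]c1ccccc1", "[*:1]C")],
    "chlorobenzene": [("[*:1]c1ccccc1", "[*:1]Cl")],
}
for mmp in enumerate_mmps(build_single_cut_index(frags)):
    print(mmp)
```

Storing each cut under both fragments means that either half can play the role of the constant part. Grouping by key then turns MMP detection into a simple scan over the index.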

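The Packing Similarity Analysis step can be sketched around the CSD Python API. This is a hedged sketch and not the authors' Supporting Information scripts. `PackingSimilarity`, `settings.packing_shell_size`, `compare()`, `nmatched_molecules`, `rmsd`, and `CrystalReader` are names taken from the public ccdc documentation as we understand it, and they should be checked against the installed version. The enhanced comparison options described above are not reproduced. Treating a `None` result as a failed comparison is our assumption. The ccdc import is deferred so that the isostructurality decision can be run without a licensed installation.

```python
ISOSTRUCTURAL_SHELL = 15  # central molecule plus 14 symmetry-related neighbours


def make_packing_similarity(shell_size=ISOSTRUCTURAL_SHELL):
    """Create a CSD Python API packing similarity engine.

    Requires a licensed ccdc installation; the import is deferred so that the
    rest of this sketch can be used (and tested) without it.
    """
    from ccdc.crystal import PackingSimilarity

    engine = PackingSimilarity()
    engine.settings.packing_shell_size = shell_size
    return engine


def compare_pair(engine, crystal_a, crystal_b, shell_size=ISOSTRUCTURAL_SHELL):
    """Compare one matched pair and apply the 15-out-of-15 criterion.

    `engine` is any object whose compare(a, b) result exposes
    `nmatched_molecules` and `rmsd`. A None result is treated as a failed
    comparison (an assumption of this sketch).
    Returns (n_matched, rmsd, is_isostructural).
    """
    result = engine.compare(crystal_a, crystal_b)
    if result is None:
        return 0, None, False
    n_matched = result.nmatched_molecules
    return n_matched, result.rmsd, n_matched == shell_size


# Usage against the CSD (requires ccdc; refcodes from Figure 6):
# from ccdc.io import CrystalReader
# reader = CrystalReader("CSD")
# engine = make_packing_similarity()
# print(compare_pair(engine, reader.crystal("GABSIB"), reader.crystal("ARADED")))
```

Keeping the decision rule separate from the engine makes it easy to run the same 15/15 test over the roughly 125 000 pairs in a batch loop.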
RESULTS AND DISCUSSION

In the era of "Big Data",17 matched molecular pair analysis is one of the major tools for extracting medicinal chemistry knowledge from large databases. The CSD is also a big data repository of small-molecule crystal structures and can therefore be used to analyze the effect of chemical transformations on various crystallographic and physicochemical properties. Predictions about the effect of a transformation that are derived from statistical knowledge of past observations can help in the design of new solids and are therefore relevant to crystal engineering and crystal structure prediction.18 In some cases, chemically similar molecules form crystallographically similar structures.19 Many structures, however, challenge this assumption, and here we undertake a systematic investigation using a large data set. Once created, the repository of matched molecular structures was mined to find all transformations obtained after a single cut of an acyclic single bond.

How Often Are Isostructural Pairs Observed? The simplest analysis we can perform is to establish a baseline rate of occurrence of isostructurality. Across the entire data set, 4% of MMPs form pairs of structures that are isostructural within our definition. Isostructurality is therefore relatively infrequent, but we must bear in mind that an MMP can represent a relatively large structural change. The analysis then focused on transformations that are more common in the CSD: we found 12 703 transformations represented by at least five pairs of CSD structures. Multiple lists were generated, and an extensive comparison of all pairs across all transformations was undertaken. Using packing similarity analysis, the structures were compared to count the cases in which packing could be regarded as highly similar (i.e., sharing 15 out of 15 molecules in a common packing shell).

Methyl Replacements and Isostructurality. A previous study of chloro to methyl interchange showed that approximately 30% of chloro to methyl structural pairs were isostructural.20 Indeed, when the impact of methyl group transformations is considered (Table 1), the chloro-methyl transformation shows the highest likelihood of retaining a high degree of structural similarity and has the highest average common cluster size between structures. The so-called chloro-methyl interchange was proposed by Kitaigorodski21 on the assumption that, under appropriate circumstances, interchanging single functional groups of comparable volume on a molecule, such as a chlorine atom (20 Å³) for a methyl group (24 Å³), might not result in significant changes in crystal packing. Halogen bonding has also been studied extensively in recent years.22 The ability of a halogen to act as a halogen-bond donor depends on the polarizability of the atom and increases in the order Cl < Br < I, which correlates directly with the volumes of the halogens (Cl: 20 Å³ < Br: 24.4 Å³ < I: 32.96 Å³). The same order is observed in Table 1, where the iodo to methyl transformation is unlikely to maintain isostructurality, owing both to the ability of the iodine atom to form noncovalent interactions with Lewis bases, which can influence the observed packing, and to its larger volume relative to the methyl group.

Table 1. Packing Similarity in Matched Molecular Crystal Structures Where a Methyl Is Changed into a Different Functional Group and a Certain Degree of Isostructurality Was Observed(a)

(a) The number of pairs found for each transformation is reported. For each pair, the packing arrangements of the two structures were compared, and their similarity was expressed as the number of molecules in common when superimposing the central molecule plus 14 symmetry-related molecules. The third column shows the average cluster size of molecules in common over all pairs of a given transformation. The percentage of isostructural pairs is calculated similarly; our definition of isostructurality requires 15 out of 15 molecules in common.

A number of examples are shown in Figure 4. We found only one molecule in common when comparing the crystal packing of THYDIN04 and IDOXUR. The difference arises because the iodine atom forms a halogen bond with the carbonyl group. The same observation applies to the packing comparison of JAPNOD with UNUQAW, which returned a match of only two molecules in common; again, a close contact between the iodine atom and a carbonyl oxygen is observed.

Figure 4. Packing comparison between THYDIN04 (A) and IDOXUR (B), and JAPNOD (C) and UNUQAW (D). In both pairs, the replacement of a methyl group with an iodine atom disrupted the crystal packing, because the halogen atom is able to form an intermolecular interaction with the carbonyl group. Structures are shown in the same frame of reference, and intermolecular interactions are shown as cyan dashed lines.

Similarly, among transformations in which a single heavy atom is changed, the hydroxy to methyl transformation has the lowest likelihood of retaining a high degree of structural similarity. This agrees with our expectation that removing a group able to form a strong hydrogen bond is likely to disrupt the lattice radically. A pair of structures in which such lattice disruption occurs is shown in Figure 5. The transformation alters the hydrogen bond patterns available to the respective structures. In AZULUD, the hydroxyl group forms a hydrogen bond with the ester carbonyl. This interaction is impossible in YAHGUL, where no hydrogen-bond donor groups are present. The structures do share a degree of packing similarity: the methyl group forms a weaker CH···O contact in place of the missing OH···O hydrogen bond.

Figure 5. Packing similarity between a pair of structures (i.e., YAHGUL and AZULUD) exemplifying a hydroxy to methyl transformation. (A) Chemical diagrams of the two structures. (B) The cluster of three molecules in common in the two lattices: AZULUD has carbon atoms in green, whereas YAHGUL is in gray. (C) Pairs of unmatched molecules of AZULUD (red) with respect to the reference structure YAHGUL (gray carbon atoms) in subsequent shells. The cyan dashed lines show the intermolecular hydrogen bond between the hydroxyl and the ester carbonyl groups in AZULUD.

Only one of the 85 MMPs associated with the carboxy to methyl transformation showed 15 out of 15 molecules in common in the crystal packing similarity calculation. As shown in Figure 6, the second carboxyl group added in ARADED did not form intermolecular interactions but rather an intramolecular hydrogen bond with the proximal carboxylate.

Figure 6. Packing similarity between ARADED and GABSIB. The two CSD entries are isostructural and share the same intermolecular interactions. When the methyl group of GABSIB is changed into a carboxylate in ARADED, an additional intramolecular hydrogen bond is formed without any change in the packing arrangement.

Methyl to Hydrogen Transformation: A More Detailed Study. When a hydrogen atom is replaced by a methyl group, substantial changes in the packing arrangement are observed, such that only 5.68% of 4492 pairs can be considered isostructural according to our similarity criterion (compared with 4% across the entire data set). The importance of characterizing the attachment point in the common core for a better interpretation of MMP results has been highlighted in the literature.23 Inspired by this work, we distinguished four types of attachment point: (i) hydrogen bond donor (HBD), (ii) hydrogen bond acceptor (HBA), (iii) hydrogen bond donor−acceptor (HBDA), and (iv) hydrophobic (HYD). We also looked at the chemical element of the attachment atom and at the core type (e.g., chain, aliphatic ring, or aromatic ring). As shown in Figure 7, in the vast majority of pairs (82%) the transformed fragment (i.e., the hydrogen atom or methyl group) was attached to a hydrophobic atom, while in the remaining 18% it was attached to an atom capable of forming hydrogen bonds. In the latter case, none of the pairs was isostructural according to our definition, which again supports our assumption that adding a methyl group can prevent the formation of strong hydrogen bonds within the crystal lattice. Indeed, when we counted the hydrogen bond donor groups before and after methylation, we found that in all cases the transformation caused the loss of one such group. Figure 8 shows an example of this for a pair of biologically relevant structures detected by this methodology (i.e., diazepam and nordiazepam). Nordiazepam has several functional groups that can interact in a specific and directed way with the surrounding environment. In particular, the amide group can form hydrogen bonds of various strengths, as shown in Figure 8A: a weak interaction between the carbonyl oxygen atom and an ortho-hydrogen atom of the phenyl ring forms infinite chains, and a strong intermolecular pair of hydrogen bonds forms dimers. Both of these interactions are perturbed in the structure of diazepam (Figure 8B), where a methyl group is added to the amide nitrogen atom. This case study was used by researchers at AstraZeneca as a proof of concept to show how solid-state perturbation can improve the intrinsic solubility of drug molecules with little change in logD.24

Figure 7. (A) Classification of attachment points when a hydrogen atom is replaced by a methyl group. (B) Cluster size distribution (i.e., number of molecules in common after packing similarity comparison) by type of attachment point, as a percentage of each hydrogen bonding category (hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrogen bond donor−acceptor (HBDA), or hydrophobic (HYD)).

Figure 8. Key interactions formed by the amide group in the structure of nordiazepam (A) and diazepam (B).

Figure 9. Chemical environment of the hydrophobic attachment point for the hydrogen to methyl pairs. (A) Distribution of structures with perturbed packing arrangement. (B) Distribution of isostructural pairs.

Figure 10. Packing comparison of KOLFUH and FOLFUN. The introduction of a methyl group in KOLFOH disrupts the stacking of the aliphatic ring.

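The statistics used throughout this section reduce to simple grouping: the overall baseline, the restriction to transformations with at least five pairs, and the per-group average cluster size and percentage of isostructural pairs reported in Table 1 and Figure 7. The sketch below is minimal and assumes that each compared pair has already been reduced to a (group, number of molecules in common) tuple. The group can be a transformation string or an attachment-point class. The function names are ours, and the data in the usage example are synthetic, not taken from the study.

```python
from collections import defaultdict
from statistics import mean

SHELL_SIZE = 15  # central molecule plus 14 neighbours; 15/15 = isostructural


def summarize(pairs, min_pairs=5, shell_size=SHELL_SIZE):
    """Per-group statistics for packing-similarity results.

    pairs: iterable of (group, n_matched) tuples, where group is a
    transformation string or an attachment-point class (HBD, HBA, HBDA, HYD)
    and n_matched is the number of molecules in common (1..shell_size).
    Returns {group: (n_pairs, mean_cluster_size, percent_isostructural)},
    keeping only groups with at least min_pairs pairs.
    """
    by_group = defaultdict(list)
    for group, n_matched in pairs:
        by_group[group].append(n_matched)

    summary = {}
    for group, sizes in by_group.items():
        if len(sizes) < min_pairs:
            continue  # too few pairs for a meaningful rate
        n_iso = sum(1 for s in sizes if s == shell_size)
        summary[group] = (len(sizes), mean(sizes), 100.0 * n_iso / len(sizes))
    return summary


def baseline_percent(pairs, shell_size=SHELL_SIZE):
    """Percentage of all pairs that are isostructural, ignoring grouping."""
    sizes = [n for _, n in pairs]
    return 100.0 * sum(1 for s in sizes if s == shell_size) / len(sizes)


# Synthetic usage (group labels and cluster sizes are illustrative only).
pairs = [("Cl>>CH3", n) for n in (15, 15, 3, 15, 8)]
pairs += [("OH>>CH3", n) for n in (2, 3, 1, 4, 15, 2)]
print(summarize(pairs))
print(baseline_percent(pairs))
```

Because the grouping key is arbitrary, the same function serves both the per-transformation summary and the attachment-point breakdown.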
Solubility is a frequently recurring issue within pharmaceutical industry; it has been reported that approximately 40% of currently marketed drugs and up to 75% of compounds currently I

DOI: 10.1021/acs.cgd.7b00155 Cryst. Growth Des. XXXX, XXX, XXX−XXX

Crystal Growth & Design

Article

Figure 11. Isostructurality of MMPs for the bromo to cloro transformation. The halogen atom can be attached to an aliphatic or an aromatic point and the transformation can be described at different level of detail depending on the number of atoms included when cutting the molecule. The big proportion of CSD structures when such a transformation occurs is for molecules with the halogen on the 3-position of a phenyl ring (i.e., 209 out of 506 total pairs).

under development are poorly water-soluble.25 However, it is important to deconvolute the relative importance of solvation and crystal packing as determining factors of low solubility compounds.26 The repository of MMPs derived from the CSD presented herein provides an opportunity to increase our

understanding of the crystal motifs that are most likely to form strong intermolecular interactions in the solid state, and how they can be modified. An example could be the presence of terminal methylsulfonyl groups, which tend to cause self-association of the molecule in the structural lattice (e.g., CSD entry EVEFES). J

DOI: 10.1021/acs.cgd.7b00155 Cryst. Growth Des. XXXX, XXX, XXX−XXX

Crystal Growth & Design

Article

Most Frequent Transformations in the CSD. There are many ways of analyzing this wealth of data. This current study lays the foundations for future in-depth investigations, including detailed analyses of pairs of structures contributing to each single transformation. A possible strategy would be to group the results table based on one side of the chemical transformation. This will allow us to list all observed modifications of each given functional group and answer questions like: ‘What are the most popular replacements of a phenyl ring in the CSD?’, ‘Among those, what is the likelihood of maintaining or perturbing the crystal packing?’, etc. Of course, the replacement of a hydrogen atom with a different functional group is the most frequent observation with 409 transformations of which 26 contained at least one isostructural pair. At the top of the sorted list we found 645 fluoro to hydrogen pairs. Interestingly, approximately 21% of them were isostructural. Fluorinated compounds are significantly important to the pharmaceutical industries.27 The introduction of fluorine into a molecule can influence conformation, pKa, intrinsic potency, membrane permeability, metabolic pathways, and pharmacokinetic properties. A systematic analysis of these pairs would also allow in-depth understanding of how it can influence structural properties. The second most popular transformation involves the phenyl ring, which can be either decorated with an additional

Table 2. Pair of CSD Structures for the Iodine to Ethynyl Transformation Refcode 1

Refcode 2

RMSD

cluster size of molecules in common

CECMOM TIHWIR NULJAG FANYAV AYOHEB AYOHIF KOHHAR IZAQIK ETYXUR

ZONYIK TIHXAK CACHOD FANYOJ CAJWIU FORGOI KOHHEV UROCIO IDOXUR

0.425 0.200 0.303 0.495 0.487 0.417 0.108 1.196 0.586

15 15 15 15 11 5 5 1 1

Isostructurality (i.e., 5.7% of cases) was sometimes observed when the methyl group was attached to a hydrophobic carbon, especially in an aliphatic chain (Figure 9B). Figure 9 shows how the 3231 pairs with a hydrophobic attachment point distribute among the three different chemical environments. The highest impact of methylation on crystal packing is on an aliphatic ring where it could interfere with stacking of these motifs or modify the ring conformation itself. An example is shown in Figure 10 where the stacking layers of KOLFUN are altered by the addition of a methyl group in KOLFOH.

Figure 12. (A) Superimposition of IZAQIK and UROCIO; the common scaffold of the two molecules has an RMSD of 1.195 Å. (B) Crystal packing comparison of IZAQIK (gray) and UROCIO (green). The conformational change induced by the ethynyl to iodo replacement allows different π stacking of the aromatic rings. The two structures are shown in the same orientation as superimposed by the crystal packing similarity tool. (C) Superimposition of ETYXUR and IDOXUR; the common scaffold of the two molecules has an RMSD of 0.585 Å. (D) Crystal packing arrangements of ETYXUR (gray) and IDOXUR (green); the latter shows the iodine atom interacting with the carbonyl group of the uracil moiety. The two structures are shown in the same orientation as superimposed by the crystal packing similarity tool. K

DOI: 10.1021/acs.cgd.7b00155 Cryst. Growth Des. XXXX, XXX, XXX−XXX

Crystal Growth & Design

Article

structures in the CSD, or alternatively, given a CSD refcode, one can retrieve a list of all its matched molecular structures. This platform replaces and improves an existing program called GRX as it is faster, more efficient, and allows types of searches that were previously inaccessible. One of the many applications of these data has been discussed in this study. We focused on single component structures with Z′ = 1 and looked at all terminal transformations for which at least five pairs were found in the CSD. A systematic calculation of crystal packing similarity was carried out for each of these pairs and percentage of isostructural pairs calculated for each transformation. We were able to reproduce and update the results published in a previous study reporting isostructurality due to chloro-methyl interchange. Further analysis has identified a number of examples which show how even small changes in the molecular structure can have a massive impact on crystal packing especially when the change involves hydrogen bond forming groups. These data may be used in further studies to examine in more depth how crystallographic and molecular descriptors relate with packing similarity and other properties relevant to drug development and crystal engineering.

substituent or replaced by a different group. We found 357 transformations. In 41 of these transformations there was at least one pair of structures that were isostructural (i.e., had a percentage of isostructural pairs greater than zero). The addition of a fluorine atom to a meta, para, and ortho position maintained isostructurality in 31.1% of 61, 25.1% of 231, and 22.2% of 81 cases, respectively. On the contrary, the addition of a chlorine atom to a phenyl ring showed isostructurality, at best, in 5.2% of 116 cases when it is in a meta position. Interestingly, the classic bioisosteric replacement of a phenyl with a thiophene ring occurred in 85 pairs (72 with the thiophene functionalized at the 3-position and 13 at the 2-position) with 19% percent being isostructural. Transformations with the Highest Degree of Isostructurality. An alternative approach to investigate the data is to sort the results table in descending order based on the percent of structures with a cluster of 15 out of 15 molecules in common. The highest degree of similarity is found for the substituent interchanges Cl → Br; I, Br → CF3 and Br → I. Figure 11 shows the degree of isostructurality for all bromo to chloro pairs. The green slice of the charts indicates the percentage of isostructural pairs, while the red and amber slices have been arbitrarily split in two categories based on the cluster size of molecules in common returned by the packing similarity calculation. It can be seen that the largest contribution to the total is given by the para-bromophenyl to parachlorophenyl (i.e., [*:1]c1ccc(Br)cc1 ≫ [*:1]c1ccc(Cl)cc1) transformation which occurred 209 times in the CSD. A significant degree of structural similarity (44.4%) is also observed when replacing an iodine with an ethynyl group. We found nine pairs of CSD entries where such a transformation is observed, and the average cluster size of molecules in common is nine. 
As shown in Table 2, only two pairs adopt a completely different packing in the solid state. In one case (i.e., IZAQIK, UROCIO), the transformation changed the conformation of the molecule itself (Figure 12A−B), although both structures are characterized by π−π stacking of the aromatic rings. In the other case (i.e., ETYXUR, IDOXUR), the iodine interacts with a carbonyl group, an interaction that is not observed in the corresponding ethynyl structure (Figure 12C−D). Interestingly, two of the isostructural pairs were synthesized with the intention of studying structural systematics. TIHWIR and TIHXAK are part of a large study in which 133 4,4′-disubstituted benzenesulfonamidobenzenes were synthesized and their crystal structures determined (including polymorphic forms). These structures were then used for a comparative study of the molecular packing and the nature of the intermolecular interactions.28 Also, in a 2004 study, the isostructurality of 4-(4′-iodo)phenoxyaniline (FANYOJ) with the corresponding bromo, chloro, and ethynyl (FANYAV) derivatives was rationalized in terms of conditional isomorphism.29,30



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.cgd.7b00155. Scripts to perform CSD searches and crystal packing similarity calculations based on the CSD Python API (ZIP and PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Fax: +44 1223 336033. Tel: +44 1223 336408.

ORCID

Ilenia Giangreco: 0000-0002-3345-0230
Jason C. Cole: 0000-0002-0291-6317

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We thank Anthony Reilly for careful reading of this manuscript. We also thank Neil Feeder and Colin Groom for helpful discussion and support.



REFERENCES

(1) Dossetter, A. G.; Griffen, E. J.; Leach, A. G. Matched Molecular Pair Analysis in Drug Discovery. Drug Discovery Today 2013, 18, 724−731.
(2) Leach, A. G.; Jones, H. D.; Cosgrove, D. A.; Kenny, P. W.; Ruston, L.; MacFaul, P.; Wood, J. M.; Colclough, N.; Law, B. Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 2006, 49, 6672−6682.
(3) Groom, C. R.; Bruno, I. J.; Lightfoot, M. P.; Ward, S. C. The Cambridge Structural Database. Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 2016, 72, 171−179.
(4) van de Streek, J.; Motherwell, S. GRX: A Program to Search the CSD for Functional Group Exchanges. J. Appl. Crystallogr. 2005, 38, 694−696.



CONCLUSIONS

In this work, we have created a repository of matched molecular crystal structures derived from the CSD. A pair of structures can differ by a terminal transformation, if the change occurs in a terminal side chain, or by a core transformation, if the change occurs in a linker group between two ring systems. The repository can be queried in two alternative ways: given a chemical transformation, it is possible to find all matched

DOI: 10.1021/acs.cgd.7b00155 Cryst. Growth Des. XXXX, XXX, XXX−XXX

Crystal Growth & Design

Article

(5) Hussain, J.; Rea, C. Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets. J. Chem. Inf. Model. 2010, 50, 339−348.
(6) Megaw, H. D. Crystal Structures: A Working Approach; W. B. Saunders: Philadelphia, 1973.
(7) Kálmán, A.; Párkányi, L.; Argay, G. Classification of the Isostructurality of Organic Molecules in the Crystalline State. Acta Crystallogr., Sect. B: Struct. Sci. 1993, 49, 1039−1049.
(8) Fábián, L.; Kálmán, A. Volumetric Measure of Isostructurality. Acta Crystallogr., Sect. B: Struct. Sci. 1999, 55, 1099−1108.
(9) Galek, P. T. A. Novel Comparison of Crystal Packing by Moments of Inertia. CrystEngComm 2011, 13, 841−849.
(10) van de Streek, J. Searching the Cambridge Structural Database for the “Best” Representative of Each Unique Polymorph. Acta Crystallogr., Sect. B: Struct. Sci. 2006, 62, 567−579.
(11) Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97−101.
(12) RDKit: Open-Source Cheminformatics.
(13) Daylight Theory Manual, Chapter 5. http://www.daylight.com/dayhtml/doc/theory/theory.smirks.html.
(14) Macrae, C. F.; Bruno, I. J.; Chisholm, J. A.; Edgington, P. R.; McCabe, P.; Pidcock, E.; Rodriguez-Monge, L.; Taylor, R.; van de Streek, J.; Wood, P. A. Mercury CSD 2.0 − New Features for the Visualization and Investigation of Crystal Structures. J. Appl. Crystallogr. 2008, 41, 466−470.
(15) Chisholm, J. A.; Motherwell, S. COMPACK: A Program for Identifying Crystal Structure Similarity Using Distances. J. Appl. Crystallogr. 2005, 38, 228−231.
(16) Raymond, J. W.; Willett, P. Maximum Common Subgraph Isomorphism Algorithms for the Matching of Chemical Structures. J. Comput.-Aided Mol. Des. 2002, 16, 521−533.
(17) Richter, L.; Ecker, G. F. Medicinal Chemistry in the Era of Big Data. Drug Discovery Today: Technol. 2015, 14, 37−41.
(18) Cole, J. C.; Groom, C. R.; Read, M. G.; Giangreco, I.; McCabe, P.; Reilly, A. M.; Shields, G. P. Generation of Crystal Structures Using Known Crystal Structures as Analogues. Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 2016, 72, 530−541.
(19) Resnati, G.; Boldyreva, E.; Bombicz, P.; Kawano, M. Supramolecular Interactions in the Solid State. IUCrJ 2015, 2, 675−690.
(20) Edwards, M. R.; Jones, W.; Motherwell, W. D. S.; Shields, G. P. Crystal Engineering and Chloro-Methyl Interchange: A CSD Analysis. Mol. Cryst. Liq. Cryst. Sci. Technol., Sect. A 2001, 356, 337−353.
(21) Kitaigorodsky, A. I. Molecular Crystals and Molecules; Academic Press: New York, 1973.
(22) Mukherjee, A.; Tothadi, S.; Desiraju, G. R. Halogen Bonds in Crystal Engineering: Like Hydrogen Bonds yet Different. Acc. Chem. Res. 2014, 47, 2514−2524.
(23) Papadatos, G.; Alkarouri, M.; Gillet, V. J.; Willett, P.; Kadirkamanathan, V.; Luscombe, C. N.; Bravi, G.; Richmond, N. J.; Pickett, S. D.; Hussain, J.; Pritchard, J. M.; Cooper, A. W. J.; Macdonald, S. J. F. Lead Optimization Using Matched Molecular Pairs: Inclusion of Contextual Information for Enhanced Prediction of HERG Inhibition, Solubility, and Lipophilicity. J. Chem. Inf. Model. 2010, 50, 1872−1886.
(24) Briggner, L.-E.; Hendrickx, R.; Kloo, L.; Rosdahl, J.; Svensson, P. H. Solid-State Perturbation for Solubility Improvement: A Proof of Concept. ChemMedChem 2011, 6, 60−62.
(25) Williams, H. D.; Trevaskis, N. L.; Charman, S. A.; Shanker, R. M.; Charman, W. N.; Pouton, C. W.; Porter, C. J. H. Strategies to Address Low Drug Solubility in Discovery and Development. Pharmacol. Rev. 2013, 65, 315−499.
(26) Docherty, R.; Pencheva, K.; Abramov, Y. A. Low Solubility in Drug Development: De-Convoluting the Relative Importance of Solvation and Crystal Packing. J. Pharm. Pharmacol. 2015, 67, 847−856.

(27) Gillis, E. P.; Eastman, K. J.; Hill, M. D.; Donnelly, D. J.; Meanwell, N. A. Applications of Fluorine in Medicinal Chemistry. J. Med. Chem. 2015, 58, 8315−8359.
(28) Gelbrich, T.; Hursthouse, M. B.; Threlfall, T. L. Structural Systematics of 4,4′-Disubstituted Benzenesulfonamidobenzenes. 1. Overview and Dimer-Based Isostructures. Acta Crystallogr., Sect. B: Struct. Sci. 2007, 63, 621−632.
(29) Dey, A.; Desiraju, G. R. Supramolecular Equivalence of Ethynyl, Chloro, Bromo and Iodo Groups. A Comparison of the Crystal Structures of Some 4-Phenoxyanilines. CrystEngComm 2004, 6, 642.
(30) Kitaigorodskii, A. I. Organic Chemical Crystallography; Consultants Bureau: New York, 1961.
