fragment_search
Module fragment_search
This module contains the function used to run substructure searches.
- npfc.fragment_search.get_fragment_hits(df_mols, df_frags, col_mol_mols='mol', col_mol_frags='mol', col_mol_inchikey='inchikey', fcp_labels=None, tautomer=False, col_to_index_mols='idm', col_to_index_frags='idm')
Create a DataFrame recording every Fragment Hit in the input molecule DataFrame.
A Fragment Hit is composed of 6 fields:
idm: the id of the molecule (rowid from df_mols)
idf: the id of the fragment (rowid from df_frags)
aidxf: the atom indices of the fragment found in the molecule
mol_perc: the percentage of the molecule the fragment represents (based on the heavy atom count, hac)
mol: the molecule as RDKit Mol object
mol_frag: the fragment as RDKit Mol object
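As a rough illustration of the mol_perc field, the sketch below computes a percentage from the matched atom indices (aidxf) and the heavy atom count of the molecule. The helper name `mol_perc`, the example ids and the exact formula (matched atoms divided by hac, times 100, no rounding) are assumptions made for illustration; the source only states that the value is based on hac.

```python
def mol_perc(aidxf, hac):
    """Percentage of a molecule's heavy atoms covered by one fragment hit.

    Assumption: every index in aidxf refers to a heavy atom of the molecule,
    and hac is the heavy atom count of that molecule.
    """
    if hac <= 0:
        raise ValueError("hac must be a positive heavy atom count")
    # Count each matched atom once, even if an index is repeated.
    return 100.0 * len(set(aidxf)) / hac


# Hypothetical hit: a 6-atom fragment matched in a molecule with 10 heavy atoms.
hit = {"idm": "mol_1", "idf": "frag_1", "aidxf": (0, 1, 2, 3, 4, 5)}
hit["mol_perc"] = mol_perc(hit["aidxf"], hac=10)
print(hit["mol_perc"])  # 60.0
```

The dictionary only mirrors the first four fields listed above; in the actual output, each hit is a DataFrame row that also carries the mol and mol_frag objects.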
- Parameters
df_mols (DataFrame) – the input DataFrame with the molecules
df_frags (DataFrame) – the input DataFrame with the fragments to use for the substructure search
col_mol_mols (str) – the column name in df_mols with the molecules
col_mol_frags (str) – the column name in df_frags with the fragments
col_mol_inchikey (str) – the column name in the input DataFrame with the InChIKey of the molecule. If it does not exist, an empty column is created in the output.
fcp_labels (Optional[str]) – the column name in the fragments DataFrame with the fcp labels
tautomer (bool) – if set to True, tautomers are taken into account during the fragment search (warning: tautomer-independent search is much slower!)
col_to_index_mols (str) – set the row indices of the DataFrame with the molecules to probe to the specified column. If empty (‘’), indices are left untouched.
col_to_index_frags (str) – set the row indices of the DataFrame with the fragments to search for to the specified column. If empty (‘’), indices are left untouched.
- Return type
DataFrame
- Returns
the substructure matches as a DataFrame
Warning
Row indices are used for recording the ids of substructure hits and are therefore required to be set to the molecule identifiers (i.e. idm).