fragment_search

Module fragment_search

This module contains the function used to run substructure searches.

npfc.fragment_search.get_fragment_hits(df_mols, df_frags, col_mol_mols='mol', col_mol_frags='mol', col_mol_inchikey='inchikey', fcp_labels=None, tautomer=False, col_to_index_mols='idm', col_to_index_frags='idm')

Create a DataFrame recording every Fragment Hit in the input molecule DataFrame.

A Fragment Hit is composed of 6 fields:

  1. idm: the id of the molecule (rowid from df_mols)

  2. idf: the id of the fragment (rowid from df_frags)

  3. aidxf: the atom indices of the fragment found in the molecule

  4. mol_perc: the percentage of the molecule the fragment represents (based on the heavy atom count, hac)

  5. mol: the molecule as RDKit Mol object

  6. mol_frag: the fragment as RDKit Mol object
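The six fields above can be illustrated with a minimal sketch built directly on RDKit. This is not the npfc implementation: the identifiers `mol_1` and `frag_1`, the toluene/benzene pair, and the exact mol_perc formula (matched atoms divided by the molecule's heavy atom count) are assumptions made for illustration.

```python
from rdkit import Chem

# Hypothetical inputs: toluene as the molecule, benzene as the fragment.
mol = Chem.MolFromSmiles("Cc1ccccc1")
frag = Chem.MolFromSmiles("c1ccccc1")

hits = []
# GetSubstructMatches returns one tuple of molecule atom indices per match.
for aidxf in mol.GetSubstructMatches(frag):
    hits.append({
        "idm": "mol_1",   # hypothetical molecule id
        "idf": "frag_1",  # hypothetical fragment id
        "aidxf": aidxf,
        # Assumption: share of the molecule's heavy atoms covered by the match.
        "mol_perc": round(100 * len(aidxf) / mol.GetNumHeavyAtoms(), 2),
        "mol": mol,
        "mol_frag": frag,
    })
```

Each dictionary in `hits` corresponds to one row of the kind of DataFrame described here; a molecule matched several times by the same fragment would yield several such rows.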

Parameters
  • df_mols (DataFrame) – the input DataFrame with the molecules (df_mols)

  • df_frags (DataFrame) – the input DataFrame with fragments to use for substructure search (df_frags)

  • col_mol_mols (str) – the column name in df_mols with the molecules

  • col_mol_frags (str) – the column name in df_frags with the fragments

  • col_mol_inchikey (str) – the input DataFrame column name with the inchikey of the molecule. If it does not exist, then an empty column is created in the output.

  • fcp_labels (Optional[str]) – the column name in the fragments dataframe with the fcp labels

  • tautomer (bool) – if set to True, tautomers are taken into account during the fragment search (warning: a tautomer-independent search is much slower!)

  • col_to_index_mols (str) – set the row indices of the DataFrame with the molecules to probe to the specified column. If empty (‘’), indices are left untouched.

  • col_to_index_frags (str) – set the row indices of the DataFrame with the fragments to search for to the specified column. If empty (‘’), indices are left untouched.

Return type

DataFrame

Returns

the substructure matches as a DataFrame

Warning

Row indices are used for recording the ids of substructure hits and are therefore required to be set to the molecule identifiers (i.e. idm).
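When the indexing parameters are left empty, the row indices have to hold the identifiers before the search is run. One way this might be prepared with pandas is sketched below; the table contents are hypothetical placeholders, and in practice the mol column would hold RDKit Mol objects.

```python
import pandas as pd

# Hypothetical molecule table; the "mol" values are placeholders for RDKit Mol objects.
df_mols = pd.DataFrame({
    "idm": ["mol_1", "mol_2"],
    "mol": ["<Mol 1>", "<Mol 2>"],
})

# Use the identifiers as row indices while keeping "idm" as a regular column.
df_mols = df_mols.set_index("idm", drop=False)
```

Keeping the column with `drop=False` is a design choice of this sketch, so that the identifier stays available both as the index and as data.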


© Copyright 2019, Dr. Jose-Manuel Gally. Revision 80911fae.
