fragment_combination

Module fragment_combination

This module contains the functions for classifying fragment combinations.

npfc.fragment_combination.classify(mol, aidxf1, aidxf2, cutoff=3, exclude_exocyclic=False)

Classify a fragment combination found in a molecule and return a dictionary with its category, type and subtype.

The following algorithm is applied to classify fragment combinations:

_images/fragment_tree.png

Fragment 1: red; Fragment 2: green; Fused Atoms: yellow.

Possible classifications are:

  • fusion
    • spiro (fs)

    • edge (fe)

    • bridged (fb)

    • linker (fl)

    • false_positive
      • substructure (ffs)

      • overlap (ffo)

  • connection
    • annulated (ca)

    • monopodal (cm)

    • bipodal
      • spiro (cbs)

      • edge (cbe)

      • bridged (cbb)

      • linker (cbl)

    • tripodal
      • spiro (cts)

      • edge (cte)

      • bridged (ctb)

      • linker (ctl)

    • other
      • spiro (cos)

      • edge (coe)

      • bridged (cob)

      • linker (col)

    • false_positive
      • cutoff (cfc)

Parameters
  • mol (Mol) – the input molecule

  • aidxf1 (set) – the atom indices of the first fragment found in the molecule

  • aidxf2 (set) – the atom indices of the second fragment found in the molecule

  • cutoff (int) – maximum number of intermediary atoms between 2 fragments to consider them as a combination (labelled as cfc otherwise)

  • exclude_exocyclic (bool) – exclude exocyclic atoms during classification

Return type

dict

Returns

the dictionary specifying fragment combination category, type and subtype
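
The classification list above can be written down as a lookup table from abbreviation to category, type and subtype. The following is a minimal sketch for illustration only: FCC_TABLE and describe are hypothetical names that are not part of npfc, the key names of the dictionary are an assumption, and so is the choice of an empty subtype for the two-letter codes.

```python
# Hypothetical lookup table transcribed from the classification list above.
# FCC_TABLE and describe() are illustrative names, not part of npfc.
SUBTYPES = {"s": "spiro", "e": "edge", "b": "bridged", "l": "linker"}
PODAL_TYPES = {"b": "bipodal", "t": "tripodal", "o": "other"}

FCC_TABLE = {}
for letter, name in SUBTYPES.items():
    # fusion codes: fs, fe, fb, fl (no subtype level in the list)
    FCC_TABLE["f" + letter] = ("fusion", name, "")
    # connection codes with a subtype: cbs, cte, cob, ...
    for type_letter, type_name in PODAL_TYPES.items():
        FCC_TABLE["c" + type_letter + letter] = ("connection", type_name, name)

# Remaining entries that do not follow the regular pattern.
FCC_TABLE.update({
    "ffs": ("fusion", "false_positive", "substructure"),
    "ffo": ("fusion", "false_positive", "overlap"),
    "ca": ("connection", "annulated", ""),
    "cm": ("connection", "monopodal", ""),
    "cfc": ("connection", "false_positive", "cutoff"),
})


def describe(code):
    """Return category, type and subtype for an abbreviation as a dict."""
    category, type_, subtype = FCC_TABLE[code]
    return {"category": category, "type": type_, "subtype": subtype}


print(describe("cbs"))
```

The table holds one entry per abbreviation in the list, so an unknown code raises a KeyError instead of being silently accepted.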

npfc.fragment_combination.classify_df(df_aidxf, cutoff=3, clear_cfc=True, exclude_exocyclic=False)

Return a DataFrame with all fragment combination categories for a given set of molecules and fragment atom indices obtained by substructure search. For more details about category, type and subtype, see the documentation of classify.

The output DataFrame contains 9 columns describing each fragment combination:

  1. idm: the id of the molecule

  2. idf1: the id of fragment 1

  3. idf2: the id of fragment 2

  4. fcc: a 2- or 3-letter code indicating category, type and subtype

  5. category

  6. type

  7. subtype

  8. aidxf1: the atom indices of fragment 1 found in the molecule

  9. aidxf2: the atom indices of fragment 2 found in the molecule

Fragment pairs separated by more intermediary atoms than the defined cutoff are labelled as false positives (cfc).

Parameters
  • df_aidxf (DataFrame) – the input DataFrame with substructure matches

  • cutoff (int) – the maximum number of intermediary atoms between 2 fragments

  • clear_cfc (bool) – remove cfc combinations (false positives) from results

  • exclude_exocyclic (bool) – exclude exocyclic atoms from fragment atom indices (during classification only)

Return type

DataFrame

Returns

a DataFrame with all fragment combination classifications
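
The effect of clear_cfc can be illustrated without npfc or pandas by representing result rows as plain dictionaries with the nine columns listed above. This is only a sketch: drop_cfc is a hypothetical helper, and the ids and atom indices are made up for the example.

```python
def drop_cfc(rows):
    """Keep only rows whose fcc code is not the cutoff false positive."""
    return [row for row in rows if row["fcc"] != "cfc"]


# Made-up rows with the nine columns described above.
rows = [
    {"idm": "mol_1", "idf1": "frag_a", "idf2": "frag_b", "fcc": "fe",
     "category": "fusion", "type": "edge", "subtype": "",
     "aidxf1": {0, 1, 2, 3, 4, 5}, "aidxf2": {4, 5, 6, 7, 8, 9}},
    {"idm": "mol_1", "idf1": "frag_a", "idf2": "frag_c", "fcc": "cfc",
     "category": "connection", "type": "false_positive", "subtype": "cutoff",
     "aidxf1": {0, 1, 2, 3, 4, 5}, "aidxf2": {15, 16, 17, 18, 19, 20}},
]

kept = drop_cfc(rows)
print([row["fcc"] for row in kept])
```

With clear_cfc=False the equivalent of both rows would be kept, which can be useful when inspecting why a pair of fragments was rejected.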

npfc.fragment_combination.get_fragment_combination_categories(include_fp=False)

Return the list of all possible fragment combination categories.

Parameters

include_fp (bool) – include false positives

Return type

list

Returns

the list of all possible fragment combination categories

npfc.fragment_combination.get_rings_between_two_fragments(mol, aidxf1, aidxf2)

Return the atom indices of every ring that connects two fragments, each fragment being defined by its atom indices.

Parameters
  • mol (Mol) – the input molecule

  • aidxf1 (set) – the atom indices of the first fragment found in the molecule

  • aidxf2 (set) – the atom indices of the second fragment found in the molecule

Return type

list

Returns

a list of intermediary rings between both fragments, each ring defined by its atom indices

npfc.fragment_combination.get_shortest_path_between_frags(mol, aidxf1, aidxf2)

Return the shortest path within a molecule between two fragments defined by atom indices. The first and last atom indices belong to fragment 1 and fragment 2, respectively, so they should not be counted when estimating the distance between fragments.

(i.e. distance = len(shortest_path) - 2)

Parameters
  • mol (Mol) – the input molecule

  • aidxf1 (set) – the atom indices of the first fragment found in the molecule

  • aidxf2 (set) – the atom indices of the second fragment found in the molecule

Return type

tuple

Returns

the atom indices of the shortest path between both fragments. The first index is the attachment point from fragment 1 whereas the last index is the attachment point from fragment 2
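
The idea of a shortest path between two sets of atoms, and the distance derived from it, can be sketched with a breadth-first search over a plain adjacency mapping. This is not the npfc implementation, which operates on a Mol: shortest_path_between_sets and the toy graph are hypothetical, and the sketch assumes the two fragments share no atoms.

```python
from collections import deque


def shortest_path_between_sets(adjacency, aidxf1, aidxf2):
    """Multi-source BFS from fragment 1 atoms to the nearest fragment 2 atom.

    Returns a tuple of atom indices whose first element is in aidxf1 and
    whose last element is in aidxf2, or an empty tuple if no path exists.
    Assumes aidxf1 and aidxf2 are disjoint.
    """
    parents = {atom: None for atom in aidxf1}  # every fragment 1 atom is a source
    queue = deque(sorted(aidxf1))
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = atom
            if neighbor in aidxf2:
                # Walk back to the source atom to rebuild the path.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return tuple(reversed(path))
            queue.append(neighbor)
    return ()


# Toy graph: a linear chain of 7 atoms, 0-1-2-3-4-5-6.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}

path = shortest_path_between_sets(adjacency, {0, 1}, {5, 6})
distance = len(path) - 2  # drop the two attachment points
print(path, distance)
```

Here the path runs from attachment point 1 to attachment point 5 through three intermediary atoms, so with the default cutoff of 3 this pair would not exceed the cutoff.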