Module fragment_combination
This module contains the functions for classifying fragment combinations.
- npfc.fragment_combination.classify(mol, aidxf1, aidxf2, cutoff=3, exclude_exocyclic=False)
Classify a fragment combination found in a molecule and return it as a dictionary with category, type and subtype values.
The following algorithm is applied to classify fragment combinations:
[Figure: fragment combination classification algorithm. Fragment 1: red; Fragment 2: green; fused atoms: yellow.]
Possible classifications are:
- fusion
  - spiro (fs)
  - edge (fe)
  - bridged (fb)
  - linker (fl)
  - false_positive
    - substructure (ffs)
    - overlap (ffo)
- connection
  - annulated (ca)
  - monopodal (cm)
  - bipodal
    - spiro (cbs)
    - edge (cbe)
    - bridged (cbb)
    - linker (cbl)
  - tripodal
    - spiro (cts)
    - edge (cte)
    - bridged (ctb)
    - linker (ctl)
  - other
    - spiro (cos)
    - edge (coe)
    - bridged (cob)
    - linker (col)
  - false_positive
    - cutoff (cfc)
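The classification tree can be flattened into a lookup table from code to (category, type, subtype). This is an illustrative sketch only: FCC_CODES and decode_fcc are hypothetical names, not part of npfc. It assumes that types without subtypes are represented with subtype None, and the placement of the two false_positive branches (ffs/ffo under fusion, cfc under connection) is inferred from the first letter of their codes.

```python
# Hypothetical lookup table transcribing the classification tree;
# not part of the npfc API. Types without subtypes use None as subtype.
FCC_CODES = {
    "fs": ("fusion", "spiro", None),
    "fe": ("fusion", "edge", None),
    "fb": ("fusion", "bridged", None),
    "fl": ("fusion", "linker", None),
    "ffs": ("fusion", "false_positive", "substructure"),
    "ffo": ("fusion", "false_positive", "overlap"),
    "ca": ("connection", "annulated", None),
    "cm": ("connection", "monopodal", None),
    "cbs": ("connection", "bipodal", "spiro"),
    "cbe": ("connection", "bipodal", "edge"),
    "cbb": ("connection", "bipodal", "bridged"),
    "cbl": ("connection", "bipodal", "linker"),
    "cts": ("connection", "tripodal", "spiro"),
    "cte": ("connection", "tripodal", "edge"),
    "ctb": ("connection", "tripodal", "bridged"),
    "ctl": ("connection", "tripodal", "linker"),
    "cos": ("connection", "other", "spiro"),
    "coe": ("connection", "other", "edge"),
    "cob": ("connection", "other", "bridged"),
    "col": ("connection", "other", "linker"),
    "cfc": ("connection", "false_positive", "cutoff"),
}


def decode_fcc(code: str) -> dict:
    """Return the category, type and subtype encoded by a combination code."""
    category, fc_type, subtype = FCC_CODES[code]
    return {"category": category, "type": fc_type, "subtype": subtype}


print(decode_fcc("cbe"))
# {'category': 'connection', 'type': 'bipodal', 'subtype': 'edge'}
```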
- Parameters
mol (Mol) – the input molecule
aidxf1 (set) – the atom indices of the first fragment found in the molecule
aidxf2 (set) – the atom indices of the second fragment found in the molecule
cutoff (int) – maximum number of intermediary atoms between 2 fragments to consider them as a combination (labelled as cfc otherwise)
exclude_exocyclic (bool) – exclude exocyclic atoms during classification
- Return type
dict
- Returns
the dictionary specifying fragment combination category, type and subtype
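To illustrate how the documented inputs could drive the first decisions, the sketch below separates substructure false positives (ffs), fusions, cutoff false positives (cfc) and connections. It is a simplified assumption, not the npfc algorithm: it assigns no type or subtype, does not detect overlaps (ffo), and takes the number of intermediary atoms as an argument (for example len(path) - 2 for a path returned by get_shortest_path_between_frags). top_level_branch is a hypothetical name.

```python
def top_level_branch(aidxf1: set, aidxf2: set, n_intermediary: int, cutoff: int = 3) -> str:
    """Hypothetical, simplified first step of a fragment combination classifier.

    Assumptions (not taken from npfc): one fragment's atoms contained in the
    other's means a substructure false positive; any shared atom means a
    fusion; otherwise the combination is a connection, unless more than
    `cutoff` intermediary atoms separate the fragments.
    """
    if aidxf1 <= aidxf2 or aidxf2 <= aidxf1:
        return "ffs"  # one fragment lies entirely within the other
    if aidxf1 & aidxf2:
        return "fusion"  # fragments share atoms; type not determined here
    if n_intermediary > cutoff:
        return "cfc"  # too many atoms between the fragments
    return "connection"  # type and subtype not determined here


print(top_level_branch({0, 1, 2}, {2, 3, 4}, n_intermediary=0))  # fusion
print(top_level_branch({0, 1}, {5, 6}, n_intermediary=4))  # cfc
```

With the default cutoff of 3, a combination separated by exactly three intermediary atoms is still kept as a connection, matching the description of cutoff as a maximum.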
- npfc.fragment_combination.classify_df(df_aidxf, cutoff=3, clear_cfc=True, exclude_exocyclic=False)
Return a DataFrame with all fragment combination categories for a given set of molecules and fragment atom indices obtained by substructure search. For more details about category, type and subtype, see the documentation of classify.
The output DataFrame contains the following columns describing each fragment combination:
idm: the id of the molecule
idf1: the id of fragment 1
idf2: the id of fragment 2
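fcc: the fragment combination code indicating category, type and subtype (e.g. cbe)
category: the category of the fragment combination (e.g. fusion, connection)
type: the type of the fragment combination (e.g. spiro, bipodal)
subtype: the subtype of the fragment combination (e.g. edge for a bipodal connection)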
aidxf1: the atom indices of fragment 1 found in the molecule
aidxf2: the atom indices of fragment 2 found in the molecule
Fragment combinations with more intermediary atoms than the defined cutoff are labelled as false positives (cfc).
- Parameters
df_aidxf (DataFrame) – the input DataFrame with substructure matches
cutoff (int) – the maximum number of intermediary atoms between 2 fragments
clear_cfc (bool) – remove cfc combinations (false positives) from results
exclude_exocyclic (bool) – exclude exocyclic atoms from fragment atom indices (during classification only)
- Return type
DataFrame
- Returns
a DataFrame with all fragment combination classifications
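The effect of clear_cfc can be pictured on rows that follow the documented column layout. To stay dependency-free, the sketch uses plain dictionaries instead of a pandas DataFrame; the row values are made up for illustration, and drop_cfc is a hypothetical helper, not the npfc implementation.

```python
# Made-up rows following the documented output columns.
rows = [
    {"idm": "mol1", "idf1": "frag_a", "idf2": "frag_b", "fcc": "cm",
     "category": "connection", "type": "monopodal", "subtype": None,
     "aidxf1": {0, 1, 2}, "aidxf2": {4, 5, 6}},
    {"idm": "mol1", "idf1": "frag_a", "idf2": "frag_c", "fcc": "cfc",
     "category": "connection", "type": "false_positive", "subtype": "cutoff",
     "aidxf1": {0, 1, 2}, "aidxf2": {10, 11, 12}},
]


def drop_cfc(rows, clear_cfc=True):
    """Mimic clear_cfc: drop combinations flagged as cutoff false positives."""
    if not clear_cfc:
        return list(rows)
    return [row for row in rows if row["fcc"] != "cfc"]


print([row["fcc"] for row in drop_cfc(rows)])  # ['cm']
print(len(drop_cfc(rows, clear_cfc=False)))  # 2
```

On an actual pandas DataFrame with an fcc column, the equivalent filter is the boolean mask df[df["fcc"] != "cfc"].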
- npfc.fragment_combination.get_fragment_combination_categories(include_fp=False)
Return the list of all possible fragment combination categories.
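This page does not document include_fp. The sketch below assumes, from its name alone, that it toggles whether the false-positive codes (ffs, ffo, cfc) appear in the returned list, and that the categories are the codes from the classification list of classify. The function name carries a _sketch suffix to mark it as hypothetical.

```python
# Codes transcribed from the classification list; false positives kept apart.
TRUE_CODES = ["fs", "fe", "fb", "fl", "ca", "cm",
              "cbs", "cbe", "cbb", "cbl", "cts", "cte", "ctb", "ctl",
              "cos", "coe", "cob", "col"]
FALSE_POSITIVE_CODES = ["ffs", "ffo", "cfc"]


def get_fragment_combination_categories_sketch(include_fp: bool = False) -> list:
    """Hypothetical stand-in: list all codes, optionally with false positives."""
    if include_fp:
        return TRUE_CODES + FALSE_POSITIVE_CODES
    return list(TRUE_CODES)


print(len(get_fragment_combination_categories_sketch()))  # 18
print(len(get_fragment_combination_categories_sketch(include_fp=True)))  # 21
```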
- npfc.fragment_combination.get_rings_between_two_fragments(mol, aidxf1, aidxf2)
Return the atom indices of every ring connecting two fragments, each fragment being defined by its atom indices.
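One possible reading of "a ring that connects two fragments" is a ring that shares at least one atom with each fragment without lying entirely inside either of them. The sketch below encodes that assumption. It works on a list of ring atom-index tuples, the shape returned by RDKit's mol.GetRingInfo().AtomRings(), and rings_between_fragments is a hypothetical name rather than npfc code.

```python
def rings_between_fragments(atom_rings, aidxf1: set, aidxf2: set) -> list:
    """Hypothetical: keep rings touching both fragments but lying in neither.

    `atom_rings` is a sequence of atom-index tuples, e.g. the output of
    RDKit's mol.GetRingInfo().AtomRings().
    """
    selected = []
    for ring in atom_rings:
        ring_atoms = set(ring)
        touches_both = bool(ring_atoms & aidxf1) and bool(ring_atoms & aidxf2)
        inside_one = ring_atoms <= aidxf1 or ring_atoms <= aidxf2
        if touches_both and not inside_one:
            selected.append(ring)
    return selected


# Three linearly fused six-membered rings; the fragments are the two outer rings.
rings = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9), (8, 9, 10, 11, 12, 13)]
print(rings_between_fragments(rings, set(range(0, 6)), set(range(8, 14))))
# [(4, 5, 6, 7, 8, 9)]
```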
- npfc.fragment_combination.get_shortest_path_between_frags(mol, aidxf1, aidxf2)
Return the shortest path within a molecule between two fragments defined by atom indices. The first and last atom indices belong to fragment 1 and fragment 2 respectively, so they should not be counted when estimating the distance between the fragments (i.e. distance = len(shortest_path) - 2).
- Parameters
- Return type
- Returns
the atom indices of the shortest path between both fragments. The first index is the attachment point on fragment 1, whereas the last index is the attachment point on fragment 2.
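A multi-source breadth-first search is one way to obtain such a path. The sketch below works on a plain adjacency dictionary rather than an RDKit molecule (RDKit itself offers Chem.GetShortestPath for a single pair of atoms). The function name and graph encoding are assumptions, and the sketch does not reproduce npfc's implementation. Starting from every atom of fragment 1 at once ensures that only the first atom of the path belongs to fragment 1. Stopping at the first fragment 2 atom dequeued ensures that only the last atom belongs to fragment 2.

```python
from collections import deque


def shortest_path_between_frags(adjacency: dict, aidxf1: set, aidxf2: set) -> list:
    """Hypothetical sketch: shortest atom path from fragment 1 to fragment 2.

    `adjacency` maps each atom index to the indices of its bonded neighbours.
    Returns [] if the fragments are not connected.
    """
    previous = {atom: None for atom in aidxf1}  # BFS starts from every fragment-1 atom
    queue = deque(aidxf1)
    while queue:
        atom = queue.popleft()
        if atom in aidxf2:  # first fragment-2 atom reached: rebuild the path
            path = []
            while atom is not None:
                path.append(atom)
                atom = previous[atom]
            return path[::-1]
        for neighbour in adjacency.get(atom, []):
            if neighbour not in previous:
                previous[neighbour] = atom
                queue.append(neighbour)
    return []


# Two six-membered rings joined by a two-atom linker (atoms 6 and 7).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (5, 6), (6, 7), (7, 8),
         (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 8)]
adjacency = {}
for a, b in bonds:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

path = shortest_path_between_frags(adjacency, set(range(0, 6)), set(range(8, 14)))
print(path)  # [5, 6, 7, 8]
print(len(path) - 2)  # 2 intermediary atoms
```

Here len(path) - 2 counts the two linker atoms, which is the quantity that classify compares against cutoff.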