fragment_combination_graph

Module fragment_combination_graph

This module contains the functions for generating fragment combination graphs from individual molecules.

It provides functions to annotate pseudo-Natural Products as well.

npfc.fragment_combination_graph.annotate_pnp(df_fcg, df_fcg_ref, data=['fcc', 'fcp_1', 'fcp_2'], consider_symmetry=True)

Search for and identify PNP molecules in the input DataFrame (df_fcg). PNP molecules are defined as molecules containing natural fragment combinations that are not found in a reference natural dataset. Fragment combinations are defined by extracting three pieces of information from networkx graphs:

  • source: id of fragment 1

  • target: id of fragment 2

  • attributes to consider: fcc, fcp_1 and fcp_2 by default

Three new columns are appended to the input DataFrame:

  • _pnp_ref: the list of references found for the target fcg

  • pnp_fcg: True if the input fcg has no match with any reference fcg, False otherwise

  • pnp_mol: True if the input molecule has no matching fcg with any reference fcg, False otherwise
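The relationship between the two boolean columns can be illustrated with a minimal sketch. It works on plain tuples rather than a DataFrame, the row layout and the `flag_pnp` helper are hypothetical, and it assumes that a molecule is flagged only when every one of its fcgs lacks a reference. It is not the npfc implementation.

```python
from collections import defaultdict

# Hypothetical rows: (molecule id, fcg id, references found for that fcg).
rows = [
    ("mol1", "fcg1", ("ref7",)),
    ("mol1", "fcg2", ()),
    ("mol2", "fcg1", ()),
]


def flag_pnp(rows):
    """Return one dict per row with pnp_fcg and pnp_mol flags."""
    # An fcg is flagged as PNP when no reference matched it.
    pnp_fcg = [len(refs) == 0 for _, _, refs in rows]
    # Assumption: a molecule is PNP only if all of its fcgs are PNP.
    by_mol = defaultdict(list)
    for (idm, _, _), flag in zip(rows, pnp_fcg):
        by_mol[idm].append(flag)
    pnp_mol = {idm: all(found) for idm, found in by_mol.items()}
    return [
        {"idm": idm, "idfcg": idfcg, "pnp_fcg": flag, "pnp_mol": pnp_mol[idm]}
        for (idm, idfcg, _), flag in zip(rows, pnp_fcg)
    ]


flags = flag_pnp(rows)
```

In this sketch mol1 has one referenced fcg, so it is not flagged at the molecule level even though its second fcg is.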

When considering FCPs, it is recommended to consider symmetry. This results in ignoring the suffixes in FCPs, i.e. ‘1a’ and ‘1b’ both become ‘1’.
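As an illustration of the suffix handling, the sketch below strips a trailing letter from FCP labels in an edge attribute dict. The helper names are hypothetical, and it assumes that an FCP label is a number optionally followed by lowercase letters. The actual parsing in npfc may differ.

```python
import re


def strip_fcp_suffix(fcp):
    """Drop a trailing letter suffix from an FCP label, e.g. '1a' -> '1'."""
    return re.sub(r"[a-z]+$", "", str(fcp))


def normalize_edge(d, keys=("fcp_1", "fcp_2")):
    """Return a copy of an edge attribute dict with FCP suffixes removed."""
    return {k: strip_fcp_suffix(v) if k in keys else v for k, v in d.items()}


edge = {"fcc": "cmo", "fcp_1": "1a", "fcp_2": "1b"}
normalized = normalize_edge(edge)
```

Only the keys listed in `keys` are touched, so the combination category stored under fcc is left unchanged.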

Parameters
  • df_fcg – the input DataFrame

  • df_fcg_ref – the reference DataFrame

  • data – the list of edge attributes to consider during fcg comparison

  • consider_symmetry – consider fragment symmetry during annotation when using FCPs. To use FCPs, data must include both ‘fcp_1’ and ‘fcp_2’ (including only one of them would not make sense here).

Return type

DataFrame

Returns

the input DataFrame with 3 new pnp columns

npfc.fragment_combination_graph.filter_edges_attributes(edges, cols)

networkx can return either a single edge attribute or all of them. All attributes are therefore extracted first, and the unwanted ones are then filtered out.

Parameters
  • edges (list) – the edges as a list of tuples of the form (u, v, d), where d is the dict of attributes

  • cols (list) – the list of attributes to use for PNP labelling.

Return type

list
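A minimal sketch of this kind of filtering, written against plain (u, v, d) tuples such as those yielded by a networkx graph's `edges(data=True)`. The helper name `keep_edge_attributes` and the sample values are hypothetical, and this is not the function's actual source.

```python
def keep_edge_attributes(edges, cols):
    """Return new (u, v, d) tuples whose dicts only hold the keys in cols."""
    return [(u, v, {k: d[k] for k in cols if k in d}) for u, v, d in edges]


# One edge carrying an extra attribute (idm) that should not be compared.
edges = [("f1", "f2", {"fcc": "cmo", "fcp_1": "1a", "fcp_2": "3", "idm": "mol1"})]
filtered = keep_edge_attributes(edges, ["fcc", "fcp_1", "fcp_2"])
```

The original dicts are not modified, so the full attribute set remains available for other uses.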

npfc.fragment_combination_graph.filter_out_fcgs_ffs(df2, d)

Part of the hotfix for redundant FCGs. Apply to a DataFrame containing the FCGs of a single molecule.

npfc.fragment_combination_graph.filter_out_fcgs_ffs_all(df_fcg, df_fs)

Part of the hotfix for redundant FCGs.

npfc.fragment_combination_graph.generate(df_fcc, min_frags=2, max_frags=5, max_overlaps=5, split_unconnected=True, clear_ffs=True, palette=None)

This function processes a fragment combinations DataFrame and returns a new DataFrame with a fragment combination graph for each molecule.

Each highlighted fragment of the molecule constitutes a node of the graph. Fragment combinations are used as edges to represent fragment interactions and are annotated with the idm as well as the category of the combination.

A str representation is also computed:

frag1[cmo]frag2-frag2[fed]frag3
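The format shown above can be reproduced with a short sketch. It assumes that each edge is a (u, v, d) tuple and that the combination category is stored under the fcc key. The helper name `fcg_to_str` is hypothetical.

```python
def fcg_to_str(edges):
    """Join edges as 'u[category]v' segments separated by hyphens."""
    return "-".join(f"{u}[{d['fcc']}]{v}" for u, v, d in edges)


edges = [
    ("frag1", "frag2", {"fcc": "cmo"}),
    ("frag2", "frag3", {"fcc": "fed"}),
]
text = fcg_to_str(edges)
```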

Two objects are computed and stored as b64 strings:
  • colormap: a custom object with three pieces of information regarding highlight colors (RGB values):
    • fragments: the color attributed to each fragment

    • atoms: the color attributed to each atom

    • bonds: the color attributed to each bond

  • graph: a networkx object used for comparing fragment connectivity among molecules.
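Storing an object as a b64 string can be sketched as follows. The serialization step is an assumption: pickle is used here for illustration, and the documentation does not state which serializer npfc relies on. The helper names and the sample colormap values are hypothetical.

```python
import base64
import pickle


def encode_b64(obj):
    """Serialize an object (pickle, assumed) and return a base64 str."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_b64(text):
    """Inverse of encode_b64; only use on trusted data, as pickle is unsafe."""
    return pickle.loads(base64.b64decode(text))


# Hypothetical colormap with RGB values for fragments, atoms and bonds.
colormap = {
    "fragments": {"frag1": (1.0, 0.0, 0.0)},
    "atoms": {0: (1.0, 0.0, 0.0)},
    "bonds": {0: (1.0, 0.0, 0.0)},
}
encoded = encode_b64(colormap)
decoded = decode_b64(encoded)
```

Encoding to text lets such objects be written to flat file formats alongside the other DataFrame columns.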

Molecules can be filtered using thresholds.

Parameters
  • df_fcc (DataFrame) – a DataFrame with pairwise fragment combinations

  • min_frags (int) – a threshold for the minimum number of fragments allowed per fragment combination graph

  • max_frags (int) – a threshold for the maximum number of fragments allowed per fragment combination graph

  • max_overlaps (int) – a threshold for the maximum number of overlap combinations found in the molecule

Return type

DataFrame

Returns

a DataFrame representing fragment combination graphs

npfc.fragment_combination_graph.get_pnp_references(edges, df_ref, target_node=None)

Return a tuple of reference idms. A reference is recorded if all edges of the target molecule are included at once within the reference row edges. In case no reference is found, an empty tuple is returned.

Parameters
  • edges (tuple) – the edges as a list of tuples of the form (u, v, d), where d is the dict of attributes

  • df_ref (DataFrame) – the dataframe containing the edges to use for references.

  • target_nodes – a frozenset of fragment ids found in the target edges, used to filter the references to compare (optimization)

Return type

tuple
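The lookup described above can be sketched with a plain dict standing in for the reference DataFrame. The names `find_references` and `freeze_edge` are hypothetical, edge direction is assumed to matter, and attribute values are assumed to be hashable. This is not the library's implementation.

```python
def freeze_edge(edge):
    """Turn (u, v, d) into a hashable tuple so edges can be put in sets."""
    u, v, d = edge
    return (u, v, tuple(sorted(d.items())))


def find_references(edges, refs, target_nodes=None):
    """Return the ids of references that contain all target edges at once.

    refs maps a reference id to its list of (u, v, d) edges.
    """
    target = {freeze_edge(e) for e in edges}
    found = []
    for idm, edges_ref in refs.items():
        if target_nodes is not None:
            # Cheap pre-filter: skip references lacking a target fragment.
            ref_nodes = {n for u, v, _ in edges_ref for n in (u, v)}
            if not target_nodes <= ref_nodes:
                continue
        if target <= {freeze_edge(e) for e in edges_ref}:
            found.append(idm)
    return tuple(found)


target_edges = [("f1", "f2", {"fcc": "cmo"})]
refs = {
    "ref1": [("f1", "f2", {"fcc": "cmo"}), ("f2", "f3", {"fcc": "fed"})],
    "ref2": [("f1", "f2", {"fcc": "fed"})],
    "ref3": [("f4", "f5", {"fcc": "cmo"})],
}
matches = find_references(target_edges, refs, frozenset({"f1", "f2"}))
```

The node pre-filter only discards references that cannot match, so it should not change the result, only the amount of edge comparison performed.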

npfc.fragment_combination_graph.get_ref_aidxs(df_fs)

Part of the hotfix for redundant FCGs. The occurrence id was not recorded in the graphs, so df_fs is used to retrieve this information instead. Requires the fid column, which is defined in filter_out_fcgs_ffs_all.

npfc.fragment_combination_graph.get_varying_d_aidxs(varying_fragments_occ, d)

Part of the hotfix for redundant FCGs. Assign the atom indices to the fragment occurrences.

npfc.fragment_combination_graph.has_only_referenced_edges(edges, edges_ref)

Check whether all edges in edges are present within edges_ref. Edges are tuples of the form (u, v, d), where d is the dict of attributes.

Parameters
  • edges (tuple) – the edges of the target molecule fcg

  • edges_ref (tuple) – the edges of reference molecule fcg

Return type

bool

Returns

False if at least 1 edge is not found in the reference, True otherwise
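A minimal sketch of such a check follows. It assumes that two edges are equal when source, target and all attributes match, and that attribute values are hashable. The helper names are hypothetical.

```python
def freeze_edge(edge):
    """Turn (u, v, d) into a hashable tuple so edges can be put in a set."""
    u, v, d = edge
    return (u, v, tuple(sorted(d.items())))


def all_edges_referenced(edges, edges_ref):
    """Return True only if every edge is also found in edges_ref."""
    ref = {freeze_edge(e) for e in edges_ref}
    return all(freeze_edge(e) in ref for e in edges)


edges_ref = (("f1", "f2", {"fcc": "cmo"}), ("f2", "f3", {"fcc": "fed"}))
covered = all_edges_referenced((("f1", "f2", {"fcc": "cmo"}),), edges_ref)
novel = all_edges_referenced((("f1", "f3", {"fcc": "cmo"}),), edges_ref)
```

Building the set of reference edges once keeps each membership test constant time, which helps when the same reference is compared against many targets.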

npfc.fragment_combination_graph.regroup_edges_from_fcgs(df_fcg)

This function regroups all edges of a molecule. It can be applied to several molecules at once. The result is a single fcg per molecule, with all combinations and fragment occurrences counted only once, which is useful for reporting. (I should never have split fragment graphs.)
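The regrouping idea can be sketched on plain Python data. Here each fcg is represented as a hypothetical (idm, edges) pair instead of a DataFrame row, and duplicates are detected by comparing source, target and attributes. The helper name `regroup_edges` is an assumption for illustration.

```python
from collections import defaultdict


def regroup_edges(fcgs):
    """Merge the edges of all fcgs per molecule, keeping each edge once."""
    grouped = defaultdict(dict)
    for idm, edges in fcgs:
        for u, v, d in edges:
            # Hashable key used to detect edges shared by several fcgs.
            key = (u, v, tuple(sorted(d.items())))
            grouped[idm].setdefault(key, (u, v, d))
    return {idm: list(unique.values()) for idm, unique in grouped.items()}


fcgs = [
    ("mol1", [("f1", "f2", {"fcc": "cmo"})]),
    ("mol1", [("f1", "f2", {"fcc": "cmo"}), ("f2", "f3", {"fcc": "fed"})]),
    ("mol2", [("f4", "f5", {"fcc": "cmo"})]),
]
merged = regroup_edges(fcgs)
```

In this example the edge shared by the two fcgs of mol1 appears only once in the merged result.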