fragment_combination_graph
Module fragment_combination_graph
This module contains the functions for generating fragment combination graphs from individual molecules.
It also provides functions to annotate pseudo-Natural Products (PNPs).
- npfc.fragment_combination_graph.annotate_pnp(df_fcg, df_fcg_ref, data=['fcc', 'fcp_1', 'fcp_2'], consider_symmetry=True)
Search for and identify PNP molecules in the input DataFrame (df_fcg). PNP molecules are defined as molecules containing natural fragment combinations that are not found in a reference natural dataset. Fragment combinations are defined by extracting three pieces of information from networkx graphs:
source: id of fragment 1
target: id of fragment 2
attributes to consider: by default fcc, fcp_1 and fcp_2
Three new columns are appended to the input DataFrame:
_pnp_ref: the list of references found for the target fcg
pnp_fcg: True if the input fcg has no match with any reference fcg, False otherwise
pnp_mol: True if the input molecule has no matching fcg with any reference fcg, False otherwise
When considering FCPs, it is recommended to consider symmetry. This results in ignoring the suffixes in FCPs, i.e. ‘1a’ and ‘1b’ both become ‘1’.
- Parameters
df_fcg – the input DataFrame
df_fcg_ref – the reference DataFrame
data – the list of edge attributes to consider during fcg comparison
consider_symmetry – consider fragment symmetry during annotation when using FCPs. To use FCPs, data must include ‘fcp_1’ and/or ‘fcp_2’ (although including only one of them would not make sense here).
- Return type
DataFrame
- Returns
the input DataFrame with 3 new pnp columns
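The symmetry handling described above can be sketched as follows. This is only an illustration, not the npfc implementation: the helper names strip_symmetry_suffix and normalize_edge are hypothetical, and the sketch assumes that FCP labels are strings ending in a lowercase-letter suffix, as in ‘1a’ and ‘1b’.

```python
import re


def strip_symmetry_suffix(fcp):
    """Drop a trailing lowercase suffix, e.g. '1a' -> '1' (assumed label format)."""
    return re.sub(r"[a-z]+$", "", fcp)


def normalize_edge(u, v, d, data=("fcc", "fcp_1", "fcp_2"), consider_symmetry=True):
    """Reduce an edge (u, v, d) to a hashable tuple holding only the attributes in data."""
    attrs = {k: d[k] for k in data if k in d}
    if consider_symmetry:
        # Only the FCP attributes carry symmetry suffixes.
        for key in ("fcp_1", "fcp_2"):
            if key in attrs:
                attrs[key] = strip_symmetry_suffix(attrs[key])
    return (u, v, tuple(sorted(attrs.items())))


edge_a = normalize_edge("f1", "f2", {"fcc": "cmo", "fcp_1": "1a", "fcp_2": "1b", "idm": "mol_1"})
edge_b = normalize_edge("f1", "f2", {"fcc": "cmo", "fcp_1": "1b", "fcp_2": "1a", "idm": "mol_2"})
```

With consider_symmetry=True, edge_a and edge_b compare equal, so two combinations that differ only by symmetric positions would be treated as the same combination.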
- npfc.fragment_combination_graph.filter_edges_attributes(edges, cols)
networkx can return either a single edge attribute or all of them. For this reason, all attributes are extracted first and the ones that should not be used are then filtered out.
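A minimal sketch of this kind of filtering, written with plain tuples so it does not depend on networkx. It assumes that edges have the form (u, v, d) and that cols lists the attributes to keep; keep_edge_attributes is a hypothetical name, not the function documented here.

```python
def keep_edge_attributes(edges, cols):
    """Return the edges with attribute dicts restricted to the keys listed in cols."""
    return [(u, v, {k: d[k] for k in cols if k in d}) for u, v, d in edges]


# One edge carrying more attributes than are needed for the comparison.
edges = [("f1", "f2", {"fcc": "cmo", "fcp_1": "1a", "fcp_2": "2", "idm": "mol_1"})]
filtered = keep_edge_attributes(edges, ["fcc", "fcp_1"])
```

The input edges are left untouched, because a new dict is built for every edge.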
- npfc.fragment_combination_graph.filter_out_fcgs_ffs(df2, d)
Part of the hotfix for redundant FCGs. Apply to a DataFrame containing the FCGs of the same molecule.
- npfc.fragment_combination_graph.filter_out_fcgs_ffs_all(df_fcg, df_fs)
Part of the hotfix for redundant FCGs.
- npfc.fragment_combination_graph.generate(df_fcc, min_frags=2, max_frags=5, max_overlaps=5, split_unconnected=True, clear_ffs=True, palette=None)
This function processes a fragment combination DataFrame and returns a new DataFrame with a fragment combination graph for each molecule.
Each highlighted fragment of the molecule constitutes a node of the graph. Fragment combinations are used as edges to represent fragment interactions and are annotated with the idm as well as the category of the combination.
- A str representation is also computed:
frag1[cmo]frag2-frag2[fed]frag3
- Two objects are computed and stored as b64 strings:
- colormap: a custom object with three pieces of information regarding highlight colors (RGB values):
fragments: the color attributed to each fragment
atoms: the color attributed to each atom
bonds: the color attributed to each bond
- graph: a networkx (nx) object used for comparing fragment connectivity among molecules.
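The str representation shown above can be reproduced with a short sketch. It assumes that each edge has the form (u, v, d) and that the combination category is stored under the 'fcc' key; fcg_to_str is a hypothetical helper, not part of the module.

```python
def fcg_to_str(edges):
    """Join edges as 'u[category]v' items separated by '-'."""
    return "-".join(f"{u}[{d['fcc']}]{v}" for u, v, d in edges)


edges = [
    ("frag1", "frag2", {"fcc": "cmo"}),
    ("frag2", "frag3", {"fcc": "fed"}),
]
fcg_str = fcg_to_str(edges)
```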
Molecules can be filtered using thresholds.
- Parameters
df_fcc (DataFrame) – a DataFrame with pairwise fragment combinations
min_frags (int) – a threshold for the minimum number of fragments allowed per fragment combination graph
max_frags (int) – a threshold for the maximum number of fragments allowed per fragment combination graph
max_overlaps (int) – a threshold for the maximum number of overlap combinations found in the molecule
- Return type
DataFrame
- Returns
a DataFrame representing fragment combination graphs
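Storing Python objects as b64 strings, as is done for the colormap and graph objects, could look like the sketch below. The documentation does not state which serializer is used, so pickle is an assumption here, and both the helper names and the colormap layout (a dict with fragments, atoms and bonds keys) are hypothetical.

```python
import base64
import pickle


def encode_b64(obj):
    """Serialize an object with pickle (assumed) and return it as a base64 str."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_b64(text):
    """Inverse of encode_b64. Only decode strings from a trusted source."""
    return pickle.loads(base64.b64decode(text))


# Hypothetical colormap layout with RGB tuples.
colormap = {
    "fragments": {"frag1": (1.0, 0.0, 0.0)},
    "atoms": {0: (1.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)},
    "bonds": {0: (1.0, 0.0, 0.0)},
}
encoded = encode_b64(colormap)
restored = decode_b64(encoded)
```

A plain ASCII string of this kind fits in a single DataFrame cell and survives export to text-based formats such as CSV.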
- npfc.fragment_combination_graph.get_pnp_references(edges, df_ref, target_node=None)
Return a tuple of reference idms. A reference is recorded if all edges of the target molecule are included at once within the reference row edges. In case no reference is found, an empty tuple is returned.
- Parameters
edges (tuple) – the edges as a list of tuples of the form (u, v, d), with d being the dict of attributes
df_ref (DataFrame) – the DataFrame containing the edges to use as references
target_node – a frozenset of fragment ids found in the target edges, used for filtering the references to compare (optimization)
- Return type
tuple
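The matching rule described for this function, where a reference is recorded only if all target edges are found together in one reference row, can be sketched without pandas. The layout of refs (a dict mapping a reference idm to its edges) and the names hashable_edge and find_references are assumptions made for illustration. The sketch also assumes that node order within an edge is consistent between target and references.

```python
def hashable_edge(u, v, d):
    """Turn an edge (u, v, d) into a hashable tuple so it can be stored in a set."""
    return (u, v, tuple(sorted(d.items())))


def find_references(edges, refs):
    """Return a tuple of reference idms whose edges include all target edges."""
    target = {hashable_edge(*edge) for edge in edges}
    target_nodes = frozenset(n for u, v, _ in edges for n in (u, v))
    found = []
    for idm, ref_edges in refs.items():
        ref_nodes = frozenset(n for u, v, _ in ref_edges for n in (u, v))
        if not target_nodes <= ref_nodes:
            continue  # cheap pre-filter: a fragment is missing, so no match is possible
        if target <= {hashable_edge(*edge) for edge in ref_edges}:
            found.append(idm)
    return tuple(found)


e1 = ("f1", "f2", {"fcc": "cmo"})
e2 = ("f2", "f3", {"fcc": "fed"})
refs = {"ref_a": [e1, e2], "ref_b": [e1]}
```

The node comparison plays the role of the optimization mentioned for the frozenset parameter: references lacking one of the target fragments are skipped before any edge comparison.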
- npfc.fragment_combination_graph.get_ref_aidxs(df_fs)
Part of the hotfix for redundant FCGs. The occurrence id was not recorded in the graphs, so df_fs has to be used to retrieve this information instead. Needs to be used with the fid column, which is defined in filter_out_fcgs_ffs_all.
- npfc.fragment_combination_graph.get_varying_d_aidxs(varying_fragments_occ, d)
Part of the hotfix for redundant FCGs. Attribute the atom indices to the fragment occurrences.
- npfc.fragment_combination_graph.has_only_referenced_edges(edges, edges_ref)
Check if at least one edge in edges is not present within edges_ref. Edges are tuples of the form (u, v, d), with d being the dict of attributes.
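A check of this kind can be written in one line. The sketch below assumes, based on the function name only, that the result is True when every edge of edges is present in edges_ref and False as soon as one is missing; all_edges_referenced is a hypothetical name.

```python
def all_edges_referenced(edges, edges_ref):
    """Return True if every edge (u, v, d) of edges also appears in edges_ref."""
    # Tuples containing dicts compare by value, so 'in' works without hashing.
    return all(edge in edges_ref for edge in edges)


edges_ref = [("f1", "f2", {"fcc": "cmo"}), ("f2", "f3", {"fcc": "fed"})]
known = [("f1", "f2", {"fcc": "cmo"})]
novel = [("f1", "f2", {"fcc": "cmo"}), ("f1", "f3", {"fcc": "fed"})]
```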
- npfc.fragment_combination_graph.regroup_edges_from_fcgs(df_fcg)
This function regroups all edges of a molecule. It can be applied to several molecules at once. This results in a single fcg per molecule, with all combinations and fragment occurrences counted only once, which is useful for reporting. (I should never have split fragment graphs).
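The regrouping idea can be illustrated on plain Python data. The input layout (one (idm, edges) pair per fcg) and the name regroup_edges are assumptions; the documented function works on a DataFrame instead.

```python
def regroup_edges(fcgs):
    """Merge the edges of all fcgs sharing an idm, keeping each edge only once."""
    merged = {}
    for idm, edges in fcgs:
        kept = merged.setdefault(idm, [])
        for edge in edges:
            if edge not in kept:  # an edge shared by several fcgs is counted once
                kept.append(edge)
    return merged


e1 = ("f1", "f2", {"fcc": "cmo"})
e2 = ("f2", "f3", {"fcc": "fed"})
e3 = ("f3", "f4", {"fcc": "cmo"})
fcgs = [("mol_1", [e1, e2]), ("mol_1", [e2, e3]), ("mol_2", [e1])]
regrouped = regroup_edges(fcgs)
```

Here mol_1 ends up with a single list of three edges, even though e2 appears in both of its fcgs.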