bio_analytics.pl -- Computational biology data analytics.

Collects a number of biological data analytics tasks. This library provides tools for the bio_db served data, empowering downstream analyses of user's experimental data.

Installation and loading:

?- pack_install(bio_analytics).
        % also installs pack(lib). other dependencies are installed at load time via lib
?- use_module(library(lib)).
?- lib(bio_analytics).

The library comes with one experimental dataset: data/silac/bt.csv which is used in the example files in directory examples/ .

author
- nicos angelopoulos
version
- 0.1 2019/4/22
- 0.2 2019/5/08
- 0.3 2019/5/11
- 0.4 2020/9/18
- 0.5 2023/1/2
- 0.6 2023/6/6
license
- MIT
 bio_analytics_version(-Vers, -Date)
Version and release date.
?- bio_analytics_version(V,D).
V = 0:6:0,
D = date(2023,6,6).
 bio_diffex(+Exp, -DEs, -NonDEs, +Opts)
Select a sublist of significantly, differentially expressed genes in an experiment.

Exp is in the form of a mtx/1 matrix. Selected (DEs) and rejected (NonDEs) are returned in common format, of either (default) Gene-Ev, where Ev is the expression value for Gene in Exp, or sub-matrices of Exp. The format is controlled by option as_pairs.

We use ev as short for expression value, to avoid confusion with exp which short for experiment.

Opts

as_non(AsNon=pvalue)
what to return as non differentially expressed: either everything in Gcnm (AsNon=all, default when EvLog is true), or only those with numeric pvalue (AsNon=pvalue, default when EvLog is false)
as_pairs(AsPairs=true)
whether to return pairs or matrices
de_max(DexMax=false)
puts a cap on the number of differentially expressed genes returned
de_mtx(DexMtx)
returns the selected matrix. if an atom, its taken to be a filename
exp_pv_cnm(ExpPcnm=adj.pvalue)
the experimental column (found in MsF) on which Pcut is applied as a filter
exp_pv_cut(Pcut=0.05)
p.value cut off for experimental input from MSstats
exp_ev_log(EvLog=true)
are the expression values log values
exp_ev_cnm(EvCnm)
expession value column name (log2FC if EvLog=true, expression otherwise)
exp_ev_cut_let(EvLet = -1)
expression values below (less or equal) which values are selected
exp_ev_cut_get(EvGet)
expression values above (greater or equal) which values are selected (defaults is 2 if EvLog is false, and 1 otherwise)
exp_ev_include_inf(IncInf=false)
include infinity values as diffexs ? (default is false if EvLog is false, and true otherwise)
gene_id_cnm(Gcnm=Symbols)
column name for the key value in the pair lists: DEs and NonDEs.
mtx_dx_in(Mtx)
can be used to return the Mtx used (from mtx(Exp,Mtx) call)
which(Wch=dx(UpIs,DownIs))
returns the up and down-regulated indices
?- lib(mtx),
   absolute_file_name(pack('bio_analytics/data/silac/bt.csv'), CsvF),
   mtx(CsvF, Mtx, convert(true)),
   assert(mtx_data(Mtx)).

CsvF = '/usr/local/users/nicos/local/git/lib/swipl/pack/bio_analytics/data/silac/bt.csv',
Mtx = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('Q9P126', ...), row(..., ..., ..., ...)|...].

?- lib(debug_call),
   debug(testo),
   mtx_data(Mtx),
   debug_call(testo, dims, mtx/Mtx),
   bio_diffex(Mtx, DEPrs, NonDEPrs, []),
   debug_call(testo, length, de/DEPrs),
   debug_call(testo, length, nde/NonDEPrs).

% Dimensions for matrix,  (mtx) nR: 1245, nC: 4.
% Length for list, de: 426.
% Length for list, nde: 818.

DEPrs = ['CLEC1B'- -4.80614893171469, 'LGALS1'- -2.065877096524, ... - ...|...],
NonDEPrs = ['CNN2'- -0.69348026828097, 'CXCR4'-0.73039667221395, ... - ...|...].

?-
    mtx_data(Mtx),
    bio_diffex(Mtx, DEPrs, NonDEPrs, exp_pv_cut(0.01)),
    debug_call(testo, length, de/DEPrs),
    debug_call(testo, length, nde/NonDEPrs).

% Length for list, de: 286.
% Length for list, nde: 958.

?-
    mtx_data(Mtx),
    Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf)],
    bio_diffex(Mtx, DEPrs, NonDEPrs, Opts),
    debug_call(testo, length, de/DEPrs),
    debug_call(testo, length, nde/NonDEPrs).

% Length for list, de: 581.
% Length for list, nde: 663.

?- mtx_data(Mtx),
    Opts = [exp_ev_cut_let(inf),exp_ev_cut_get(-inf),as_pairs(false)],
    bio_diffex(Mtx, DEs, NonDEs, Opts),
    debug_call(testo, length, de/DEPrs),
    debug_call(testo, length, nde/NonDEPrs).

% Length for list, de: 582.
% Length for list, nde: 664.

Mtx = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row(..., ..., ..., ...)|...],
Opts = [exp_ev_cut_let(inf), exp_ev_cut_get(-inf), as_pairs(false)],
DEs = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('Q9P126', 'CLEC1B', -4.80614893171469, 2.057e-37), row(..., ..., ..., ...)|...],
NonDEs = [row('Protein IDs', 'Symbols', log2FC, adj.pvalue), row('B4DUT8;Q6FHC3;Q6FHE4;Q99439;B4DDF4;Q53GK7;B4DN57;B4DHU5;K7ES69;H3BVI6;H3BQH0', 'CNN2', -0.69348026828097, 0.0814675), row(..., ..., ..., ...)|...].
author
- nicos angelopoulos
version
- 0.1 2019/5/2
- 0.2 2020/9/3, ability to return sub-matrices (9/14): diffex_mtx()
- 0.3 2022/12/16, changed name from exp_diffex/4 to bio_diffex/4.
 bio_symbols(+Vect, -Symbs, +Opts)
Convert gene identifiers pointed by Vect to Symbols.

Vect can be a list or pairlist (K-V) of which the K element is taken to be the identifier. Vect can also be an atomic representing the Nth column on column for Mtx.

Opts

debug(Dbg=false)
progress, informational messages
mtx(Mtx)
to be used if Vect is atomic
org(Org=hs)
organism the data come from, via bio_db_organism/2
org_exp_id(ExpId=symb)
the type of the experimental gene ids for the organism. default depends on Org, but currently all map to symb
?- hgnc_homs_hgnc_ncbi( 19295, Ncbi ), bio_symbols( [Ncbi], Symbs, org_exp_id(Ncbi) ).
Ncbi = 114783,
Symbs = ['LMTK3'].
author
- nicos angelopoulos
version
- 0:1 2022/12/21
See also
- bio_db_organism/2
 bio_volcano_plot(+Mtx, +Opts)
Create a volcano plot of differential expressed values (x) against p-values (y).

Uses ggplot2 and bio_analytics:bio_diffex/4. The predicate creates an R data.frame and creates a ggplot2 object from which typically output files are produced.

Opts

clr_down(ClrDw=brandeisblue)
bio_colour_hex/2 compatible colour for down regulated entries
clr_inv(ClrInv=cadetgrey)
colour for invariant entries
clr_line(ClrLn=bazaar)
colour for cut-off lines
clr_up(ClrUp=cadmiumred)
colour for up regulated entries
dir(Dir)
as understood by os_dir_stem_ext/2- see Odir, below too
debug(Dbg=true)
informational, progress messages
ext(Ext=pdf)
extension that defines the type of output (passed to os_dir_stem_ext/2)
legend_cnm(LgCnm=regulation)
the legend frame column name (also appears as legend title, if none is given)
legend_down(LegUp=down)
token for up-regulation
legend_inv(LegUp=invariant)
token for invariants
legend_title(LgTitle)
by default LgCnm is used.
legend_up(LegUp=up)
token for up-regulation
lim_x_max(LimXmax)
if LimXmin is given this defaults to -LimXmin. By default ggplot2 decides this.
lim_x_min(LimXmin)
if LimXmax is given this defaults to -LimXmax. By default ggplot2 decides this.
lim_y_max(LimXmax)
By default ggplot2 decides this.
lim_y_min(LimY=0)
this is not used unless LimYmax is given
odir(Odir)
as understood by os_dir_stem_ext/2- preferred to Dir above
plot_file(File)
returns the output file
rvar_cv(Rvar=bvp_cv)
R var for colors vector
rvar_df(Rvar=bvp_df)
R variable for the 3 column data frame
rvar_gt(Rvar=bvp_gt)
R variable holding the plot term
rvar_rmv(RRmv=true)
remove the three R variables from above
stem(Stem=volc_plot)
stem for output, set to false if no output is wanted
theme(Theme=classic)
theme_Theme should be a ggplot() theme

Opts are passed to bio_diffex/4. This predicate also pinches the defaults from there.

Examples, fixme: use iris dataset.

?- bio_volcano_plot([]).
author
- nicos angelopoulos
version
- 0.1 2022/12/15
See also
- bio_diffex/4
- os_dir_stem_ext/2
To be done
- add args for influencing of the gg term and the output-to-file call
 exp_gene_family_string_graph(+Exp, +Family, -Graph, +Opts)
 exp_gene_family_string_graph(+Exp, +Family, ?DEPrs, ?NonDEPrs, -Graph, +Opts)
Generate and possibly plot the STRING graph of a known gene family as affected in a biological Exp_eriment. Each family gene is placed of the following states according to wether it was identified in Exp and how it was modified (in relation to background conditions):

Note that de-reguation trumps identification, that is, currently there is no distiction between genes that seen to be both significantly de-regulated and identified versus just those that are simply significantly de-regulated.

In the exp_gene_family_string_graph/6 version DEPrs and NonDEPrs can be provided, which optimises use in loops over families (see exp_go_over_string_graphs/4).

Opts

exp_ev_log(EvLog=true)
are the expression values (ev) log values ? (also passed to bio_diffex/4)
faint_factor(Fctr=1.2)
faint factor for non-significant nodes
include_non_present(IncP=true)
false excludes non-indentified family genes
include_non_significant(IncS=true)
false excluded non-significant family genes
node_colours(Clrs=clrs(red,green,yellow,khaki4,grey))
colours for the different types of nodes
org(Org=hs)
organism in which Family is looked in (and experiment was performed)
org_exp_id(ExpID)
the type of the experimental gene ids for the organism. default depends on Org, but currently all map to symb
plot(Plot=true)
whether to plot the graph via wgraph_plot/2
wgraph_plot_opts(WgOpts=[])
options for wgraph_plot/2 (in preference to any defaults from Self)
wgraph_plot_defs(WgDefs=[])
options for wgraph_plot/2 (added to the end, after any defaults from Self)

These Options are passed to a number of other pack predicates.

See [pack('bio_analytics/examples/bt.pl')].

?- lib(real), lib(mtx),
   absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   Ppts = [ vjust= -1, node_size(3), mode="fruchtermanreingold", format(svg), stem(bt)],
   Opts = [ exp_ev_cut_let(inf), exp_ev_cut_get(-inf),
            include_non_present(false), include_non_significant(false),
            minw(200), wgraph_plot_opts(Ppts) ],
   exp_gene_family_string_graph( CsvF, autophagy, G, Opts ).

Produces file: bt.svg

[[../doc/images/bt.svg]]

author
- nicos angelopoulos
version
- 0.1 2019/4/15
See also
- bio_diffex/4
- gene_family/3
- wgraph_plot/2
- bio_symbols/3
 exp_go_over(+CsvF, -GoOver, +Opts)
Perform gene ontology over-representation analysis.

For experimental data in CsvF select de-regulated genes and on
those perform over representation analysis in gene ontology.
Results in GoOver are either as a values list or a csv file, the name of which is assumed to be the ground value of GoOVer.

Opts

go(GoSec=BP)
gene ontology section, in: [BP,MF,CC]
go_frame(GoFrame=lib_org)
which go frame of GoTerms, Evidence and Gid to use. Default, lib_org, uses r_lib(org.<Org>.eg.db) frames, alternative bio_db uses bio_db tables, before the intro of the option this was the way, so use this for backward compatibility
go_over_pv_cut(PvCut=0.05)
p value filter for the results
org(Org=hs)
one of bio_db_organism/2 first argument values (hs, gallus and mouse for now)
org_exp_id(OrgExpId)
the type of the experimental gene ids for the organism. The default depends on Org, but currently all map to symb.
stem(Stem=false)
stem for output csv file. When false, use basename of CsvF.
to_file(ToF=false)
when GoOver is unbound, this controls whether the output goes to a file or a values list
universe(Univ=go_exp)
Univ in : [genome,go,go_exp,experiment]
genome(Gen)
all gene identifiers in relevant map predicates
go(Go)
all gene identifies appearing in any ontology
go_exp(GoExp)
gene ontology genes that appear in experiment
experiment(Exp)
all identifiers in the experiment
ontology(Onto)
not implemented yet. like OG above, but only include this Ontology branch

Options are also passed to bio_diffex/4.

?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   exp_go_over( CsvF, GoOver, [] ).

CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
GoOver = [row('GOBPID', 'Pvalue', adj.pvalue, ...), row('GO:0061387', 0.000187, ...)|...].

?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   exp_go_over( CsvF, GoOver, [stem(here),to_file(true)] ).

CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
GoOver = here_gontBP_p0.05_univExp.csv.

?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   exp_go_over( CsvF, 'a_file.csv', [stem(here),to_file(true)] ).

CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',

?- lib(by_unix).
?- @wc(-l,'a_file.csv').
183 a_file.csv

?- @wc(-l,'here_gontBP_p0.05_univExp.csv').
183 here_gontBP_p0.05_univExp.csv
true.

?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   exp_go_over( CsvF, OverF, [to_file(true)] ).

CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
OverF = go_over_gontBP_p0.05_univExp.csv.

?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), CsvF ),
   exp_go_over( CsvF, OverF, [to_file(true),stem(false)] ).

CsvF = '.../swipl/pack/bio_analytics/data/silac/bt.csv',
OverF = '.../swipl/pack/bio_analytics/data/silac/bt_gontBP_p0.05_univExp.csv'.
author
- nicos angelopoulos
version
- 0.1 2019/5/2
- 0.2 2022/12/20, Univ=go and Org=gallus
- 0.3 2023/6/5, option go_frame(GoFrame)
See also
- go_over_universe/6
 exp_go_over_string_graphs(+Exp, ?GoOver, ?Dir, -Opts)
Create string graphs for all the over-represented terms in GoOver.

When GoOver is a variable expr_go_over/3 is called to generate it.

Opts

dir_postfix(Psfx=go_strings)
postfix for outputs directory (when Dir is a variable)
go_id_clm(GoIdClm=1)
column id for over represented GO terms
ov_max(MaxOvs=false)
when a number, it is taken as the maximal integer of terms to plot graphs for
stem_type(Sty=go_pair_ord)
similar to go_string_graph/3, but different default, others: go_name, go_id. Here the length of GO terms (Len) is added to form go_pair_ord(I,Len) for forwarding
viz_de_opts(VizOpts=[])
options for restricting genes to visualise via exp_diff/4. Default does not restrict what genes are visualised. diffex_only = VizOpts restricts genes to significants only.
wgraph_plot_opts(WgOpts=WgOpts)
defaults to [vjust = -1, node_size(3), format(svg)].

Options are passed to exp_gene_family_string_graph/4.

?- debug(real).
?- debug(exp_go_over_string_graphs).
?- absolute_file_name( pack('bio_analytics/data/silac/bt.csv'), Exp ),
   exp_go_over_string_graphs( Exp, Gov, Dir, [] ).

% Sending to R: pltv <- ggnet2(lp_adj,vjexp_go_over_mtxust = -1,size = 3,label = pl_v_1,color = pl_v_2,edge.size = pl_v_3,edge.color = "#BEAED4")
author
- nicos angelopoulos
version
- 0.1 2019/5/5
- 0.2 2020/9/6, option ov_max()
See also
- exp_go_over/3
- exp_gene_family_string_graph/4
 exp_go_over_string_graphs_multi(+Opts)
Run multiple exp_go_over_string_graphs/4 runs.

Can ran across a number of values for parameters and number of datasets. A directory is created for each combination of input parameters. Subdirectories are created for each input dataset.

Can be used with single parameter values as well, in order to add that parameter to the output directory's name.

Opts

data(Data=data)
which datasets to use, can be a directory
data_ext(DtExt=csv)
extension selector for when Data is a directory
dated(Dated=true)
adds date (ate_two_digit_dotted/1) stamp on output directories
dfx_opts(DfxOpts)
invariant options for exp_go_over_string_graphs/4
dfx_viz_opts(VizOpts)
invariant options for viz_de_opts() of exp_go_over_string_graphs/4
keep_mtx_sel(KpSel=true)
record, to file, the mtx of values significant values selected
keep_mtx_viz(KpViz=true)
record, to file, the mtx of values added for vizualisation (non significants)
mtx_opts(MtxOpts=[])
options to pass when reading the input matrices
multi(Keys, Vals)
parameters and associated possible values to pass to exp_go_over_string_graphs/4 (Vals, can be a single value). If multiple Keys are given, then they synchronise- always taking the
viz_diffex_only_constraint(DoCon=true)
enforce that when de_max() and exp_pv_cut() are the same, use viz_de_opts(diffex_only)
viz_pv_constraint(PvCon=true)
enforce that PvCut for non-significants must be equal or larger than that for significants

multi() options are processed in the order they are given, with later ones varied first in the runs.

?- Opts = [],
   exp_go_over_string_graphs_multi( Opts ).
author
- nicos angelopoulos
version
- 0:1 2020/09/17
To be done
- shorten to exp_multi/1 ?
 gene_family(+Family, +Org, -Symbols)
If Family is a known family alias for organism Org, it is expanded to a list of its constituent gene symbols.

Currently 2 organisms are supported: human and mouse.
Family can be a gene ontology term (atom of the form GO:XXXXXX).
If family is a list of numbers or atoms that map to numbers, then they are taken to be Entrez ids which are converted to gene symbols in Symbols.

If family is a list of symbols, it is passed on to Symbols.

Listens to debug(gene_family).

Known families:

autophagy/human
from: http://autophagy.lu/clustering/index.html
 Located family as the bio_analytics gene collection: autophagy
 ?- gene_family( autophagy, human, Auto ), length( Auto, Len ).
 Auto = ['AMBRA1', 'APOL1', 'ARNT', 'ARSA', 'ARSB', 'ATF4', 'ATF6', 'ATG10', 'ATG12'|...],
 Len = 232.

 ?- debug( gene_family ).
 ?- gene_family( 375, hs, Symbs ), length( Symbs, Len ).
 % Located GO term as the family identifier.
 Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
 Len = 26.
 
 ?- gene_family( 'GO:0000375', hs, Symbs ), length( Symbs, Len ).
 % Located GO term as the family identifier.
 Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
 Len = 26.

 ?- gene_family( [55626, 8542, 405], hs, Auto ).
 Converted input from Entrezes to Symbols.
 Auto = ['AMBRA1', 'APOL1', 'ARNT'].

 ?- gene_family( unknown, hs, Auto ).
 ERROR: Unhandled exception: gene_family(cannot_find_input_family_in_the_known_ones(unknown,[autophagy]))

Family datasets are in pack(bio_analytics/data/families).

author
- nicos angelopoulos
version
- 0:1 2019/3/5, from old code
To be done
- error reporting via print_message/2
 go_org_symbol(+Org, +Go, -Symb)
Get symbols of Go id for Organism.

Go can be either a GO: atom or an integer.

[debug]  ?- go_org_symbol( mouse, 2, Symbs ), write(Symbs), nl, fail.
Akt3
Mef2a
Mgme1
Mpv17
Mrpl15
Mrpl17
Mrpl39
Msto1
Opa1
Slc25a33
Slc25a36
Tymp
false.

[debug]  ?- go_org_symbol( hs, 'GO:0000002', Symbs ), write(Symbs), nl, fail.
AKT3
LONP1
MEF2A
MGME1
MPV17
MSTO1
OPA1
PIF1
SLC25A33
SLC25A36
SLC25A4
TYMP
false.
author
- nicos angelopoulos
version
- 0:1 2019/04/07
 go_org_symbols(+GoT, +Org, -Symbols)
Gets the symbols belonging to a GO term.
?- go_org_symbols( 'GO:0000375', human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP'|...],
Len = 26.

?- go_org_symbols( 'GO:0000375', mouse, Symbs ), length( Symbs, Len ).
Symbs = ['Dbr1', 'Rbm17', 'Rnu4atac', 'Rnu6atac', 'Scaf11', 'Sf3a2', 'Slu7', 'Srrm1', 'Srsf10'|...],
Len = 10.

?- go_org_symbols( 375, human, Symbs ), length( Symbs, Len ).
Symbs = ['BCAS2', 'DBR1', 'DDX23', 'GEMIN2', 'KHSRP', 'LSM1', 'MPHOSPH10', 'PRPF3', 'PRPF4'|...],
Len = 26.
author
- nicos angelopoulos
version
- 0.1 2019/4/7
See also
- go_org_symb/3, just a findall of
- go_symbols_reach/3 for a version that travells the ontology
 go_string_graph(+GoTerm, -Graph, +Opts)
Plot the STRING graph of all the Symbols in a GO term.

The Graph can be saved via options to wgraph_plot/2.

Opts are passed to symbols_string_graph/3, go_term_symbols/3 and wgraph_plot/2.

Opts

organism(Org=hs)
organism for the operation
plot(Plot=false)
whether to plot the graph
save(Save=false)
whether to save the graph (passed to wgraph_plot/2
stem_type(Sty=go_name)
constructs stem for the output filenames: go_id, go_pair, go_pair_ord(I,Len).
?- go_string_graph( 'GO:0016601', G, true ).
?- go_string_graph( 'GO:0016601', G, plot(true) ).

?- go_string_graph( 'GO:0016601', G, [plot(true),save(true)] ).
G = ['ABI2', 'AIF1', 'CDH13', 'HACD3', 'NISCH', 'ALS2'-'RAC1':892, 'BRK1'-'CYFIP1':999, ... - ... : 904, ... : ...|...].
?- ls.
% Rac_protein_signal_transduction.csv                Rac_protein_signal_transduction_graph.csv          Rac_protein_signal_transduction_layout.csv
true.

?- go_string_graph( 'GO:0016601', G, [plot(true),save(true),stem_type(go_pair)] ).
author
- nicos angelopoulos
version
- 0.1 2016/4/12
- 0.2 2020/9/3, expanded go_string_graph_stem/3
To be done
- make sure underlying options are compatible.
 go_symbols_reach(+GoT, -Symbols, +Opts)
Gets the symbols belonging to a GO term. Descents to GO child relations, which by default are includes (reverse of is_a) and consists_of (reverse of part_of) to pick up Symbols recursively.

Opts

org(Org=hs)
also known is: mouse. Note that human, also resolves to hs
descent(Desc=true)
whether to collect symbols from descendant GO terms
as_child_includes(Inc=true)
collect from edge_gont_include/2
as_child_consists_of(Cns=true)
collect from edge_gont_consists_of/2
as_child_regulates(Reg=false)
collect from edge_gont_regulates/2
as_child_negatively_regulates(Reg=false)
collect from edge_gont_negatively_regulates/2
as_child_positively_regulates(Reg=false)
collect from edge_gont_positively_regulates/2
debug(Dbg=false)
see options_append/3

Listens to debug(go_symbols_reach).

?- go_symbols_reach( 'GO:0000375', Symbs, [] ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.

?- go_symbols_reach( 'GO:0000375', Symbs, org(mouse) ), length( Symbs, Len ).
Symbs = ['4930595M18Rik', 'Aar2', 'Aqr', 'Bud13', 'Bud31', 'Casc3', 'Cdc40', 'Cdc5l', 'Cdk13'|...].
Len = 190.

?- go_symbols_reach( 375, Symbs, true ), length( Symbs, Len ).
Symbs = ['AAR2', 'ALYREF', 'AQR', 'ARC', 'BCAS2', 'BUD13', 'BUD31', 'CACTIN', 'CASC3'|...],
Len = 293.
author
- nicos angelopoulos
version
- 0.1 2015/7/26
- 0.2 2019/4/7 added org, moved to new pack
 symbols_string_graph(+Symbols, -Graph, +Opts)
Create the string database Graph between Symbols.

Opts

cohese(Coh=max)
method for cohesing multiple edges between two nodes [false,max,min,umax,umin] the max and min versions are more efficient but do sort the edges, whereas umax and umin take much longer but leave edges in found order
include_orphans(Orph=true)
set to false to exclude orphans from Graph
org(Org=hs)
which organism do the gene symbols come from
minw(500)
minimum weight (0 =< x =< 999) - not checked
sort_pairs(Spairs=true)
set to false to leave order of edges dependant on order of Symbols
sort_graph(Sort=true)
set to false for not sorting the results
?- Got = 'GO:0043552', gene_family( Got, hs, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).

Got = 'GO:0043552',
Symbs = ['AMBRA1', 'ATG14', 'CCL19', 'CCL21', 'CCR7'|...],
SymbsLen = 31,
Graph = ['TNFAIP8L3', 'AMBRA1'-'ATG14':981, 'AMBRA1'-'PIK3R4':974|...],
GraphLen = 235.

% the following is flawed as it uses Got from mouse and tries to build the graph in human STRING...
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph(Symbs, Graph, true ), length( Graph, GraphLen ).

Got = 'GO:0043552',
Symbs = Graph, Graph = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8', 'Fgf2', 'Fgfr3', 'Fgr'|...],
SymbsLen = GraphLen, GraphLen = 28.

% this the correct way for running the first query for mouse:
?- Got = 'GO:0043552', gene_family( Got, mouse, Symbs ), length( Symbs, SymbsLen ),
   symbols_string_graph(Symbs, Graph, org(mouse) ), length( Graph, GraphLen ).

Got = 'GO:0043552',
Symbs = ['Ambra1', 'Atg14', 'Ccl19', 'Cd19', 'Cdc42', 'Epha8'|...],
SymbsLen = 28,
Graph = ['Ambra1', 'Cdc42', 'Lyn', 'Nod2', 'Pdgfra', 'Sh3glb1', 'Tnfaip8l3', 'Vav3', ... : ...|...],
GraphLen = 117.
author
- nicos angelopoulos
version
- 0.1 2016/1/18
- 0.2 2019/4/8, added organism to incorporate mouse
- 0.3 2020/9/5, option cohese(), and debugs
To be done
- implement cohese() values: min, umax and umin