bio_db.pl -- Access, use and manage big biological datasets.

Bio_db gives access to pre-packed biological databases and simplifies the management and translation of biological data into Prolog-friendly formats.

There are currently two major types of data supported: maps and graphs. Maps define product mappings, translations and memberships, while graphs define interactions, which can be visualised as weighted graphs.

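There are two Prolog flags (see current_prolog_flag/2) that control the behaviour of the library: bio_db_qcompile (default: true) and bio_db_interface (default: prolog). Setting bio_db_qcompile to false disables the quick compilation (see qcompile/1) of the Prolog data files, while bio_db_interface selects the database backend (see bio_db_interface/1).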

Bio_db itself does not come with the datasets. You can either download the separate pack(bio_db_repo), which contains all of the Prolog datasets (111 MB of compressed data), or let auto-downloading retrieve the dataset serving each data predicate as you query it. Auto-downloading is transparent to the user: a dataset is downloaded by simply calling its predicate. For example:

?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
% prolog DB:table hgnc:map_hgnc_symb_hgnc/2 is not installed, do you want to download (Y/n) ?
% Trying to get: url_file(http://www.stoics.org.uk/bio_db_repo/data/maps/hgnc/map_hgnc_symb_hgnc.pl,/usr/local/users/nicos/local/git/test_bio_db/data/maps/hgnc/map_hgnc_symb_hgnc.pl)
% Loading prolog db: /usr/local/users/nicos/local/git/test_bio_db/data/maps/hgnc/map_hgnc_symb_hgnc.pl
Hgnc = 19295.

?- bio_db_interface( prosqlite ).
% Setting bio_db_interface prolog_flag, to: prosqlite
true.

?- map_hgnc_prev_symb( Prv, Symb ).
% prosqlite DB:table hgnc:map_hgnc_prev_symb/2 is not installed, do you want to download (Y/n) ?
% Trying to get: url_file(http://www.stoics.org.uk/bio_db_repo/data/maps/hgnc/map_hgnc_prev_symb.sqlite,/usr/local/users/nicos/local/git/test_bio_db/data/maps/hgnc/map_hgnc_prev_symb.sqlite)
false.

?- map_hgnc_prev_symb( Prv, Symb ).
% prosqlite DB:table hgnc:map_hgnc_prev_symb/2 is not installed, do you want to download (Y/n) ?
% Trying to get: url_file(http://www.stoics.org.uk/bio_db_repo/data/maps/hgnc/map_hgnc_prev_symb.sqlite,/usr/local/users/nicos/local/git/test_bio_db/data/maps/hgnc/map_hgnc_prev_symb.sqlite)
% Loading prosqlite db: /usr/local/users/nicos/local/git/test_bio_db/data/maps/hgnc/map_hgnc_prev_symb.sqlite
Prv = 'A1BG-AS',
Symb = 'A1BG-AS1' .

Databases

Ensembl=ense
Homo sapiens genes and proteins. Gene and transcript mappings, along with mappings to genomic location (the latter not yet included in the release).
HGNC=hgnc
Hugo Gene Nomenclature Committee, http://www.genenames.org/
NCBI=ncbi
National Center for Biotechnology Information (NCBI).
Uniprot=unip
Protein database.
String
Protein-protein interaction database.
Interactome
Pathways (not yet included in the public release).

Within each database, a field token that is the same as the database token denotes the unique identifier of objects in that database (for example, hgnc in map_hgnc_hgnc_symb).

Tokens

symb
HGNC gene symbol (=short name)
name
HGNC gene name (longer and less standardised than symb)
prev
HGNC previous gene symbol
syno
HGNC gene symbol synonym
ensg
ensembl gene
enst
ensembl transcript
ensp
ensembl protein
gonm
GO name of a term
pros
Prosite protein family information
rnuc
RNA nucleic sequence ID
unig
UniGene ID
sprt
Swiss-Prot part of Uniprot (high quality, curated)
trem
TrEMBL part of Uniprot (non curated)

The naming convention for maps is illustrated by:

 ?- map_hgnc_hgnc_symb( Hgnc, Symb ).
 Hgnc = 1,
 Symb = 'A12M1~withdrawn' ;
 Hgnc = 2,
 Symb = 'A12M2~withdrawn' .

 ?- map_hgnc_hgnc_symb( 19295, Symb ).
 Symb = 'LMTK3'.

 ?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
Hgnc = 19295.

Here the first hgnc token identifies the source database, and the second indicates that the first argument of the map is the unique identifier field for that database (here a positive integer starting at 1, with no gaps). The last part of the predicate name corresponds to the second argument, which here is the unique symbol assigned to a gene by HGNC. In the current version of bio_db, all tokens in map filenames are 4 characters long. Map data for predicate Pname from database DB are looked for in bio_db_data(maps/DB/Pname.Ext) (see bio_db_paths/0). The extension, Ext, depends on the current bio_db database interface (see bio_db_interface/1): it is sqlite if the interface is prosqlite and pl otherwise.
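The convention can be made concrete with a small sketch. The helpers map_pname_tokens/4 and map_relative_file/3 below are hypothetical (they are not part of bio_db); they only restate the naming and location rules above, assuming that a map name has exactly four underscore-separated tokens and that the returned path is relative to alias bio_db_data.

```prolog
% Hypothetical helpers (not part of bio_db) restating the map naming
% convention map_DB_KEY_VALUE and the data file location rule.

% map_pname_tokens(+Pname, -Db, -KeyTok, -ValTok)
map_pname_tokens(Pname, Db, KeyTok, ValTok) :-
    atomic_list_concat(Parts, '_', Pname),   % split the name on underscores
    Parts = [map, Db, KeyTok, ValTok].

% Extension rule as stated in the text: sqlite for prosqlite, pl otherwise.
iface_extension(prosqlite, sqlite) :- !.
iface_extension(_, pl).

% map_relative_file(+Pname, +Iface, -RelFile): path relative to bio_db_data.
map_relative_file(Pname, Iface, RelFile) :-
    map_pname_tokens(Pname, Db, _, _),
    iface_extension(Iface, Ext),
    format(atom(RelFile), 'maps/~w/~w.~w', [Db, Pname, Ext]).
```

For instance, map_relative_file( map_hgnc_symb_hgnc, prolog, F ) gives F = 'maps/hgnc/map_hgnc_symb_hgnc.pl', which matches the location shown in the auto-download example.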

The naming convention for graphs is illustrated by:

?- edge_string_hs_symb( S1, S2, W ).
S1 = 'A1BG',
S2 = 'ABAT',
W = 360 ;
S1 = 'A1BG',
S2 = 'ABCC6',
W = 158 .

Here only the first and second tokens, edge and string respectively, are controlled: edge marks the predicate as a graph, and the second token indicates the database of origin. Graph data for predicate Pname from database DB are looked for in bio_db_data(graphs/DB/Pname.Ext) (see bio_db_paths/0). The extension, Ext, depends on the current bio_db database interface (see bio_db_interface/1): it is sqlite if the interface is prosqlite and pl otherwise.
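Because only the first two tokens are fixed, the source database of any graph predicate can be read off its name. The sketch below is hypothetical (graph_pname_source/2 and graph_relative_file/3 are not bio_db predicates) and simply restates the rule above.

```prolog
% Hypothetical helpers (not part of bio_db): a graph predicate name is
% edge_DB_..., where the tokens after DB are free-form.
graph_pname_source(Pname, Db) :-
    atomic_list_concat(Parts, '_', Pname),
    Parts = [edge, Db|_].                    % ignore the remaining tokens

% graph_relative_file(+Pname, +Iface, -RelFile): path relative to bio_db_data.
graph_relative_file(Pname, Iface, RelFile) :-
    graph_pname_source(Pname, Db),
    (   Iface == prosqlite                   % extension rule from the text
    ->  Ext = sqlite
    ;   Ext = pl
    ),
    format(atom(RelFile), 'graphs/~w/~w.~w', [Db, Pname, Ext]).
```

Note that edge_gont_is_a splits into four tokens and still yields gont as its source.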

Bio_db supports four db interfaces: prolog, prosqlite, berkeley and rocks. The first, which is the default, uses Prolog fact bases. The second interfaces to SQLite via pack(prosqlite), while the third and fourth use SWI-Prolog's library(bdb) and pack(rocksdb), respectively. The underlying mechanisms are entirely transparent to the user. In order to use the SQLite data sources, pack(prosqlite) needs to be installed via the pack manager:

 ?- pack_install( prosqlite ).

The user can control which interface is in use with the bio_db_interface/1 predicate.

 ?- bio_db_interface( Curr ).
 Curr = prolog.

 ?- bio_db_interface( prosqlite ).

 ?- bio_db_interface( Curr ).
 Curr = prosqlite.

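The interface serving a bio_db data predicate is fixed by the interface that is current at the time of the predicate's first call. To switch interfaces for an already served predicate, close it first with bio_db_close/1.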
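As a toy model of this behaviour (hypothetical code, not bio_db's implementation), a registry can record the interface that is current at a predicate's first call and keep returning it until the predicate is closed. Here set_iface/1 stands in for bio_db_interface/1 and close_pid/1 for bio_db_close/1.

```prolog
% Toy model (hypothetical): the interface is fixed at first call.
:- dynamic current_iface/1, served_by/2.

current_iface(prolog).                        % default interface

set_iface(Iface) :-
    retractall(current_iface(_)),
    assertz(current_iface(Iface)).

% serving_iface(+Pid, -Iface)
serving_iface(Pid, Iface) :-
    (   served_by(Pid, Iface0)                % already served: keep it
    ->  Iface = Iface0
    ;   current_iface(Iface0),                % first call: use current one
        assertz(served_by(Pid, Iface0)),
        Iface = Iface0
    ).

% Forget the server, so the next call picks the then-current interface.
close_pid(Pid) :-
    retractall(served_by(Pid, _)).
```

Changing the current interface after the first call has no effect on the predicate until it is closed.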

Once the user has initiated the serving of a predicate by calling a goal on it, information about the dataset, such as the download date and source URL, becomes available:

?- map_hgnc_hgnc_symb( Hgnc, Symb ).
Hgnc = 1,
Symb = 'A12M1~withdrawn' .

?- bio_db_info( map_hgnc_hgnc_symb, Key, Value ), write( Key-Value ), nl, fail.
source_url-ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
datetime-datetime(2016,9,8,0,8,40)
header-row(HGNC ID,Approved Symbol)
unique_lengths-unique_lengths(44266,44266,44266)
relation_type-relation_type(1,1)
false.

Thanks to Jan Wielemaker for a retractall fix and for code for fast loading of precompiled fact bases (and indeed for the changes in SWI that made this possible).

author
- nicos angelopoulos
version
- 0.5 2016/9/11
- 0.7 2016/10/21, experimenting with distros in github
- 0.9 2017/3/10, small changes for pack(requires) -> pack(lib) v1.1
- 1.0 2017/10/9 to coincide with ppdp paper presentation
See also
- doc/Releases.txt for version details.
 bio_db_paths
Initialisation call: sets up the path aliases.
The predicate deals with two main directory repositories: (a) the root of the installed bio_db databases (alias bio_db_data), and (b) the root of the downloaded databases (alias bio_db_downloads). Optionally, a top directory containing both (a) and (b) as subdirectories can be defined (alias bio_db). The default value for alias bio_db is a made-up pack directory, pack(bio_db_repo). By default, bio_db_data is the subdirectory data of alias bio_db, and bio_db_downloads is its subdirectory downloads; these are the canonical subdirectory names for (a) and (b).

pack(bio_db_repo) can also be installed as a standalone package from SWI's manager.

?- pack_install( bio_db_repo ).
This will install all but one of the Prolog database files; SQLite files can only be downloaded on demand. The missing Prolog DB file is edge_string_hs.pl from data/graphs/string/. It has been excluded because it is much bigger than the rest, at 0.5 GB. It is downloaded on demand, transparently to the user, on the first call to the associated predicate, edge_string_hs/3.
Directory locations for (a) and (b) above can be given either as Prolog flags with keys bio_db_root and bio_dn_root, respectively, or via the environment variables BioDbRoot and BioDnRoot.
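A hypothetical resolver for the root of (a) is sketched below. The lookup order (the bio_db_root flag first, then the BioDbRoot environment variable, then a caller-supplied default) is an assumption made for illustration; the text above does not state which source wins when both are set.

```prolog
% Hypothetical sketch: choose the directory for (a). The precedence
% flag > environment variable > Default is assumed, not documented.
data_root(Default, Root) :-
    (   current_prolog_flag(bio_db_root, Root0)   % fails if flag is unset
    ->  Root = Root0
    ;   getenv('BioDbRoot', Root0)
    ->  Root = Root0
    ;   Root = Default
    ).
```

A resolved directory could then be registered with assertz( user:file_search_path(bio_db_data, Root) ), the standard SWI-Prolog way of defining a path alias.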

Installed root alias(bio_db_data) contains sub-dirs

graphs
for graphs; string and reactome
maps
for all the supported maps

The above are mapped to aliases bio_graphs and bio_maps, respectively. Within each of these subdirectories there is further structure based on the database from which the set originated.

Downloaded root alias(bio_db_downloads) may contain sub-dirs

hgnc
data from HGNC database
ncbi
data from NCBI database
reactome
data from Reactome database
string
data from string database
uniprot
protein data from EBI
ense
ensembl database

Alias bio_db_downloads is only useful if you are downloading data files directly from the supported databases.

See

?- absolute_file_name( packs(bio_db(auxil)), Auxil ), ls( Auxil ).

for examples of how these can be used.

For most users these aliases are not needed as the library manages them automatically.

To be done
- transfer datasets and downloads to the new pack location when running on a newly installed SWI-Prolog version after an upgrade.
 bio_db_version(?Vers, -Date)
Version Mj:Mn:Fx, and release date date(Y,M,D).
See also
- doc/Releases.txt for more detail on change log
 bio_db_citation(-Atom, -Bibterm)
This predicate succeeds once for each publication related to this library. Atom is the atom representation suitable for printing while Bibterm is a bibtex(Type,Key,Pairs) term of the same publication. Produces all related publications on backtracking.
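The Bibterm can be rendered as BibTeX text. Since the layout of Pairs is not spelled out here, the printer below assumes Field-Value pairs; print_bibtex/1 is a hypothetical helper, not part of bio_db.

```prolog
% Hypothetical printer for a bibtex(Type, Key, Pairs) term.
% Assumption: Pairs is a list of Field-Value pairs.
print_bibtex(bibtex(Type, Key, Pairs)) :-
    format("@~w{~w,~n", [Type, Key]),
    forall(member(Field-Value, Pairs),
           format("  ~w = {~w},~n", [Field, Value])),
    format("}~n").
```

Under that assumption, forall( bio_db_citation(_, Bib), print_bibtex(Bib) ) would print every related publication.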
 bio_db_source(?Type, ?Db)
True if Db is a source database for bio_db serving predicates of type Type. Type is either maps or graphs.

The databases are

 bio_db_interface(?Iface, -Status)
Interrogate the installation status (true or false) of bio_db's known interfaces. Status is true if the interface dependencies are installed and the interface can be used, and false otherwise.

Can be used to enumerate all known or installed interfaces.

 ?- findall( Iface, bio_db_interface(Iface,_), Ifaces ).
 Ifaces = [prolog, berkeley, prosqlite, rocks].
 bio_db_interface(?Iface)
Interrogate or set the current interface for bio_db database predicates. By default Iface = prolog. Also supported: prosqlite (needs pack(prosqlite)), berkeley (needs SWI-Prolog's own library(bdb)) and rocks (needs pack(rocksdb)).
?- bio_db_interface( Iface ).
Iface = prolog.

?- debug( bio_db ).
true.

?- bio_db_interface( wrong ).
% Could not set bio_db_interface prolog_flag, to: wrong, which in not one of: [prolog,prosqlite,berkeley,rocks]
false.

?- bio_db_interface( Iface ).
Iface = prolog.

?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
% Loading prolog db: /usr/local/users/nicos/local/git/lib/swipl-7.1.32/pack/bio_db_repo/data/maps/hgnc/map_hgnc_symb_hgnc.pl
Hgnc = 19295.

?- bio_db_interface( prosqlite ).
% Setting bio_db_interface prolog_flag, to: prosqlite
true.

?- map_hgnc_prev_symb( Prev, Symb ).
% prosqlite DB:table hgnc:map_hgnc_prev_symb/2 is not installed, do you want to download (Y/n) ?
% Execution Aborted
?- map_hgnc_prev_symb( Prev, Symb ).
% Loading prosqlite db: /usr/local/users/nicos/local/git/lib/swipl-7.1.32/pack/bio_db_repo/data/maps/hgnc/map_hgnc_prev_symb.sqlite
Prev = 'A1BG-AS',
Symb = 'A1BG-AS1' ;

At this point, Iface is prosqlite.

 bio_db_install(+PidOrPname, +Iface)
 bio_db_install(+PidOrPname, +Iface, +Opts)
Install the interface (Iface) for the bio_db database that corresponds to a predicate identifier (Pid) or a predicate name (Pname). Note that this does not need to be done in advance, as the library auto-loads missing Iface and Pid combinations when they are first interrogated.

Opts

interactive(true)
set to false to accept the default answers to interactive prompts
 bio_db_info(+Pid, ?Iface)
 bio_db_info(+Pid, ?Key, -Value)
 bio_db_info(+Iface, +Pid, ?Key, -Value)
Retrieve information about bio_db database predicates.

When Iface is not given, Key and Value are those of the interface under which Pid is currently open for access. The predicate errors if Pid is not open for serving yet.

The bio_db_info/2 version succeeds for each interface in which Pid is installed; it is simply a shortcut for bio_db_info( Iface, Pid, _, _ ).

The Key-Value information returned is about the particular data predicate, as saved in the specific backend.

Key

source_url
an atomic value of the URL
datetime
datetime/6 term
data_types
data_types/n term, giving the primary type of each argument in the data table
header
row/n term, where n is the number of columns in the data table
unique_lengths
unique_lengths/3 term, lengths for the ordered sets of: Ks, Vs and KVs
relation_type(From, To)
where From and To each take the value 1 or m
?- bio_db_info( Iface, map_hgnc_hgnc_symb/2, Key, Value), write( Iface:Key:Value ), nl, fail.
prolog:source_url:ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
prolog:datetime:datetime(2016,9,10,0,2,14)
prolog:data_types:data_types(integer,atom)
prolog:unique_lengths:unique_lengths(44266,44266,44266)
prolog:relation_type:relation_type(1,1)
prolog:header:row(HGNC ID,Approved Symbol)
prosqlite:source_url:ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz
prosqlite:datetime:datetime(2016,9,10,0,2,14)
prosqlite:data_types:data_types(integer,atom)
prosqlite:unique_lengths:unique_lengths(44266,44266,44266)
prosqlite:relation_type:relation_type(1,1)
prosqlite:header:row(HGNC ID,Approved Symbol)
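The unique_lengths/3 values can be reproduced from a list of Key-Value pairs, as in the sketch below. kv_unique_lengths/2 and lengths_relation_type/2 are hypothetical helpers; the mapping from lengths to relation_type(From, To) is one plausible reading, consistent with the 1-1 example above, not the library's documented rule.

```prolog
:- use_module(library(pairs)).

% kv_unique_lengths(+Pairs, -UL): sizes of the ordered sets of Ks, Vs and KVs.
kv_unique_lengths(Pairs, unique_lengths(NK, NV, NKV)) :-
    pairs_keys_values(Pairs, Ks, Vs),
    sort(Ks, SKs), length(SKs, NK),
    sort(Vs, SVs), length(SVs, NV),
    sort(Pairs, SPs), length(SPs, NKV).

% Assumed reading: To = 1 if every key has a single value,
% From = 1 if every value has a single key; m otherwise.
lengths_relation_type(unique_lengths(NK, NV, NKV), relation_type(From, To)) :-
    ( NKV =:= NV -> From = 1 ; From = m ),
    ( NKV =:= NK -> To = 1 ; To = m ).
```

For the map_hgnc_hgnc_symb/2 values above, unique_lengths(44266,44266,44266) yields relation_type(1,1), as reported.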
 bio_db_close(+Pid)
Close the current serving of predicate Pid. The next time a Pid goal is called, the current interface (bio_db_interface/1) will be used to establish a new server and resolve the query.
The predicate throws an error if Pid does not correspond to a db predicate or if it is not currently served by any of the backends.

 ?- bio_db_interface( prosqlite ).
 ?- map_hgnc_hgnc_symb( Hgnc, Symb ).
 Hgnc = 506,
 Symb = 'ANT3~withdrawn' .

 ?- bio_db_close( map_hgnc_hgnc_symb/2 ).
 ?- bio_db_interface( prolog ).
 ?- map_hgnc_hgnc_symb( Hgnc, Symb ).
 Hgnc = 1,
 Symb = 'A12M1~withdrawn' .

 ?- bio_db_close( map_hgnc_hgnc_symb/2 ).

 bio_db_close_connections
Close all currently open bio_db backend connections.

This is called by bio_db at halt.

 bio_db_db_predicate(?Pid)
True if Pid is a predicate identifier which is defined in bio_db and starts with either edge_ or map_. When Pid is a free variable all such predicate identifiers are returned on backtracking.
  ?- bio_db_db_predicate( map_hgnc_hgnc_symb/2 ).
  true.

  ?- bio_db_db_predicate( X ).
  X = map_hgnc_symb_entz/2 ;
  X = map_ense_enst_ensg/2 ;
  ...
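The naming part of this test can be sketched as a simple prefix check. db_pname/1 is hypothetical and, unlike bio_db_db_predicate/1, does not check that the predicate is actually defined in bio_db.

```prolog
% Hypothetical prefix test mirroring the naming rule: a bio_db data
% predicate name starts with map_ or edge_ (definition is not checked).
db_pname(Pname) :-
    atom(Pname),
    (   sub_atom(Pname, 0, _, _, map_)
    ->  true
    ;   sub_atom(Pname, 0, _, _, edge_)
    ).
```

For example, include( db_pname, [map_hgnc_hgnc_symb, bio_db_info, edge_gont_is_a], L ) keeps only the first and last names.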
 edge_string_hs(?EnsP1, ?EnsP2, ?W)
Weighted graph edges predicate from the String database between Ensembl protein ids. W is an integer in 0 < W < 1000.
?- map_hgnc_symb_entz( 'LMTK3', Entz ), map_ncbi_entz_ensp( Entz, EnsP ), edge_string_hs( EnsP, Inter, W ).
Entz = 114783,
EnsP = 'ENSP00000270238',
Inter = 'ENSP00000075503',
W = 186 ;
Entz = 114783,
EnsP = 'ENSP00000270238',
Inter = 'ENSP00000162044',
W = 165 ;
Entz = 114783,
EnsP = 'ENSP00000270238',
Inter = 'ENSP00000178640',
W = 389 ...
 edge_string_hs_symb(?Symb1, ?Symb2, ?W)
Weighted graph edges predicate from the String database between HGNC symbol ids. W is an integer in 0 < W < 1000.
?- edge_string_hs_symb( 'LMTK3', Inter, W ).
Inter = 'MAP2K5',
W = 389 ;
Inter = 'MAPK3',
W = 157 ;
Inter = 'MASTL',
W = 211 ;
Inter = 'MDC1',
W = 198 ;
Inter = 'MFSD2A',
W = 165 ;
Inter = 'MRPS30',
W = 179 ....
 edge_gont_includes(?Pa, ?Ch)
Reciprocal of edge_gont_is_a/2.
 edge_gont_is_a(?Ch, ?Pa)
Gene ontology is_a relation: Ch (a GO term) is a subtype of Pa (a GO term).
  ?- edge_gont_is_a(G1,G2), map_gont_gont_gonm( G1, N1 ), map_gont_gont_gonm( G2, N2 ).
  G1 = 'GO:0000001',
  G2 = 'GO:0048308',
  N1 = 'mitochondrion inheritance',
  N2 = 'organelle inheritance' .
 edge_gont_regulates(?Pa, ?Ch)
Pa regulates Ch (GO hierarchical relation).
 edge_gont_positively_regulates(?Pa, ?Ch)
Pa positively regulates Ch (GO hierarchical relation).
 edge_gont_negatively_regulates(?Pa, ?Ch)
Pa negatively regulates Ch (GO hierarchical relation).
 edge_gont_part_of(?Part, ?Whole)
Part is part of Whole (GO hierarchical relation).
 edge_gont_consists_of(?Whole, ?Part)
Whole consists (in part) of Part (reciprocal of edge_gont_part_of/2).
?- edge_gont_part_of( A, B ),\+ edge_gont_consists_of( B, A).
false.
 map_hgnc_hgnc_symb(?Hgnc, ?Symb)
Map predicate from HGNC unique integer identifier to unique gene symbol.
?- map_hgnc_hgnc_symb( 19295, Symb ).
Symb = 'LMTK3'.
 map_hgnc_hgnc_name(?Hgnc, ?Name)
Map predicate from HGNC unique integer identifier to unique gene name/description.
?- map_hgnc_hgnc_name( 19295, Name ).
Name = 'lemur tyrosine kinase 3'.
 map_hgnc_symb_hgnc(?Symb, ?Hgnc)
Map predicate from HGNC unique symbol to unique HGNC integer identifier.
?- map_hgnc_symb_hgnc( 'LMTK3', HGNC ).
HGNC = 19295.
 map_hgnc_syno_symb(?Syno, ?Symb)
Map predicate from gene synonyms to approved HGNC Symbol.
?- map_hgnc_syno_symb( 'LMR3', Symb ).
Symb = 'LMTK3'.
 map_hgnc_prev_symb(?Prev, ?Symb)
Map predicate from previously known-as gene names to approved HGNC Symbol.
?- map_hgnc_prev_symb( 'ERBB', Symb ).
Symb = 'EGFR'.
 map_hgnc_ccds_hgnc(?Ccds, ?Hgnc)
Map predicate from consensus protein coding regions (CCDS) to HGNC ID.
?- map_hgnc_ccds_hgnc( 'CCDS11576', Hgnc ).
Hgnc = 11979.
 map_hgnc_hgnc_ccds(?Hgnc, ?Ccds)
Map predicate from HGNC ID to consensus protein coding regions (CCDS).
?- map_hgnc_hgnc_ccds( 11979,  Ccds ).
Ccds = 'CCDS11576'.
 map_hgnc_ensg_hgnc(?Ensg, ?Hgnc)
Map predicate from Ensembl gene id to HGNC Id.
?- map_hgnc_ensg_hgnc( Ensg, 19295 ).
Ensg = 'ENSG00000142235'.
 map_hgnc_symb_entz(?Symb, ?Entz)
Map predicate from HGNC symbols to (NCBI) entrez gene ids.
?- map_hgnc_symb_entz( 'LMTK3', Entz ).
Entz = 114783.
 map_hgnc_entz_hgnc(?Entz, ?Hgnc)
Map predicate from entrez ids to HGNC ID.
?- map_hgnc_entz_hgnc( 114783, Hgnc ).
Hgnc = 19295.
 map_hgnc_entz_symb(?Entz, ?Symb)
Map predicate from entrez ids to approved HGNC Symbol.
?- map_hgnc_entz_symb( 114783, Symb ).
Symb = 'LMTK3'.
 map_hgnc_entz-appv_symb(?Entz, ?Symb)
Map predicate from entrez approved ids to HGNC Symbol.


 map_hgnc_entz-ncbi_symb(?Entz, ?Symb)
Map predicate from NCBI-supplied entrez ids to HGNC Symbol.
 map_hgnc_hgnc_chrb(+Hgnc, -ChrB)
Map predicate from HGNC ID to Chromosome Band
?- map_hgnc_hgnc_chrb( 5, ChrB ).
ChrB = '19q13.43'
 map_hgnc_hgnc_ensg(+Hgnc, -EnsG)
Map predicate from HGNC ID to Ensembl Gene.
 map_hgnc_hgnc_entz-appv(+Hgnc, -EntzAppv)
Map predicate from HGNC ID to approved Entrez gene id.
 map_hgnc_hgnc_entz-ncbi(+Hgnc, -EntzNcbi)
Map predicate from HGNC ID to NCBI-supplied Entrez gene id.
 map_hgnc_hgnc_entz(+Hgnc, -Entz)
Map predicate from HGNC ID to Entrez gene id (by all means available).
 map_pros_pros_prsn(+Pros, -Prsn)
Map predicate: Prosite ID to Prosite Name.

 map_pros_pros_sprt(+Pros, -Prsn, -Sprt, -Symb, -Start, -End, -Seqn)
Map predicate from Prosite ID to (SwissProt) Protein info

 map_ncbi_ensp_entz(?EnsP, ?Entz)
Map predicate from Ensembl proteins to NCBI/entrez gene ids.
?- map_ncbi_ensp_entz( 'ENSP00000270238', Entz ).
Entz = 114783.
 map_ncbi_ensg_entz(?EnsG, ?Entz)
Map predicate from Ensembl genes to NCBI/entrez gene ids.
?- map_ncbi_ensg_entz( 'ENSG00000142235', Entz ).
Entz = 114783.
 map_ncbi_entz_ensp(?Entz, ?EnsP)
Map predicate from NCBI/entrez gene ids to Ensembl proteins.
?- map_ncbi_entz_ensp( 114783, EnsP ).
EnsP = 'ENSP00000270238'.
 map_ncbi_rnuc_symb(RnaNucl, Symb)
Map predicate from RNA nucleic sequence to HGNC symbol.
?- map_ncbi_rnuc_symb( 'BC140794', Symb ).
Symb = 'CEP170'.
 map_ncbi_dnuc_symb(DnaNucl, Symb)
Map predicate from DNA nucleic sequence to HGNC symbol.
?- map_ncbi_dnuc_symb( 'AL669831', Symb ).
Symb = 'CICP3' ;
...
Symb = 'TUBB8P11'.
 map_ncbi_unig_entz(UniG, Entz)
Map predicate from UniGene to Entrez id, as per NCBI.
?- map_ncbi_unig_entz( 'Hs.80828', Entz ).
Entz = 3848.
 map_ncbi_entz_ensg(?Entz, ?EnsG)
Map predicate from NCBI/entrez gene ids to Ensembl genes.
?- map_ncbi_entz_ensg( 114783, EnsG ).
EnsG = 'ENSG00000142235'.
 map_unip_hgnc_unip(+Hgnc, -UniP)
Map predicate from HGNC gene ids to Uniprot proteins.
?- map_unip_hgnc_unip( Hgnc, Unip ).
 map_unip_unip_hgnc(?UniP, ?Hgnc)
Map predicate from Uniprot proteins to HGNC ids.
?-  map_unip_unip_hgnc( 'Q96Q04', Hgnc ).
Hgnc = 19295.

?- map_unip_unip_hgnc( 'A0A0A0MQW5', Hgnc ).
Hgnc = 19295.
 map_unip_unip_unig(?UniP, ?UniG)
Map predicate from Uniprot proteins to UniGene ids.
?- map_unip_unip_unig( 'Q96Q04', UniG ).
UniG = 'Hs.207426'.
 map_unip_sprt_seqn(?Swissprot, ?Seqn)
Map predicate from Uniprot (Swiss-Prot, the curated part) to its sequence.
?- map_unip_sprt_seqn( 'Q96Q04', Seqn ).
Seqn = 'MPAPGALI....'.
 map_unip_trem_seqn(?Trem, ?Seqn)
Map predicate from Uniprot (TrEMBL, the non-curated part) to its sequence.
?- map_unip_trem_seqn( 'Q96Q04', Seqn ).
 map_unip_ensp_unip(?EnsP, ?UniP)
Map predicate from Ensembl proteins to Uniprot proteins.
?- map_unip_ensp_unip( 'ENSP00000472020', UniP ).
UniP = 'Q96Q04'.
 map_unip_trem_nucs(?Trem, ?Nucs)
Map predicate from TrEMBL protein to nucleotide sequence (ENA). This is a many-to-many relation.
?- map_unip_trem_nucs( 'B2RTS4', Nucs ).
Nucs = 'BC140794'.

?- map_unip_trem_nucs( 'B4E273', Nucs ), map_unip_trem_nucs( Trem, Nucs ), write( Trem-Nucs ), nl, fail.

B4E273-AK304141
B4E273-BC143676
A0A0A0MTC0-CH471056
A0A0A0MTJ2-CH471056
A0A0C4DG83-CH471056
A2VDJ0-CH471056
A6NFD8-CH471056
B2RTX2-CH471056
....
 map_unip_unip_entz(?UniP, ?Entz)
Map predicate from Uniprot proteins to Entrez ids.
?- map_unip_unip_entz( 'Q96Q04', Entz ).
Entz = 114783.
 map_gont_gont_symb(?Gont, ?Symb)
Map predicate from GO terms to approved HGNC Symbol.
?- map_gont_gont_symb( 'GO:0003674', Symb ).
Symb = 'A1BG' ;
Symb = 'AAAS' ;
Symb = 'AARSD1'...
 map_gont_gont_gonm(?Gont, ?Gonm)
Map predicate from gene ontology terms to GO term names.
?- map_gont_gont_gonm( 'GO:0004674', A ).
A = 'protein serine/threonine kinase activity'.
 map_gont_symb_gont(?Symb, ?Gont)
Map predicate from HGNC symbols to GO terms.
?- map_gont_symb_gont( 'LMTK3', Gont ).
Gont = 'GO:0003674' ;
Gont = 'GO:0004674' ;
Gont = 'GO:0004713' ;
Gont = 'GO:0005524' ;
Gont = 'GO:0005575' ;
Gont = 'GO:0006468' ;
Gont = 'GO:0010923' ;
Gont = 'GO:0016021' ;
Gont = 'GO:0018108'.
 map_ense_ensg_hgnc(?EnsG, ?Hgnc)
Ensembl gene to HGNC ID with data drawn from Ensembl.
 map_ense_ensg_symb(?EnsG, ?Symb)
Ensembl gene to HGNC Symbol with data drawn from Ensembl.
 map_ense_enst_chrl(+EnsT, -Chr, -Start, -End, -Dir)
Ensembl transcript chromosomal location.

Chr is the chromosome, Start the start position, End the end position and Dir is the direction of the transcript.

 map_ense_ensg_chrl(+EnsG, -Chr, -Start, -End, -Dir)
Ensembl gene to chromosomal location. Chr is the chromosome, Start the start position, End the end position and Dir the direction of the gene.
 map_ense_enst_ensg(+EnsT, -EnsG)
Ensembl Transcript to Ensembl Gene with data drawn from Ensembl.
 hgnc_symbol(?Symbol)
True iff Symbol is an HGNC symbol (deterministic for +Symbol).
?- bio_db:hgnc_symbol( 'LMTK3' ).
true.
 go_term_symbols(+GoT, -Symbols, +Opts)
Gets the symbols belonging to a GO term. Descends GO child relations, which by default are includes (reverse of is_a) and consists_of (reverse of part_of), to pick up Symbols recursively.

Opts
  * descent(Desc=true)
    whether to collect symbols from descendant GO terms
  * as_child_includes(Inc=true)
    collect from edge_gont_includes/2
  * as_child_consists_of(Cns=true)
    collect from edge_gont_consists_of/2
  * as_child_regulates(Reg=false)
    collect from edge_gont_regulates/2
  * as_child_negatively_regulates(Reg=false)
    collect from edge_gont_negatively_regulates/2
  * as_child_positively_regulates(Reg=false)
    collect from edge_gont_positively_regulates/2
  * debug(Dbg=false)
    see options_append/3

Listens to debug(go_term_symbols).

?- go_term_symbols( 'GO:0000375', Symbs ).

Symbs = [ALYREF,AQR,ARC,BCAS2,BUD13,BUD31,C7orf55-LUC7L2,CACTIN,CCAR1,CD2BP2,CDC40,CDC5L,CDK13,CELF1,CELF2,CELF3,CELF4,CLNS1A,CLP1,CPSF1,CPSF2,CPSF3,CPSF7,CRNKL1,CSTF1,CSTF2,CSTF3,CTNNBL1,CWC15,CWC22,CWC27,DBR1,DCPS,DDX1,DDX17,DDX20,DDX23,DDX39A,DDX39B,DDX41,DDX46,DDX5,DGCR14,DHX15,DHX16,DHX32,DHX35,DHX38,DHX8,DHX9,DNAJC8,DQX1,EFTUD2,EIF4A3,FRG1,FUS,GCFC2,GEMIN2,GEMIN4,GEMIN5,GEMIN6,GEMIN7,GEMIN8,GPATCH1,GTF2F1,GTF2F2,HNRNPA0,HNRNPA1,HNRNPA2B1,HNRNPA3,HNRNPC,HNRNPD,HNRNPF,HNRNPH1,HNRNPH2,HNRNPH3,HNRNPK,HNRNPL,HNRNPM,HNRNPR,HNRNPU,HNRNPUL1,HSPA8,ISY1,KHSRP,LSM1,LSM2,LSM3,LSM6,LSM7,LSM8,LUC7L,LUC7L2,LUC7L3,MAGOH,MBNL1,METTL14,METTL3,MPHOSPH10,NCBP1,NCBP2,NCBP2L,NHP2L1,NOL3,NOVA1,NUDT21,PABPC1,PABPN1,PAPOLA,PAPOLB,PCBP1,PCBP2,PCF11,PHAX,PHF5A,PLRG1,PNN,POLR2A,POLR2B,POLR2C,POLR2D,POLR2E,POLR2F,POLR2G,POLR2H,POLR2I,POLR2J,POLR2K,POLR2L,PPIE,PPIH,PPIL1,PPIL3,PPWD1,PQBP1,PRMT5,PRMT7,PRPF19,PRPF3,PRPF31,PRPF4,PRPF4B,PRPF6,PRPF8,PSIP1,PTBP1,PTBP2,RALY,RBM17,RBM22,RBM5,RBM8A,RBMX,RBMXP1,RNPS1,RSRC1,SAP130,SART1,SART3,SCAF11,SETX,SF1,SF3A1,SF3A2,SF3A3,SF3B1,SF3B2,SF3B3,SF3B4,SF3B5,SF3B6,SFPQ,SFSWAP,SKIV2L2,SLU7,SMC1A,SMN1,SMN2,SMNDC1,SNRNP200,SNRNP40,SNRNP70,SNRPA,SNRPA1,SNRPB,SNRPB2,SNRPC,SNRPD1,SNRPD2,SNRPD3,SNRPE,SNRPF,SNRPG,SNRPGP15,SNUPN,SNW1,SRPK2,SRRM1,SRRM2,SRSF1,SRSF10,SRSF11,SRSF12,SRSF2,SRSF3,SRSF4,SRSF5,SRSF6,SRSF7,SRSF9,STRAP,SUGP1,SYF2,SYNCRIP,TDRD12,TFIP11,TGS1,TRA2A,TRA2B,TXNL4A,TXNL4B,U2AF1,U2AF2,UBL5,UPF3B,USP39,USP4,USP49,WBP4,WDR77,WDR83,XAB2,YBX1,YTHDC1,ZCCHC8,ZRSR2]
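The default descent, collecting symbols from a term and from everything it includes, can be sketched over a toy fragment. The facts below use invented GO ids and symbols, mock_includes/2 stands in for edge_gont_includes/2 and mock_term_symb/2 for map_gont_gont_symb/2; only the includes relation is followed, so this is a simplified, hypothetical model rather than the library's implementation.

```prolog
% Toy GO fragment with invented ids and symbols.
mock_includes('GO:P', 'GO:C1').               % parent includes child
mock_includes('GO:C1', 'GO:C2').
mock_term_symb('GO:P', s1).
mock_term_symb('GO:C1', s2).
mock_term_symb('GO:C2', s1).
mock_term_symb('GO:C2', s3).

% descendant_or_self(+Term, -Desc): Term or any term reachable via
% includes edges (GO is acyclic, so plain recursion terminates).
descendant_or_self(Term, Term).
descendant_or_self(Term, Desc) :-
    mock_includes(Term, Child),
    descendant_or_self(Child, Desc).

% Sketch of descent(true) with only includes enabled; duplicates removed.
go_term_symbols_sketch(GoT, Symbols) :-
    findall(S, ( descendant_or_self(GoT, T), mock_term_symb(T, S) ), Ss),
    sort(Ss, Symbols).
```

Symbol s1 is attached to two terms but appears once in the result, because sort/2 removes duplicates.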
author
- nicos angelopoulos
version
- 0.1 2015/7/26
 symbols_string_graph(+Symbols, -Graph, +Opts)
Create the string database Graph between Symbols.

Opts

minw(0)
minimum weight (0 =< x =< 999) - not checked
sort_pairs(Spairs=true)
set to false to leave the order of edges dependent on the order of Symbols
include_orphans(Orph=true)
set false to exclude orphans from Graph
sort_graph(Sort=true)
set to false to leave the results unsorted
?- Gont = 'GO:0043552', findall( Symb, map_gont_gont_symb(Gont,Symb), Symbs ),
   symbols_string_graph( Symbs, Graph, [] ),
   length( Graph, Len ).
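The minw and include_orphans options can be pictured with a small sketch over invented data. mock_edge/3 stands in for edge_string_hs_symb/3 with made-up symbols and weights, and the edge/3 and orphan/1 terms are an illustrative output shape, not the Graph format that symbols_string_graph/3 actually returns.

```prolog
% Toy edges (invented symbols and weights) standing in for
% edge_string_hs_symb/3.
mock_edge('A', 'B', 389).
mock_edge('A', 'C', 157).
mock_edge('B', 'C', 211).

% symbols_graph_sketch(+Symbols, +MinW, -Graph): edges among Symbols with
% weight >= MinW, plus orphan(S) for each symbol left without an edge.
symbols_graph_sketch(Symbols, MinW, Graph) :-
    sort(Symbols, Set),
    findall(edge(S1, S2, W),
            ( member(S1, Set), member(S2, Set),
              mock_edge(S1, S2, W), W >= MinW ),
            Edges),
    findall(orphan(S),
            ( member(S, Set),
              \+ memberchk(edge(S, _, _), Edges),
              \+ memberchk(edge(_, S, _), Edges) ),
            Orphans),
    append(Edges, Orphans, Graph0),
    msort(Graph0, Graph).                     % standard order of terms
```

With MinW = 200 the A-C edge is dropped, and D, which has no edges, is reported as an orphan.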
   
author
- nicos angelopoulos
version
- 0.1 2016/01/18