Bims (Bayesian inference over model structures) implements MCMC learning over statistical models defined in the Dlp (Distributional logic programming) probabilistic language.

Bims is released under GPL2 or Artistic 2.0.

Two model spaces are currently supported:

- Carts (Classification & Regression trees), and
- Bayesian Networks

Additional model spaces can be easily implemented by defining new likelihood plug-ins and programming appropriate priors.
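A likelihood plug-in enters an MCMC sampler through its acceptance test. The sketch below is a generic Metropolis-Hastings acceptance step in log space, written in SWI-Prolog. The predicates mh_accept/4 and mh_step/6 are hypothetical illustrations, not part of the Bims API, and the way Bims actually proposes models from the DLP prior may differ from this textbook form.

```prolog
% mh_accept(+LogLikNew, +LogLikOld, +LogCorrection, +U)
%   Hypothetical helper (not part of Bims): succeeds iff a proposed model
%   is accepted by the Metropolis-Hastings rule, computed in log space.
%   LogCorrection holds the log prior ratio plus the log proposal ratio,
%   log( P(M') q(M|M') / ( P(M) q(M'|M) ) ). It is 0.0 for an independence
%   proposal drawn from the prior, because those terms then cancel.
%   U is a uniform number in (0,1), passed in so the rule is testable.
mh_accept(LogLikNew, LogLikOld, LogCorrection, U) :-
    LogAlpha is min(0.0, LogLikNew - LogLikOld + LogCorrection),
    log(U) < LogAlpha.

% mh_step(+Old, +New, +LogLikOld, +LogLikNew, +LogCorrection, -Next)
%   Draws U with the random_float arithmetic function and keeps either
%   the current model Old or the proposed model New.
mh_step(Old, New, LogLikOld, LogLikNew, LogCorrection, Next) :-
    U is random_float,
    (   mh_accept(LogLikNew, LogLikOld, LogCorrection, U)
    ->  Next = New
    ;   Next = Old
    ).
```

A proposal with a higher likelihood (and zero correction) is always accepted; a worse one is accepted with probability equal to the likelihood ratio.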

?- bims( [] ).

?- bims( [data(carts),models(carts),likelihood(carts)] ).

The above are two equivalent ways to run the Carts example provided.

This runs 3 chains, each of length 100, on the default Carts data using the default likelihood. The default dataset is the Breast Cancer Wisconsin (BCW) data from the UCI Machine Learning Repository. There are 2 categories, 9 variables and 683 data points in this dataset. You can view the data with

?- edit( pack(bims/data/carts) ).

The default likelihood is an implementation of the classification likelihood function presented in: H. Chipman, E. George, and R. McCulloch. Bayesian CART model search (with discussion). Journal of the American Statistical Association, 93:935-960, 1998.
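For reference, the classification likelihood of Chipman, George and McCulloch integrates out the class probabilities in each leaf under a Dirichlet prior. For a tree T with b leaves, K classes, Dirichlet parameters α_1, ..., α_K, and n_ik the number of data points in leaf i with class k, the published marginal likelihood is shown below. Which Dirichlet parameters Bims uses by default is not stated in this document.

```latex
p(Y \mid X, T) =
  \left( \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}
              {\prod_{k=1}^{K} \Gamma(\alpha_k)} \right)^{b}
  \prod_{i=1}^{b}
  \frac{\prod_{k=1}^{K} \Gamma(n_{ik} + \alpha_k)}
       {\Gamma\!\left(n_{i\cdot} + \sum_{k=1}^{K} \alpha_k\right)},
\qquad n_{i\cdot} = \sum_{k=1}^{K} n_{ik}
```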

?- bims( [models(bns)] ).

?- bims( [data(bns),models(bns),likelihood(bns)] ).

The above are two equivalent ways to run the Bns example provided.

This runs 3 chains, each of length 100, on the default bns data using the default likelihood. The dataset is sampled from the ASIA network and comprises 8 variables and 2295 data points. You can view the data with

?- edit( pack(bims/data/bns) ).

The default BN likelihood is an instance of the BDeu metric for scoring BN structures, described in:

W. L. Buntine. Theory refinement of Bayesian networks. In Bruce D'Ambrosio, Philippe Smets, and Piero Bonissone, editors, Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-1991), pages 52-60, 1991.

David Heckerman, Dan Geiger, and David M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197-243, 1995.
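To make the BDeu metric concrete, here is a sketch of the standard log BDeu score for a single variable (a "family") given its parents, following the formula in the references above. With equivalent sample size α, r_i child states and q_i parent configurations, each cell gets the pseudo-count α/(r_i q_i). The predicate bdeu_family_score/4 is hypothetical and is not the Bims likelihood plug-in; the equivalent sample size Bims uses is not stated here.

```prolog
% bdeu_family_score(+Ess, +R, +CountsPerConfig, -LogScore)
%   Hypothetical helper (not Bims source): log BDeu score of one variable.
%     sum_j [ lgamma(a_ij) - lgamma(a_ij + N_ij)
%             + sum_k ( lgamma(a_ijk + N_ijk) - lgamma(a_ijk) ) ]
%   with a_ijk = Ess/(R*Q), a_ij = Ess/Q and N_ij = sum_k N_ijk.
%   Ess             - equivalent sample size
%   R               - number of states of the child variable
%   CountsPerConfig - one list of R counts N_ijk per parent configuration j;
%                     the number of lists is Q
bdeu_family_score(Ess, R, CountsPerConfig, LogScore) :-
    length(CountsPerConfig, Q),
    Aijk is Ess / (R * Q),
    Aij is Ess / Q,
    foldl(config_score(Aij, Aijk), CountsPerConfig, 0.0, LogScore).

% Contribution of one parent configuration j.
config_score(Aij, Aijk, Counts, Acc0, Acc) :-
    sum_list(Counts, Nij),
    foldl(cell_score(Aijk), Counts, 0.0, Cells),
    Acc is Acc0 + lgamma(Aij) - lgamma(Aij + Nij) + Cells.

% Contribution of one cell (child state k under configuration j).
cell_score(Aijk, Nijk, Acc0, Acc) :-
    Acc is Acc0 + lgamma(Aijk + Nijk) - lgamma(Aijk).
```

For a binary variable with no parents, Ess = 1 and a single observation, the score is log(Γ(1.5)/Γ(0.5)) = log(0.5). The full network score is the sum of the family scores.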

An easy way to run Bims on your own data is to create a new directory, create a data/ subdirectory within it, copy your data file there, and pass the basename of the data file to option data/1.

For example,

?- bims( [data(mydata)] ).

By defining a new likelihood function and new priors, the system can be used on new statistical models.

**bims**, **bims**`(+File)`, **bims**`(+Opts)` - Run a number of MCMC runs for a single prior defined by a Distributional Logic Program (DLP).
If the argument `File` corresponds to an existing file, it is taken to be a settings file. Each term in the file should be a fact corresponding to a known option. For example:

chains(3).
iterations(100).
seeds([1,2,3]).
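As a sketch, a settings file that runs the BN example with four chains of 200 iterations could contain nothing but option facts, using the option names documented below. The file name `my_settings.pl` is hypothetical.

```prolog
% my_settings.pl (hypothetical file name): one fact per bims/1 option.
% Run it with:  ?- bims('my_settings.pl').
models(bns).
chains(4).
iterations(200).
seeds([11,12,13,14]).
```

Keeping options in a file like this makes a set of runs easy to repeat with exactly the same seeds.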

If the argument `Opts` does not correspond to a file, it is taken to be a list of option terms.

The simplest way to use the software is to make a new directory and run some MCMC chains. The default call,

?- bims. % equivalent to ?- bims([]).

runs an MCMC simulation of 3 chains (R=3, below) with 100 iterations each (I=100). The models learnt are classification trees (carts) based on the default prior, and the data are the BCW dataset. The above call is equivalent to:

?- bims([models(carts)]).

To run a toy BN learning example, run

?- bims( [models(bns)] ).

This runs 3 chains on synthetic data sampled from the 8-node Asia BN.

To get familiar with running bims on private data, make a new directory, create a subdirectory `data` within it, and copy the file `bims(data/asia.pl)` to `data/test_local.pl`. Then run:

?- bims( [data(test_local)] ).
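The directory setup described above can also be done from the Prolog prompt. setup_local_data/2 is a hypothetical helper, not part of Bims; it only uses make_directory_path/1, directory_file_path/3 and copy_file/2 from SWI-Prolog's library(filesex).

```prolog
:- use_module(library(filesex)).   % make_directory_path/1, directory_file_path/3, copy_file/2

% setup_local_data(+Src, +Stem)
%   Hypothetical helper (not part of Bims): creates ./data if needed and
%   copies the data file Src to data/Stem.pl, so that bims([data(Stem)])
%   can find it through the ./data search directory.
setup_local_data(Src, Stem) :-
    make_directory_path(data),
    file_name_extension(Stem, pl, Base),
    directory_file_path(data, Base, Dst),
    copy_file(Src, Dst).
```

Usage, where `'/path/to/asia.pl'` is a placeholder for wherever the example data file lives on your system:

?- setup_local_data('/path/to/asia.pl', test_local), bims( [data(test_local)] ).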

`Opts` may include the following options:

- chains(R=3)
  number of chains or runs. Each chain is identified by N in 1..`R`.
- iterations(I=100)
  number of iterations per run. Strictly speaking this is iterations - 1; that is, `I` is the number of models produced in each chain.
- models(Models=carts)
  type of the models in the chain. The alternative model type is `bns`.
- debug(Dbg=true)
  if `Dbg`==true, `debug(bims)` is run to get debugging messages; if `Dbg`==false, `nodebug(bims)` is called.
- seeds(Seeds=1)
  hash seeds for each run (1-1000). If the length of `Seeds` is less than R, additional items are added consecutively from the last value. For instance, `seeds(1)` given with `chains(3)` expands to `seeds([1,2,3])`.
- likelihood(Lk=Model)
  likelihood to use. The default depends on the `Model` chosen: system-provided models have a namesake default likelihood; for example, the *carts* likelihood is the default likelihood for carts models.
- data(Data=Model)
  a term that indicates the data for the runs. The precise way of loading and calling depends on Lk (the likelihood function), via the hook model_data_load/2, and on what the prior (see option `top_goal(Top)`) expects. In general the dependency is on the likelihood, with the prior expected to be compatible with what the likelihood dictates in terms of data. In the likelihoods provided, `Data` is the stem of a filename that is loaded into memory. The file is looked for in Dir/`Data`[.pl], where Dir is looked for in [./data, bims(`Model`/data/)].
- top_goal(Top=Model)
  the top goal for running the MCMC simulations. It should be the partial call corresponding to a predicate defined in Prior, which is completed by adding the model as the last argument.
- prior(Prior=Model)
  a file defining the prior DLP. Each model space has a default namesake prior. The prior file is looked for in *dlps* and `bims(dlps)`.
- backtrack(Backtrack=uc)
  backtracking strategy (fixme: add details)
- tempered(Tempered=[])
  hot chains (fixme: add details); this is an advanced feature, undocumented for now
- results_dir(Rdir=res-Dstamp)
  results directory. If absent, the default is used. If present but a variable, the default is used and returned as the instantiation of this variable. The directory should not exist prior to the call. The default method uses a time stamp to provide uniqueness. (fixme: add `prefix(Pfx)` recognition)
- report(These)
  where `These` is a listable set of reportable tokens (these should match the 1st argument of known_reportable_term/2). `all`, or a list of the form [all|_], is expanded to reporting all known reportable terms.
- progress_percentage(Pc=10)
  the percentage at which to report progress of all runs (>100 or a non-number for no progress reporting)
- progress_stub(Stub=('.'))
  the stub marking progress
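The seed-expansion rule described for seeds/1 can be made concrete with a small sketch. expand_seeds/3 is a hypothetical helper written for illustration, not Bims source. It follows the documented rule; returning a list that already has R or more seeds unchanged is an assumption, since the text does not say what happens in that case, and an empty list is not handled.

```prolog
% expand_seeds(+Seeds, +R, -Expanded)
%   Hypothetical re-implementation (not Bims source) of the documented
%   rule: Seeds is a single integer or a non-empty list; if it has fewer
%   than R elements it is extended with consecutive integers following
%   its last value. Longer lists are returned unchanged (an assumption).
expand_seeds(Seed, R, Expanded) :-
    integer(Seed), !,
    expand_seeds([Seed], R, Expanded).
expand_seeds(Seeds, R, Expanded) :-
    length(Seeds, Len),
    (   Len >= R
    ->  Expanded = Seeds
    ;   last(Seeds, Last),
        From is Last + 1,
        To is Last + R - Len,
        numlist(From, To, Extra),
        append(Seeds, Extra, Expanded)
    ).
```

For example, `expand_seeds(1, 3, S)` gives `S = [1,2,3]`, matching the `seeds(1)` with `chains(3)` case above.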

All file-name-based options (Lk, Data, Prior and Rdir) are passed through absolute_file_name/2.
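The search rule given for the data option (Dir/Data[.pl] for each Dir in a list of directories) can be sketched in the same spirit. locate_data/3 is hypothetical and takes the directory list as an argument; in Bims that list is [./data, bims(Model/data/)], and its actual resolution may differ in detail from this sketch.

```prolog
:- use_module(library(filesex)).   % directory_file_path/3

% locate_data(+Dirs, +Stem, -File)
%   Hypothetical sketch (not Bims source): for each Dir in Dirs, in order,
%   try Dir/Stem and then Dir/Stem.pl; return the first file that exists,
%   as an absolute path. Fails if no candidate exists.
locate_data(Dirs, Stem, File) :-
    member(Dir, Dirs),
    (   Base = Stem
    ;   file_name_extension(Stem, pl, Base)
    ),
    directory_file_path(Dir, Base, Rel),
    exists_file(Rel),
    !,
    absolute_file_name(Rel, File).
```

Because of the cut, a file in an earlier directory shadows one with the same stem in a later directory.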

The predicate generates one results directory (Rdir), in which files recording information about each run (R) are placed.

**bims_version**`(-Vers, -Date)` - Version Mj:Mn:Fx, and release date `date(Y,M,D)`.

**bims_citation**`(-Atom, -Bibterm)` - Succeeds once for each publication related to this library. `Atom` is the atom representation suitable for printing, while `Bibterm` is a `bibtex(Type,Key,Pairs)` term of the same publication. Produces all related publications on backtracking.

?- bims_citation( A, G ), write( A ), nl.

Distributional Logic Programming for Bayesian Knowledge Representation.

Nicos Angelopoulos and James Cussens.

International Journal of Approximate Reasoning (IJAR).

Volume 80, January 2017, pages 52-66.
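To turn the `Bibterm` returned by bims_citation/2 into BibTeX text, a small formatter is enough. print_bibtex/1 is hypothetical, and the shape of `Pairs` is not documented here: the sketch assumes a list of `Field=Value` or `Field-Value` terms, so adjust pair_field/3 if the actual terms differ.

```prolog
% print_bibtex(+Bibterm)
%   Hypothetical helper (not part of Bims): writes a bibtex(Type,Key,Pairs)
%   term as BibTeX text on the current output.
%   ASSUMPTION: Pairs is a list of Field=Value or Field-Value terms.
print_bibtex(bibtex(Type, Key, Pairs)) :-
    format("@~w{~w", [Type, Key]),
    forall(member(Pair, Pairs),
           (   pair_field(Pair, Field, Value),
               format(",~n  ~w = {~w}", [Field, Value])
           )),
    format("~n}~n").

% Accept either pair notation.
pair_field(Field=Value, Field, Value) :- !.
pair_field(Field-Value, Field, Value).
```

Usage, printing every related publication in BibTeX form:

?- forall( bims_citation(_, Bib), print_bibtex(Bib) ).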