-- Machine learning utilities

A menagerie of machine learning utilities.

Currently implements k_fold learning and k_fold comparative performance plots via Real.

Bootstrapping, as well as a couple of additional types of comparative plots, will likely be added soon.

Pack info

- nicos angelopoulos
- 0.1 2016/3/5
- 0.2 2017/3/11
- 0.3 2021/12/31
- 0.4 2022/1/2
- 0.5 2022/12/29
See also
- pack(mlu/examples/
 k_fold_pairwise_comparisons(+Data, +Learners, +Predictors, -Models, -Statistics, +Opts)
Compare M learners (Goals) over N cross sections of Data. On each of the N iterations, the learners are run on N-1 sections and tested on the remaining, held-out section. If a single Predictor is given (as a singleton list or unlisted), it is used for all Learners; otherwise the number of Predictors should equal the number of Learners. There should be at least two Learners; with a single learner, call k_fold_learn/4 directly (see the Predictor and Statistic options there).
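A minimal sketch of the rotation scheme described above, assuming Data has already been split into segments. The predicates k_fold_rotations/2 and k_fold_run/4 are hypothetical (they are not part of pack(mlu)), and so is the calling convention they assume: call(Learner, Train, Model) to learn and call(Predictor, Model, Test, Stat) to evaluate on the held-out segment.

```prolog
:- use_module(library(lists)).

% Hypothetical helper (not pack(mlu) code): pair each segment, used as the
% held-out Test set, with the concatenation of all other segments (Train).
k_fold_rotations(Segments, Rotations) :-
    findall(Train-Test,
            ( nth1(I, Segments, Test),
              nth1(I, Segments, _, Rest),   % Rest: Segments without the Ith
              append(Rest, Train) ),
            Rotations).

% Hypothetical driver: learn on each Train part, evaluate on its Test part.
% The Learner/Predictor calling convention is an assumption of this sketch.
k_fold_run(Learner, Predictor, Segments, Stats) :-
    k_fold_rotations(Segments, Rotations),
    findall(Stat,
            ( member(Train-Test, Rotations),
              call(Learner, Train, Model),
              call(Predictor, Model, Test, Stat) ),
            Stats).
```

For example, ?- k_fold_rotations([[a,d],[b,c],[e,f]], R). gives R = [[b,c,e,f]-[a,d], [a,d,e,f]-[b,c], [a,d,b,c]-[e,f]], i.e. each of the three segments is held out exactly once.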


used to distinguish the models for each learner; an l-prefixed atom is used by default
Names for Models. Unlike other non-post() options, which are passed on to Post calls via options, this one is passed explicitly. It is not used in the main call, and it defaults to model_01...model_NN. If a variable, the generated names are returned.
post-processing after Models and Statistics are constructed. Known Post values:
names for the statistics

Additional Opts for the Post are allowed. For each Post, all options whose outer term name matches the Post are stripped of that outer term and passed to it. For instance, jitter(accuracy(2,'AUC')) is passed to Post jitter, and it signifies that the predictor returns, as the second argument of its statistic, the AUC of the model against the left-out segment. The predictor is expected to implement this correctly; here we only provide the means to pass the information to the plotter.
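The option routing described above can be sketched as follows, assuming that "stripped" means the outer Post wrapper is removed before the inner term is handed over. post_options/4 is a hypothetical name, not the pack's own predicate.

```prolog
:- use_module(library(apply)).

% Hypothetical sketch: split Opts into those wrapped in PostName/1 and the
% rest; wrapped ones are unwrapped, e.g. jitter(accuracy(2,'AUC')) becomes
% accuracy(2,'AUC') when PostName is jitter.
post_options(PostName, Opts, PostOpts, RestOpts) :-
    partition(is_post_option(PostName), Opts, Matching, RestOpts),
    maplist(arg(1), Matching, PostOpts).

% True if Opt is a compound term PostName(_).
is_post_option(PostName, Opt) :-
    compound(Opt),
    compound_name_arity(Opt, PostName, 1).
```

For example, ?- post_options(jitter, [jitter(accuracy(2,'AUC')), folds(10)], P, R). gives P = [accuracy(2,'AUC')] and R = [folds(10)].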

Also see options for k_fold_learn/4.


accuracy(N, Name)
the Nth (>1) position of the statistic is the accuracy, identified by Name
predicate name for obtaining the accuracy names, called as call(Pname,N,Name)
R function for obtaining a single accuracy from the k_fold accuracies
set to false to avoid re-running ground models; convenient when running comparatives
    ?- [pack(mlu/examples/stoic)].

    ?- stoic.     % use ?- stoic_ng. instead if you do not have library(real)
 k_fold_pairwise_predictions(+Dat, +Learners, +Predictor, -Models, -Stats, +Opts)
Run k_fold_pair_predictions/7 on pairs of Learners on a single k_fold segmentation. By default all pairwise comparisons are considered.
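Enumerating the default "all pairwise comparisons" can be sketched as follows, assuming that pairs are unordered, follow the order in which Learners are listed, and are written L1-L2 as in the option below. learner_pairs/2 is a hypothetical name, not part of the pack.

```prolog
:- use_module(library(lists)).

% Hypothetical sketch: every unordered pair L1-L2 of learners, where L1
% appears before L2 in Learners.
learner_pairs(Learners, Pairs) :-
    findall(L1-L2,
            ( append(_, [L1|Rest], Learners),   % pick L1, Rest follows it
              member(L2, Rest) ),
            Pairs).
```

For example, ?- learner_pairs([a,b,c], P). gives P = [a-b, a-c, b-c].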


list or single pair (L1-L2) of predictions to consider
See also
- options for k_fold_pair_predictions/7
To be done
- allow distinct Predictors
 k_fold_segments(+Data, -Header, -Segments, +Opts)
Split Data into N segments. Header is the header of Data, or a made-up one if Data does not have a header.
?- Data = [r,a,b,c,d,e,f], k_fold_segments( Data, H, Sgs, folds(3) ).
Data = [r, a, b, c, d, e, f], H = r,
Sgs = [[a, d], [b, c], [e, f]].

?- Data = [r,a,b,c,d,e,f], k_fold_segments( Data, H, Sgs, [folds(3),by_permutation(false)] ).
Data = [r, a, b, c, d, e, f], H = r,
Sgs = [[b, e], [c, d], [a, f]].


whether to create segments by a single permutation operation. The alternative is to choose bins sequentially for each datum until all bins are full (and Data is empty).
number of segments to split the data into (exhaustive and mutually exclusive splits)
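A minimal sketch of the by_permutation(true) strategy, assuming Data is already header-less (k_fold_segments/4 itself first separates the header) and that the single permutation is followed by a round-robin deal into folds(N) segments. The exact split used by the pack is not stated here, and segments_by_permutation/3 is a hypothetical name.

```prolog
:- use_module(library(random)).
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical sketch: permute Data once, then deal the permuted items
% round-robin into Folds segments (item I goes to segment ((I-1) mod Folds)+1),
% so segments are exhaustive, mutually exclusive and differ in size by at most one.
segments_by_permutation(Data, Folds, Segments) :-
    random_permutation(Data, Permuted),
    numlist(1, Folds, Ids),
    maplist(segment_of(Permuted, Folds), Ids, Segments).

% Collect the items whose (1-based) position maps to segment Id.
segment_of(Items, Folds, Id, Segment) :-
    findall(X,
            ( nth1(I, Items, X),
              Id =:= (I - 1) mod Folds + 1 ),
            Segment).
```

With six data items and three folds, each segment receives exactly two items; which items go where varies with the permutation.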
- nicos angelopoulos
- 0.1 2016/11/08
See also
- k_fold_learn/4
Documentation predicate.

Pack mlu uses pack pack_errors for throwing errors.

This file defines local errors within the pack_errors infrastructure.

 mlu_sample(+Goal, +Times, -Yield, -Pairs)
 mlu_sample(+Goal, +Times, -Yield, -Pairs, +Opts)
Run Goal Times times, observing Yield at each run. The results are returned in Pairs, a list of Yield-Count pairs.

Currently the predicate copies Goal and Yield (with copy_term/2) and requires that the copy of Yield is ground after the copy of Goal has been called.
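The sampling loop can be sketched as below. sample_counts/4 is a hypothetical stand-in for mlu_sample/4 without options; calling the copy of Goal through once/1 (first solution only) and building the Yield-Count pairs with msort/2 and clumped/2 are assumptions of this sketch.

```prolog
:- use_module(library(lists)).
:- use_module(library(error)).

% Hypothetical sketch: Times times, call a copy of Goal and record the
% corresponding copy of Yield, which must then be ground; finally count
% each distinct yield into a sorted Yield-Count pairlist.
sample_counts(Goal, Times, Yield, Pairs) :-
    findall(Y,
            ( between(1, Times, _),
              copy_term(Goal-Yield, G-Y),
              once(G),
              ( ground(Y) -> true ; instantiation_error(Y) ) ),
            Ys),
    msort(Ys, Sorted),
    clumped(Sorted, Pairs).
```

For example, ?- sample_counts(random_member(S, [head,tail]), 100, S, Freqs). returns head and tail counts summing to 100; the exact counts vary between runs.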


if integer(R), the V in Pairs is a list of results (length(V) = R)
?- lib(pepl).
?- sload_pe( coin ).
?- mlu_sample( scall(coin(Side)), 100, Side, Freqs ).
Freqs = [head-47, tail-53].

?- mlu_sample( scall(coin(Side)), 100, Side, Freqs ).
Freqs = [head-49, tail-51].
- nicos angelopoulos
- 0.1 2016/8/31
 mlu_frequency_plot(+FreqOrVec, +Opts)
Make a frequency plot for Data (FreqOrVec), which is a pairlist, a list or an R vector.

Data is one of

of the form Item-Times
that is passed, with Opts, to list_frequency/3
the values of which are retrieved with pl_vector/3 and then passed, with Opts, to list_frequency/3


the barplot and gg_bar interfaces are supported. The first requires lib(real); the latter additionally requires lib(b_real). If b_real is present, gg_bar becomes the default interface.
colour population groups according to the given break points. Splits are done with =<, so break points go to the left partition. (Currently only for the gg_bar interface.)
if an integer, draws a vertical line separating columns with counts less than PlineAt from those with more. Only makes sense if Sort is set to frequency. (Currently only for the gg_bar interface.)
alternatives to not sorting (default)
sort by element
sort on frequency
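The sorting alternatives can be sketched on an Item-Count pairlist as below. sort_pairs/3 and the value name element are hypothetical; frequency and false are the values mentioned in this section, and ordering frequencies from highest to lowest is an assumption.

```prolog
% Hypothetical sketch of the sort alternatives on Item-Count pairs.
sort_pairs(false, Pairs, Pairs).             % keep the input order
sort_pairs(element, Pairs, Sorted) :-        % order by Item (standard order)
    keysort(Pairs, Sorted).
sort_pairs(frequency, Pairs, Sorted) :-      % order by Count, highest first
    sort(2, @>=, Pairs, Sorted).             % @>= keeps ties; sort/4 is stable
```

For example, ?- sort_pairs(frequency, [a-1,b-3,c-2], S). gives S = [b-3,c-2,a-1].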

Other options are passed to either gg_bar_plot/2 (if Iface == gg_bar) or to r_call/2 (if Iface == barplot).

?- lib(pepl).
?- sload_pe( coin ).

?- [pack(mlu/examples/grouped_freqs)].
?- grouped_freqs.
% a plot with 9 bars and 3 groups should appear on screen

?- mlu_frequency_plot( [1,1,1,2,2,3], true ).
?- mlu_frequency_plot( [1,1,1,2,2,3], interface(barplot) ).
?- mlu_frequency_plot( [1,1,2,11,12,21,31,33,41], [bins([10,20]),interface(gg_bar)] ).

The resulting plot has Data binned into 3 bins: values =< 10, values in (10,20] and values > 20.
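A sketch of how bins([10,20]) partitions this data under the =< rule of the bins option: values up to 10 go to the first bin, values in (10,20] to the second, and the rest to a final open-ended bin. bin_index/3 and bin_counts/3 are hypothetical helpers, not the pack's implementation.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical sketch: X goes to the first bin whose break point it is =< to,
% so break points fall in the left partition; values above the last break
% go to an extra, open-ended bin.
bin_index(Breaks, X, Index) :-
    (   nth1(Index, Breaks, B), X =< B
    ->  true
    ;   length(Breaks, N),
        Index is N + 1
    ).

% Count how many Data values fall into each of the length(Breaks)+1 bins.
bin_counts(Breaks, Data, Counts) :-
    length(Breaks, N),
    NBins is N + 1,
    numlist(1, NBins, Ids),
    maplist(bin_index(Breaks), Data, Indices),
    maplist(count_in(Indices), Ids, Counts).

count_in(Indices, Id, Count) :-
    include(==(Id), Indices, In),
    length(In, Count).
```

For example, ?- bin_counts([10,20], [1,1,2,11,12,21,31,33,41], C). gives C = [3,2,4].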

?- mlu_frequency_plot( [1,1,2,11,12,21,31,33,41], [bins([bin1-10,bin2-20,bin3-inf]),interface(gg_bar)] ).

As in the previous example, but the x ticks are custom labelled.

?- mlu_frequency_plot( [1,2,10,11,12,21,31,33,41], [bins([0-10]),interface(gg_bar)] ).
?- lib(pepl).
?- sload_pe( coin ).
?- mlu_sample( scall(coin(Side)), 100, Side, Freqs ), mlu_frequency_plot( Freqs, [interface(barplot),outputs([svg]),las=2] ).

Produces file: real_plot.svg

?- mlu_sample( scall(coin(Side)), 100, Side, Freqs ), mlu_frequency_plot( Freqs, [interface(gg_bar),output(png("naku.png"))] ).

Produces file: naku.png


- nicos angelopoulos
- 0.1 2016/8/31
- 0.2 2017/1/13, added option sort(false)
- 0.3 2017/8/29, added vectors as inputs via pl_vector/3
 mlu_version(-Version, -Date)
Current version and release date for the library.
?- mlu_version( V, D ).
V = 0:5:0,
D = date(2022, 12, 29).