Conduct an analysis

You've made a codebook, you've fit a model, now you're ready to do learn.

Let's use the built-in examples to walk through some key concepts. The Animals example isn't the biggest, or most complex, and that's exactly why it's so great. People have acquired a ton of intuition about animals like how and why you might categorize animals into a taxonomy, and why animals have certain features and what that might tell us about other features of animals. This means, that we can see if lace recovers our intuition.

from lace import examples

# if this is your first time using the example, lace must
# build the metadata
animals = examples.Animals()
use lace::examples::Example;
use lace::prelude::*;

// You can create an Engine or an Oracle. An Oracle is
// basically an immutable Engine. You cannot add/edit data or
// extend runs (update).
let animals = Example::Animals.engine().unwrap();

Statistical structure

Usually, the first question we want to ask of a new dataset is "What questions can I answer?" This is a question about statistical dependence. Which features of our dataset share statistical dependence with which others? This is closely linked with the question "which things can I predict given which other things?"

In python, we can generate a plotly heatmap of dependence probability.

animals.clustermap(
    'depprob',
    color_continuous_scale='greys',
    zmin=0,
    zmax=1
).figure.show()

In rust, we ask about dependence probabilities between individual pairs of features

let depprob_flippers = animals.depprob(
    "swims",
    "flippers",
).unwrap();

Prediction

Now that we know which columns are predictive of each other, let's do some predicting. We'll predict whether an animal swims. Just an animals. Not an animals with flippers, or a tail. Any animal.

animals.predict("swims")
animals.predict(
    "swims",
    &Given::<usize>::Nothing,
    true,
    None,
);

Which outputs

(0, 0.04384630488890182)

The first number is the prediction. Lace predicts that an animal does not swims (because most of the animals in the dataset do not swim). The second number is the uncertainty. Uncertainty is a number between 0 and 1 representing the disagreement between states. Uncertainty is 0 if all the states completely agree on how to model the prediction, and is 1 if all the states completely disagree. Note that uncertainty is not tied to variance.

The uncertainty of this prediction is very low.

We can add conditions. Let's predict whether an animal swims given that it has flippers.

animals.predict("swims", given={'flippers': 1})
animals.predict(
    "swims",
    &Given::Conditions(vec![
        ("flippers", Datum::Categorical(lace::Category::U8(1)))
    ]),
    true,
    None,
);

Output:

(1, 0.09588592928237495)

The uncertainty is a little higher, but still quite low.

Let's add some more conditions that are indicative of a swimming animal and see how that effects the uncertainty.

animals.predict("swims", given={'flippers': 1, 'water': 1})
animals.predict(
    "swims",
    &Given::Conditions(vec![
        ("flippers", Datum::Categorical(lace::Category::U8(1))),
        ("water", Datum::Categorical(lace::Category::U8(1))),
    ]),
    true,
    None,
);

Output:

(1, 0.06761776764962134)

The uncertainty is a bit lower now that we've added swim-consistent evidence.

How about we try to mess with Lace? Let's try to confuse it by asking it to predict whether an animal with flippers that does not go in the water swims.

animals.predict("swims", given={'flippers': 1, 'water': 0})
animals.predict(
    "swims",
    &Given::Conditions(vec![
        ("flippers", Datum::Categorical(lace::Category::U8(1))),
        ("water", Datum::Categorical(lace::Category::U8(0))),
    ]),
    true,
    None,
);

Output:

(0, 0.36077426258767503)

The uncertainty is really high! We've successfully confused lace.

Evaluating likelihoods

Let's compute the likelihood to see what is going on

import polars as pl

animals.logp(
    pl.Series("swims", [0, 1]),
    given={'flippers': 1, 'water': 0}
).exp()
animals.logp(
    &["swims"],
    &[
        vec![Datum::Categorical(lace::Category::U8(0))],
        vec![Datum::Categorical(lace::Category::U8(1))],
    ],
    &Given::Conditions(vec![
        ("flippers", Datum::Categorical(lace::Category::U8(1))),
        ("water", Datum::Categorical(lace::Category::U8(0))),
    ]),
    None,
)
.unwrap()
.iter()
.map(|&logp| logp.exp())
.collect::<Vec<_>>();

Output:

# polars
shape: (2,)
Series: 'logp' [f64]
[
	0.589939
	0.410061
]

Anomaly detection

animals.surprisal("fierce")\
    .sort("surprisal", descending=True)\
    .head(10)

Output:

# polars
shape: (10, 3)
┌──────────────┬────────┬───────────┐
│ index        ┆ fierce ┆ surprisal │
│ ---          ┆ ---    ┆ ---       │
│ str          ┆ u32    ┆ f64       │
╞══════════════╪════════╪═══════════╡
│ pig          ┆ 1      ┆ 1.565845  │
│ rhinoceros   ┆ 1      ┆ 1.094639  │
│ buffalo      ┆ 1      ┆ 1.094639  │
│ chihuahua    ┆ 1      ┆ 0.802085  │
│ ...          ┆ ...    ┆ ...       │
│ collie       ┆ 0      ┆ 0.594919  │
│ otter        ┆ 0      ┆ 0.386639  │
│ hippopotamus ┆ 0      ┆ 0.328759  │
│ persian+cat  ┆ 0      ┆ 0.322771  │
└──────────────┴────────┴───────────┘