# Probabilistic Cross Categorization

Lace is built on a Bayesian probabilistic model called *Probabilistic
Cross-Categorization* (PCC). PPC groups \(m\) columns into \(1, ..., m\)
*views*, and within each view, groups the \(n\) rows into \(1, ..., n\)
*categories*. PCC uses a non-parametric prior process (the Dirichlet
process) to learn the number
of view and categories. Each column (feature) is then modeled as a mixture
distribution defined by the
category partition. For example, a continuous-valued column will be modeled as
a mixture of Gaussian distributions. For references on PCC, see the
appendix.

## Differences between PCC and Traditional ML

### Inputs and outputs

Regression and classification are defined in terms of learning a funciton \(f(x) \rightarrow y \) that maps inputs, \(x\), to outputs, \(y\). PCC has no notion of inputs and outputs. There is only data. PCC learns a joint distribution \(p(x_1, x_2, ..., x_m)\) from which the user can create condition distributions. To predict \(x_1\) given \(x_2\) and \(x_3\), you find \(\text{argmax}_{x_1} p(x_1|x_2, x_3)\).

### Supported data types

Most ML models are designed to handle one type of input data, generally
continuous. This means if you have categorical data, you have to transform it:
you can convert it to a float (e.g. `float(x)`

in python) and just
sweep the categorical-ness of the data under the rug, you can do something like
one-hot encoding, which significantly
increases dimensionality, or you can use some kind of embedding like in
natural language processing,
which destroys
interpretability. PCC allows your data to stay as they are.

### The learning method

Most machine learning models use an optimization algorithm to find a set of parameters that achieves a local minima in the loss function. For example, Deep Neural Networks may use stochastic gradient descent to minimize cross entropy. This results in one parameter set representing one model.

In Lace, we use Markov Chain Monte Carlo to do posterior sampling. That is, we attempt to draw a number of PCC states from the posterior distribution. These states provide a kind of topographical map of the PCC posterior distribution which we can use to do a number of things including computing likelihoods and uncertainties.