What are the different methods in semi-supervised learning?

Semi-supervised learning refers to a type of machine learning in which a limited set of labeled input-output pairs is combined with a larger pool of unlabeled inputs to predict the outputs for the remaining inputs. By contrast, supervised learning has a sufficient amount of labeled data available.

To make use of the limited labeled data, semi-supervised learning usually makes an assumption about the structure of the underlying data. This can be the continuity assumption (inputs that are close to each other tend to have similar outputs), the cluster assumption (inputs tend to form clusters, and inputs in the same cluster tend to share an output), or the manifold assumption (inputs lie near a manifold whose dimension is lower than that of the input space).

Semi-supervised approaches can be broadly grouped into the following categories:

Generative Methods

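In generative methods, a distribution is assumed for the inputs belonging to each output class (for example, one Gaussian per class). Learning then becomes a search for the distribution parameters that best explain both the labeled and the unlabeled inputs, typically using the expectation-maximization (EM) algorithm.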
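As a rough illustration of the generative approach, the sketch below runs EM on one-dimensional data. It assumes one Gaussian component per class. Labeled points keep fixed one-hot class assignments, while unlabeled points receive soft assignments that are re-estimated on every iteration. The function names (`semi_supervised_em`, `classify`) and the initialization scheme are illustrative choices, not a standard API.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    # Density of a 1-D normal distribution; broadcasts over arrays.
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def semi_supervised_em(x_l, y_l, x_u, n_classes, n_iter=100):
    """EM for a 1-D Gaussian mixture with one component per class (sketch).

    Assumption: every class has at least one labeled point.
    """
    x_l = np.asarray(x_l, dtype=float)
    y_l = np.asarray(y_l, dtype=int)
    x_u = np.asarray(x_u, dtype=float)
    x = np.concatenate([x_l, x_u])
    R_l = np.eye(n_classes)[y_l]  # fixed one-hot responsibilities for labeled points

    # Initialization (illustrative choice): class means from the labeled data,
    # pooled variance of all inputs, uniform class priors.
    means = np.array([x_l[y_l == k].mean() for k in range(n_classes)])
    variances = np.full(n_classes, x.var())
    priors = np.full(n_classes, 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: posterior class probabilities for the unlabeled points only.
        lik = priors * gaussian_pdf(x_u[:, None], means, variances)
        R_u = lik / lik.sum(axis=1, keepdims=True)
        R = np.vstack([R_l, R_u])

        # M-step: responsibility-weighted maximum-likelihood updates over all points.
        Nk = R.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (R * x[:, None]).sum(axis=0) / Nk
        variances = (R * (x[:, None] - means) ** 2).sum(axis=0) / Nk
        variances = np.maximum(variances, 1e-6)  # guard against a collapsing component
    return priors, means, variances


def classify(x, priors, means, variances):
    # Assign each input to the class with the highest posterior probability.
    x = np.asarray(x, dtype=float)
    return (priors * gaussian_pdf(x[:, None], means, variances)).argmax(axis=1)


priors, means, variances = semi_supervised_em(
    [0.0, 10.0], [0, 1], [-1.0, 0.5, 1.0, 9.0, 9.5, 11.0], n_classes=2
)
print(means)
print(classify([0.3, 9.7], priors, means, variances))
```

With only one labeled point per class, the unlabeled points pull each Gaussian toward the cluster they belong to, so the estimated means end up close to the cluster averages (0.125 and 9.875). A more robust version would compute the E-step in log space to avoid underflow for far-away points.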

Low-density Separation Methods

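Low-density separation methods try to place the decision boundary in a low-density region of the input space. They fit a boundary to the known labels while also assigning labels to the remaining inputs, such that the boundary has maximal margin over all inputs, labeled and unlabeled. The transductive support vector machine (TSVM) is the best-known example.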
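A full TSVM is too involved for a short example, so the sketch below shows the same idea in one dimension, assuming a hard margin, a single threshold as the boundary, and labels 0 (left) and 1 (right). Among all thresholds that classify the labeled points correctly, it picks the one farthest from every input, labeled or unlabeled. `low_density_threshold_1d` is a hypothetical helper written only for this illustration.

```python
import numpy as np


def low_density_threshold_1d(x_labeled, y_labeled, x_unlabeled):
    """Hard-margin transductive separation in 1-D (illustrative sketch).

    Assumptions: labels are 0 (left of the boundary) and 1 (right of it),
    and the boundary is a single threshold t.
    Returns (t, margin), or (None, -inf) if no threshold fits the labels.
    """
    x_labeled = np.asarray(x_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled, dtype=int)
    x_all = np.sort(np.concatenate([x_labeled, np.asarray(x_unlabeled, dtype=float)]))

    best_t, best_margin = None, -np.inf
    # Candidate thresholds: midpoints between consecutive inputs (labeled or not),
    # so no input ever lies inside the margin.
    for left, right in zip(x_all[:-1], x_all[1:]):
        t = (left + right) / 2.0
        if not np.array_equal((x_labeled > t).astype(int), y_labeled):
            continue  # the boundary must respect the known labels
        margin = (right - left) / 2.0  # distance from t to the nearest input
        if margin > best_margin:
            best_t, best_margin = t, margin
    return best_t, best_margin


x_l, y_l = [0.0, 10.0], [0, 1]
x_u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0])
t, margin = low_density_threshold_1d(x_l, y_l, x_u)
print(t, margin)
print((x_u > t).astype(int))  # labels assigned to the unlabeled inputs
```

Here the function returns a threshold of 7.5 with margin 1.5, which falls in the empty gap between 6 and 9. A max-margin boundary fitted to the two labeled points alone would sit at 5.0 and cut through the left-hand cluster; the unlabeled points are what push the boundary into the low-density region.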

Graph-based Methods

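Graph-based methods build a graph over the data, with one node per input (labeled or unlabeled) and edges weighted by similarity, and use it to approximate the lower-dimensional manifold on which the data lies. Labels can then be propagated from labeled nodes to unlabeled ones along high-weight edges.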
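Label propagation is one common graph-based algorithm. The NumPy sketch below assumes a fully connected graph with Gaussian (RBF) edge weights and a hand-picked bandwidth `sigma`. It repeatedly replaces each node's label distribution with the weighted average of its neighbors' distributions, while keeping labeled nodes clamped to their known labels. scikit-learn provides ready-made estimators in `sklearn.semi_supervised` (`LabelPropagation`, `LabelSpreading`); the function here is a hypothetical stand-alone illustration.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, n_iter=1000, tol=1e-6):
    """Iterative label propagation over a fully connected RBF graph (sketch).

    y holds class indices for labeled points and -1 for unlabeled ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y[y >= 0])
    labeled = y >= 0

    # Edge weights: Gaussian (RBF) similarity between every pair of inputs.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

    # One-hot distributions for labeled nodes, uniform for unlabeled nodes.
    Y_l = (y[labeled, None] == classes[None, :]).astype(float)
    F = np.full((len(X), len(classes)), 1.0 / len(classes))
    F[labeled] = Y_l

    for _ in range(n_iter):
        F_new = P @ F  # each node averages its neighbors' label distributions
        F_new[labeled] = Y_l  # clamp the known labels
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    return classes[F.argmax(axis=1)]


X = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
y = [0, -1, -1, 1, -1, -1]
print(label_propagation(X, y))
```

In this example each unlabeled point takes the label of the labeled point in its own cluster, because edges within a cluster are far heavier than edges across the gap. The result depends on `sigma`: too small and the graph falls apart into isolated nodes, too large and the clusters blur together.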

Heuristic Methods

Heuristic methods encompass a wide range of approaches. In self-training, a model trained on the labeled data predicts labels for the unlabeled inputs, and its most confident predictions are added to the training set for the next round of supervised training. In co-training, two or more classifiers are trained on different views (feature subsets) of the labeled data, and each classifier's confident predictions on unlabeled inputs become additional training labels for the others.
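The self-training loop can be sketched as follows. For brevity it assumes a nearest-centroid base classifier, softmax confidences over negative squared distances, and a fixed confidence `threshold`; any probabilistic classifier could take the centroid model's place, and all helper names are hypothetical. scikit-learn's `sklearn.semi_supervised.SelfTrainingClassifier` wraps the same idea around an arbitrary estimator.

```python
import numpy as np


def fit_centroids(X, y, classes):
    # Base classifier (illustrative choice): one mean vector per class.
    return np.stack([X[y == c].mean(axis=0) for c in classes])


def predict_proba(X, centroids):
    # Softmax over negative squared distances to each class centroid.
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def self_train(X_l, y_l, X_u, threshold=0.9, max_rounds=10):
    X_l = np.asarray(X_l, dtype=float)
    y_l = np.asarray(y_l)
    X_u = np.asarray(X_u, dtype=float)
    classes = np.unique(y_l)
    for _ in range(max_rounds):
        if len(X_u) == 0:
            break  # every input has been pseudo-labeled
        centroids = fit_centroids(X_l, y_l, classes)
        proba = predict_proba(X_u, centroids)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # no prediction is confident enough: stop early
        # Move confidently pseudo-labeled inputs into the labeled set.
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, classes[proba[confident].argmax(axis=1)]])
        X_u = X_u[~confident]
    return fit_centroids(X_l, y_l, classes), classes


def predict(X, centroids, classes):
    return classes[predict_proba(np.asarray(X, dtype=float), centroids).argmax(axis=1)]


centroids, classes = self_train([[0.0], [10.0]], [0, 1], [[1.0], [2.0], [8.0], [9.0], [5.2]])
print(centroids.ravel())
print(predict([[0.5], [9.5]], centroids, classes))
```

Because the confidences are computed from raw squared distances, they depend on the scale of the features, so in practice `threshold` would need tuning or the data would need standardizing. Self-training can also reinforce its own early mistakes, which is why the confidence cut-off matters.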