Machine-learning methods - Documentation
Supervised methods
class supervised.kernelKNN(k)
K-nearest neighbor classifier, compatible with any kernel.
train(data, labels, **kwargs)
Trains the classifier on data and labels, using the kernel given by kernel_fct.
Parameters:
- data (N*p numpy array) – training inputs, one row per sample
- labels (N*1 numpy array) – training labels, one per sample
- kernel_fct – optional, a function used to compute a kernel matrix from the input data
- solver – optional, a numerical solver suited to the task at hand
- stringsData (boolean) – whether the input data are strings
- kwargs – the optional arguments above, plus any additional keyword arguments, for instance those passed on to the solver or the kernel function
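A minimal sketch of how a kernel K-nearest neighbor classifier can work, assuming a Gaussian (RBF) kernel plays the role of kernel_fct. Distances are measured in the kernel's feature space through the identity ||phi(x) - phi(z)||^2 = kern(x, x) - 2 kern(x, z) + kern(z, z), so only kernel evaluations are needed. The names rbf_kernel and kernel_knn_predict and the gamma parameter are hypothetical and not part of this package.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix with entries exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from rounding


def kernel_knn_predict(X_train, labels, X_test, k=3, kernel_fct=rbf_kernel):
    """Majority vote among the k nearest training points, with squared distances
    taken in feature space: kern(x, x) - 2 kern(x, z) + kern(z, z)."""
    y = np.ravel(labels)                                # accept N*1 or flat labels
    self_test = np.diag(kernel_fct(X_test, X_test))     # kern(x, x) per test point
    self_train = np.diag(kernel_fct(X_train, X_train))  # kern(z, z) per training point
    cross = kernel_fct(X_test, X_train)                 # kern(x, z) for all pairs
    dist2 = self_test[:, None] - 2 * cross + self_train[None, :]
    nearest = np.argsort(dist2, axis=1)[:, :k]          # k nearest neighbors per test point
    preds = []
    for neighbor_labels in y[nearest]:
        values, counts = np.unique(neighbor_labels, return_counts=True)
        preds.append(values[np.argmax(counts)])         # ties resolve to the smallest label
    return np.array(preds)
```

For example, with two well-separated pairs of training points labeled 0 and 1, querying one point next to each pair with k=1 returns array([0, 1]). Swapping in another kernel only requires passing a different kernel_fct with the same (A, B) -> Gram matrix contract.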
class supervised.kernelLogisticRegression(lbda=0.1)
Logistic regression classifier, compatible with any kernel.
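The class documents only lbda. Assuming lbda is the weight of an L2 (RKHS-norm) penalty and that labels are in {-1, +1}, kernel logistic regression can be sketched as gradient descent on dual coefficients alpha, minimizing J(alpha) = (1/n) sum_i log(1 + exp(-y_i f_i)) + (lbda/2) alpha^T K alpha with f = K alpha. This is a hypothetical illustration: fit_kernel_logreg, rbf_kernel, lr and n_iter are not part of the package, and the package's own solver may differ.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix with entries exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kernel_logreg(K, y, lbda=0.1, lr=0.5, n_iter=500):
    """Gradient descent on alpha for
    J(alpha) = mean_i log(1 + exp(-y_i f_i)) + (lbda / 2) alpha^T K alpha,
    where f = K alpha and y_i in {-1, +1}."""
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(n_iter):
        f = K @ alpha
        # d/df_i of log(1 + exp(-y_i f_i)) equals -y_i * sigmoid(-y_i f_i)
        g = -y / (1.0 + np.exp(y * f))
        grad = K @ g / n + lbda * (K @ alpha)  # K is symmetric, so K^T = K
        alpha -= lr * grad
    return alpha
```

Scores for new points are rbf_kernel(X_new, X_train) @ alpha, and their sign gives the predicted label. The fixed step size lr keeps the sketch short; a line search or a Newton-type method would usually converge in fewer iterations.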
Unsupervised methods
class unsupervised.GaussianMixture(K, d, pi, mu, sigma, isotropic=True)
Fits a Gaussian mixture model to a dataset.
- K (int) – number of clusters
- d (int) – dimension of the data
- isotropic (boolean) – if True, forces the clusters to be spherical. Defaults to True
- pi (np.array, Kx1) – current estimate of the probabilities of the class variable (the mixing weights). Initialized at train()
- mu (np.array, Kxd) – current estimate of the cluster means (first-order moments). Initialized at train()
- sigma (np.array, KxK) – current estimate of the covariance matrices. Initialized at train()
- n (int) – number of data points in the bound dataset
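To make the attributes above concrete, here is a sketch of one EM (expectation-maximization) iteration for the isotropic case. It assumes each spherical covariance is stored as a single variance per cluster, a (K,) array called var here, which may not match how sigma is actually laid out in this class; em_step_isotropic is a hypothetical name. The function also returns the log-likelihood of the data under the parameters it received, the kind of quantity one would track across iterations.

```python
import numpy as np


def em_step_isotropic(X, pi, mu, var):
    """One EM iteration for a mixture of K isotropic Gaussians in d dimensions.
    pi: (K,) mixing weights, mu: (K, d) means, var: (K,) per-cluster variances."""
    n, d = X.shape
    # E-step: log of pi_k * N(x_i | mu_k, var_k * I), shape (n, K)
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
    resp = np.exp(log_p - log_norm)  # responsibilities, each row sums to 1
    # M-step: re-estimate the parameters from the responsibilities
    nk = resp.sum(axis=0)
    pi = nk / n
    mu = resp.T @ X / nk[:, None]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    var = (resp * sq).sum(axis=0) / (d * nk)  # pooled over the d coordinates
    return pi, mu, var, log_norm.sum()  # log-likelihood before this update
```

Calling the function repeatedly and collecting the last return value gives a log-likelihood trace that EM guarantees to be non-decreasing, up to floating-point rounding.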
draw(data, predictions, size=40, scale=0.8, eps=0.1)
Displays the data points and centroids, and outlines the covariance matrices.
Todo
1 (high priority): Refactor the scale parameters.
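Outlining a covariance matrix usually means drawing an ellipse aligned with its eigenvectors, with semi-axes proportional to the square roots of its eigenvalues. The sketch below computes that geometry for a 2x2 covariance matrix; covariance_ellipse and its n_std parameter are hypothetical and are not claimed to match how draw uses scale or eps. The returned width, height and angle (in degrees) are in the form accepted by matplotlib.patches.Ellipse.

```python
import numpy as np


def covariance_ellipse(cov, n_std=2.0):
    """Return (width, height, angle_deg) of the ellipse that outlines a 2x2
    covariance matrix at n_std standard deviations along its principal axes."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigvecs[:, 1]                   # direction of largest variance
    # an ellipse is unchanged by a 180-degree turn, so fold the angle into [0, 180)
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    width, height = 2 * n_std * np.sqrt(eigvals[::-1])  # full axis lengths
    return width, height, angle
```

For instance, the matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1 with the major axis along the diagonal, giving a 45-degree ellipse of width 4*sqrt(3) and height 4 at n_std=2.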
predict(X)
Predicts cluster assignments for a dataset.
Parameters: X (np.array) – n x d input dataset
Returns: predictions – cluster assignment, one per data point
Return type: np.array
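Cluster assignment in a Gaussian mixture can be understood as picking, for each point, the component with the highest posterior probability, which is proportional to pi_k N(x | mu_k, sigma_k). A sketch with full covariance matrices, assuming sigma is stored as a (K, d, d) array (a hypothetical layout; predict_gmm is not the package's function). Working in log space avoids underflow for points far from every component.

```python
import numpy as np


def predict_gmm(X, pi, mu, sigma):
    """Assign each row of X (n x d) to the component with the highest
    posterior pi_k * N(x | mu_k, sigma_k); sigma has shape (K, d, d)."""
    n, d = X.shape
    K = len(pi)
    log_post = np.empty((n, K))
    for k in range(K):
        diff = X - mu[k]
        _, logdet = np.linalg.slogdet(sigma[k])
        # Mahalanobis term (x - mu)^T sigma^{-1} (x - mu), one value per row
        maha = np.sum(diff * np.linalg.solve(sigma[k], diff.T).T, axis=1)
        log_post[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    return np.argmax(log_post, axis=1)
```

np.linalg.solve is used instead of an explicit inverse, which is both cheaper and numerically safer.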
printResults(log_likelihoods)
Prints the learned parameters after training and the evolution of the partial log-likelihood across iterations.
Todo
This does not fit well with the package philosophy. To be refactored.
class unsupervised.Kmeans(data, nClass, ind=0, init='def')
Standard K-means method.
- ind (int) – instance ID
- data (np.array) – the data bound to this instance
- N (int) – number of records
- d (int) – dimension
- K (int) – number of clusters
- centroids (np.array) – current centroids. Only initialized when the algorithm is run
- assignment (np.array) – maps every data point to a cluster ID
Todo
1 (low priority): Generalize this class to kernel K-means, with the kernel bound to the instance at initialization.
2 (low priority): Allow the data to be reset.
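The standard K-means method alternates two steps (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until the assignment stops changing. A minimal sketch, assuming random initialization from the data points; the behavior of init='def' is not documented here, and kmeans, n_iter and seed are hypothetical names. The returned centroids and assignment mirror the attributes listed above.

```python
import numpy as np


def kmeans(data, n_class, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialize centroids with n_class distinct data points chosen at random
    centroids = data[rng.choice(len(data), size=n_class, replace=False)].astype(float)
    assignment = np.zeros(len(data), dtype=int)
    for it in range(n_iter):
        dist2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assignment = np.argmin(dist2, axis=1)
        if it > 0 and np.array_equal(new_assignment, assignment):
            break  # assignments are stable: converged
        assignment = new_assignment
        for k in range(n_class):
            members = data[assignment == k]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[k] = members.mean(axis=0)
    return centroids, assignment
```

Lloyd's algorithm only reaches a local minimum of the within-cluster sum of squares, so the result can depend on the initialization; running it from several seeds and keeping the best solution is a common remedy.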