Rock, Paper or Scissor Game - Train and Classify [Volume 4]
Tags | train_and_classify | machine-learning | features | train | nearest-neighbour
After the previous three volumes of Jupyter Notebooks dedicated to our "Classification Game", we are reaching a decisive stage: the Training of the Classifier.
Currently, as demonstrated in the previous volume, all the training data (examples and respective features) are ready to be applied to a classification algorithm. The choice of classification algorithm resulted in the selection of the k-Nearest Neighbour classifier. In the current Jupyter Notebook, we will describe the relevant steps to achieve our goal of training a k-Nearest Neighbour classifier.
Starting Point (Setup)
List of Available Classes:
Paper | Rock | Scissor
Acquired Data:
Protocol/Feature Extraction
Extracted Features
Formal definition of parameters
Maximum | Maximum Sample Value of a set of elements is equal to the last element of the sorted set
Mean | $\mu = \frac{1}{N}\sum_{i=1}^N (sample_i)$
Standard Deviation | $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N(sample_i - \mu_{signal})^2}$
Zero-Crossing Rate | $zcr = \frac{1}{N - 1}\sum_{i=1}^{N-1}bin(i)$
Standard Deviation of the Absolute Signal | $\sigma_{abs} = \sqrt{\frac{1}{N}\sum_{i=1}^N(|sample_i| - \mu_{signal_{abs}})^2}$
Slope of the Regression Curve | $m = \frac{\Delta signal}{\Delta t}$
... where $N$ is the number of acquired samples (that are part of the signal), $sample_i$ is the value of sample number $i$, $signal_{abs}$ is the absolute signal, $\Delta signal$ is the difference between the y coordinates of two points of the regression curve and $\Delta t$ is the difference between the x (time) coordinates of the same two points.
... and $bin(i)$ is a binary function defined as:
$bin(i) = \begin{cases} 1, & \mbox{if } signal_i \times signal_{i-1} \leq 0 \\ 0, & \mbox{if } signal_i \times signal_{i-1}>0 \end{cases}$
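To make these definitions concrete, the snippet below computes each parameter for an illustrative toy signal with numpy (the values and variable names are not taken from the actual acquisition, and unit spacing between samples is assumed for the regression slope):
from numpy import array, mean, std, absolute, polyfit, arange

signal = array([0.2, -0.1, 0.4, 0.3, -0.2, 0.5])  # Toy signal with N = 6 samples.
N = len(signal)

maximum = sorted(signal)[-1]               # Last element of the sorted set.
mu = mean(signal)                          # Mean.
sigma = std(signal)                        # Standard deviation.
zcr = mean(signal[1:] * signal[:-1] <= 0)  # Average of bin(i) over the N - 1 consecutive sample pairs.
sigma_abs = std(absolute(signal))          # Standard deviation of the absolute signal.
m = polyfit(arange(N), signal, 1)[0]       # Slope of the linear regression curve (assuming Delta t = 1 per sample).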
k-Nearest Neighbour Classifier
Brief Intro
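In a k-Nearest Neighbour classifier there is no explicit model in the traditional sense: a new example is simply assigned the most frequent class among its $k$ closest training examples in the feature space (typically measured with the Euclidean distance). The following minimal sketch illustrates this voting logic; the function name and data are illustrative only.
from numpy import array, argsort
from numpy.linalg import norm
from collections import Counter

def knn_predict(train_X, train_y, test_x, k=5):
    # Euclidean distance from the test example to every training example.
    distances = norm(train_X - test_x, axis=1)
    # Classes of the k closest training examples.
    nearest_classes = [train_y[i] for i in argsort(distances)[:k]]
    # Majority vote among the k neighbours.
    return Counter(nearest_classes).most_common(1)[0][0]

# Illustrative data: 5 training examples with 2 features and 2 classes.
train_X = array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]])
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, array([0.8, 0.8]), k=3))  # -> 1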
0 - Import of the needed packages for a correct execution of the current Jupyter Notebook
# Python package that contains functions specialized in "Machine Learning" tasks.
from sklearn.preprocessing import normalize
from sklearn.neighbors import KNeighborsClassifier
# biosignalsnotebooks own package that supports some functionalities used on the Jupyter Notebooks.
import biosignalsnotebooks as bsnb
# Package containing a diversified set of functions for statistical processing, which also provides support for array operations.
from numpy import array
⚠ This step was done internally !!! For now, don't worry about it.
1 - Loading of the dictionary created on Volume 3 of the "Classification Game" Jupyter Notebook, containing the selected features
This dictionary is formed by two levels, holding the final feature list of each training example and the respective class labels (used in step 2). Selecting a good set of features is a really important stage for training an effective classification system. For now we are simply loading the selected set of features without explaining the real reason for choosing them (we strongly recommend reading the content of Rock, Paper or Scissor Game - Train and Classify [Volume 3] | Feature Selection).
In order to understand the relevance of selecting a valuable set of features (and how this choice can affect the performance of our classifier), the last volume of the "Classification Game" (Rock, Paper or Scissor Game - Train and Classify [Volume 5] | Performance Evaluation) can be a useful resource to go deeper into this question!
Set of Features A
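The loading itself was performed internally, as noted above. As an illustration, assuming the Volume 3 dictionary was stored on disk as a JSON file (the filename below is hypothetical), it could be restored as follows:
from json import load

# Hypothetical file generated on Volume 3 of "Classification Game".
with open("classification_game_features.json", "r") as file:
    features_dict = load(file)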
2 - Storage of content of the dictionary into individual variables
In the previously mentioned internal step (loading of the dictionary created on Rock, Paper or Scissor Game - Train and Classify [Volume 3] | Feature Selection), data was stored in the features_dict variable.
training_examples = features_dict["features_list_final"]
class_training_examples = features_dict["class_labels"]
Checkpoint !!! Currently, all the information needed for training our classifier is stored in the following variables:
training_examples | list where each entry contains the feature values of one training example
class_training_examples | list with the class label of each training example
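A quick inspection can confirm the structure of both variables (the printed numbers depend on your acquisition):
# Each entry of training_examples is the feature list of one training example,
# and class_training_examples holds the corresponding class label.
print("Number of training examples:", len(training_examples))
print("Number of features per example:", len(training_examples[0]))
print("Number of class labels:", len(class_training_examples))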
3 - Creation of a "k-Nearest Neighbour" scikit-learn object
We use the predefined $k$ (number of neighbours), which is 5.
# k-Nearest Neighbour object initialisation.
knn_classifier = KNeighborsClassifier()
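Calling KNeighborsClassifier() without arguments relies on scikit-learn's default of 5 neighbours; the equivalent explicit initialisation makes the choice of $k$ visible:
# Equivalent initialisation with the number of neighbours stated explicitly.
knn_classifier = KNeighborsClassifier(n_neighbors=5)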
4 - Begin the training stage of the classifier (fitting the model to data)
knn_classifier.fit(training_examples, class_training_examples)
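As a quick sanity check, the fitted object can re-classify its own training examples. Note that this figure is optimistic, since the classifier has already seen these examples; a rigorous evaluation on unseen data is the topic of Volume 5.
# Fraction of training examples that the fitted classifier labels correctly.
print("Training accuracy:", knn_classifier.score(training_examples, class_training_examples))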
The following interactive plot provides a deeper understanding of the class separation achieved by each pair of dimensions/features.
5 - For classifying a new "test" example (with unknown class) it will only be necessary to give an input to the classifier, i.e., a list with the feature values of the "test" example
# A list with 8 arbitrary entries.
test_examples_features = [0.65, 0.51, 0.70, 0.10, 0.20, 0.17, 0.23, 0.88]
# Classification.
print("Returned Class: ")
print(knn_classifier.predict([test_examples_features]))
# Estimated probability of each class.
print("Probability of each class:")
print(knn_classifier.predict_proba([test_examples_features]))
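The columns returned by predict_proba follow the ordering of the classifier's classes_ attribute, so printing it shows which probability belongs to which class:
# Class labels in the order used by the predict_proba columns.
print(knn_classifier.classes_)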
There is a clear doubt between class "0" ("No Action") and class "1" ("Paper"), with estimated probabilities of 40 % and 60 %, respectively.
With the steps described in the current volume of the "Classification Game", our classifier is trained and ready to receive new examples and classify them immediately.
There is only one remaining task, which will be briefly explained in the final volume: the objective evaluation of the classifier's quality.
We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!