Rock, Paper or Scissor Game - Train and Classify [Volume 5]
Tags: evaluate, machine-learning, features, quality, cross-validation
In order to ensure that our classification system is functional, we need to evaluate it in an objective way.
In this final volume (the current Jupyter Notebook), an evaluation methodology will be described, taking into consideration a particular cross-validation technique.
Performance Evaluation
Brief Intro
A classifier should function correctly when the testing examples are very similar to the training examples. However, if a testing example has somewhat disparate characteristics, the robustness of the system will be challenged.
Thus, what makes a classifier robust is its capacity to establish correspondences even when the similarities are more tenuous. To estimate the quality of the implemented system, different methods can be followed, namely Cross-Validation and Leave One Out (see an external reference).
In Cross-Validation the training set is divided into $N$ subsets with approximately the same number of examples. Across $N$ iterations, each of the $N$ subsets acquires the role of testing set, while the remaining $N-1$ subsets are used to train a "partial" classifier. Finally, an estimate of the error of the original classifier is obtained from the partial errors of the $N$ partial classifiers.
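In symbols, if $E_i$ denotes the partial error obtained when subset $i$ plays the role of testing set, the cross-validation estimate of the error of the original classifier is simply the average of the partial errors: $\hat{E} = \frac{1}{N}\sum_{i=1}^{N} E_i$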
One particular cross-validation strategy is k-fold Cross-Validation.
The Leave One Out method is a particular case of cross-validation in which the number of partial classifiers equals the number of training examples. In each iteration, one training example assumes the role of testing example, while the remaining ones are used to train the "partial" classifier.
This last methodology is particularly useful when the training set is small, taking into consideration that it becomes very expensive on big training sets.
Fortunately there are built-in functions in scikit-learn, which will be applied in the current Jupyter Notebook.
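As a quick standalone illustration of how the two strategies partition the data, the split generators of scikit-learn can be applied to a toy set of 6 examples (the numbers below are example indices, not our real training data):
from numpy import arange
from sklearn.model_selection import KFold, LeaveOneOut
# Toy "dataset" with 6 examples, identified by the indices 0..5.
toy_examples = arange(6).reshape(-1, 1)
# 3-fold cross-validation: 3 partial classifiers, each one tested on 2 examples.
for train_index, test_index in KFold(n_splits=3).split(toy_examples):
    print("k-fold | train:", train_index, "| test:", test_index)
# Leave One Out: 6 partial classifiers, each one tested on a single example.
for train_index, test_index in LeaveOneOut().split(toy_examples):
    print("LOO | train:", train_index, "| test:", test_index)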
0 - Import of the needed packages for a correct execution of the current Jupyter Notebook
# Python package that contains functions specialised in "Machine Learning" tasks.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut
# biosignalsnotebooks own package that supports some functionalities used on the Jupyter Notebooks.
import biosignalsnotebooks as bsnb
# Package containing a diversified set of functions for statistical processing and also providing support for array operations.
from numpy import array
1 - Replicate the training procedure of Volume 3 of "Classification Game" Jupyter Notebook
1.1 - Load of all the extracted features from our training data
⚠ This step was done internally !!! For now, don't be worried about that; remember only that a dictionary (called "features_class_dict"), containing the list of all feature values and the classes of the training examples, is available from Volume 3 of "Classification Game" Jupyter Notebook.
List of Dictionary keys
features_class_dict.keys()
1.2 - Storage of dictionary content into separate variables
features_list = features_class_dict["features_list_final"]
class_training_examples = features_class_dict["class_labels"]
# Number of features that characterise each training example.
print(len(features_list[0]))
1.3 - Let"s select two sets of features. Set A will be identical to the one used on Volume 3 of "Classification Game" Jupyter Notebook , while set B is a more restricted one, formed by three features (one from each used sensor)
Set of Features A
# Renaming original variable.
training_examples_a = features_list
Set of Features B (one random feature from each sensor, i.e., a set with 3 features)
[$\sigma_{emg\,flexor}$, $zcr_{emg\,flexor}$, $\sigma_{emg\,flexor}^{abs}$, $\sigma_{emg\,adductor}$, $\sigma_{emg\,adductor}^{abs}$, $\sigma_{acc\,z}$, $max_{acc\,z}$, $m_{acc\,z}$]
= [False, True, False, True, False, False, False, True] (List entries that contain relevant features are flagged with "True")
# Access each training example and exclude meaningless entries.
# Entries that we want to keep are marked with "True" flag.
acception_labels_b = [False, True, False, True, False, False, False, True]
training_examples_b = []
for example_nbr in range(0, len(features_list)):
    training_examples_b += [list(array(features_list[example_nbr])[array(acception_labels_b)])]
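Since boolean masks can also index the columns of a 2D array, the previous loop can be condensed into a single NumPy operation (an equivalent sketch producing the same list of lists; the variable name is just illustrative):
# Equivalent vectorised alternative: convert the feature list into a 2D array
# and keep only the columns flagged with "True".
training_examples_b_alt = array(features_list)[:, array(acception_labels_b)].tolist()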
1.4 - Two classifiers will be trained, using the features contained inside the two previous sets of features
Set of Features A
# k-Nearest Neighbour object initialisation.
knn_classifier_a = KNeighborsClassifier()
# Fit model to data.
knn_classifier_a.fit(training_examples_a, class_training_examples)
Set of Features B
# k-Nearest Neighbour object initialisation.
knn_classifier_b = KNeighborsClassifier()
# Fit model to data.
knn_classifier_b.fit(training_examples_b, class_training_examples)
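Before applying cross-validation, note that scoring each classifier on the very data it was fitted on (resubstitution) typically yields an optimistic estimate, which is precisely why an objective methodology is needed. The sanity check below (a small addition, not part of the original procedure) uses the standard score method of scikit-learn classifiers:
# Resubstitution accuracy (evaluation on the training data itself).
# These values tend to be optimistic and should not be reported as
# the real performance of the system.
print("Set A | resubstitution accuracy:", knn_classifier_a.score(training_examples_a, class_training_examples))
print("Set B | resubstitution accuracy:", knn_classifier_b.score(training_examples_b, class_training_examples))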
2 - Usage of "cross_val_score" function of scikit-learn package
With this function it will be possible to specify a cross-validation method so that the performance of our classification system can be assessed. In the current Jupyter Notebook, one of the previously described cross-validation methods will be used: Leave One Out.
2.1 - Classifier trained with Set of Features A
leave_one_out_score_a = cross_val_score(knn_classifier_a, training_examples_a, class_training_examples,
                                        scoring="accuracy", cv=LeaveOneOut())
# Average accuracy of classifier.
mean_l1o_score_a = leave_one_out_score_a.mean()
# Standard Deviation of the previous estimate.
std_l1o_score_a = leave_one_out_score_a.std()
2.2 - Classifier trained with Set of Features B
leave_one_out_score_b = cross_val_score(knn_classifier_b, training_examples_b, class_training_examples,
scoring="accuracy", cv=LeaveOneOut())
# Average accuracy of classifier.
mean_l1o_score_b = leave_one_out_score_b.mean()
# Standard Deviation of the previous estimate.
std_l1o_score_b = leave_one_out_score_b.std()
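To inspect the two estimates side by side, a simple report can be printed (the formatting below is an arbitrary choice):
# Report each Leave One Out estimate as "mean accuracy +/- standard deviation".
print("Set A | LOO accuracy: %.3f +/- %.3f" % (mean_l1o_score_a, std_l1o_score_a))
print("Set B | LOO accuracy: %.3f +/- %.3f" % (mean_l1o_score_b, std_l1o_score_b))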
As you can see, the two sets of features produced classifiers with very distinct performances. We can clearly understand that the first set of features (Set A) ensures a more effective training stage and, consequently, better prepares the classifier to receive and correctly classify new, unseen examples!
We have reached the end of the "Classification Game". This 5-volume journey revealed the wonderful world of Machine Learning; however, the contents included in these Notebooks represent only a small sample of the full potential of this research area.
We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!