Rock, Paper or Scissor Game - Train and Classify [Volume 5]
Tags: evaluate, machine-learning, features, quality, cross-validation


In order to ensure that our classification system is functional, we need to evaluate it in an objective way.
In this final volume (the current Jupyter Notebook) an evaluation methodology will be described, based on a particular cross-validation technique.


Performance Evaluation

Brief Intro
When implementing a classification system, it is extremely important to have an objective understanding of how that system will behave when presented with new testing examples.

A classifier should function correctly when the testing examples are very similar to the training examples. However, if a testing example has characteristics that are somewhat disparate, the robustness of the system will be challenged.

Thus, what makes a classifier robust is its capacity to establish correspondences even when the similarities are more tenuous. To estimate the quality of the implemented system, different methods can be followed, namely Cross-Validation and Leave One Out (see an external reference).

In Cross-Validation the training set is divided into $N$ subsets with approximately the same number of examples. Over $N$ iterations, each of the $N$ subsets takes the role of testing set, while the remaining $N-1$ subsets are used to train a "partial" classifier. Finally, an estimate of the error of the original classifier is obtained from the partial errors of the $N$ partial classifiers.

This particular cross-validation strategy is known as k-fold Cross-Validation.
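A minimal sketch of this procedure, using scikit-learn's KFold splitter (the small feature matrix and labels below are purely illustrative, not data from this notebook):

from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from numpy import array, mean

# Illustrative data: 6 examples with 2 features each and binary class labels.
examples = array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.3], [0.4, 0.6], [0.6, 0.5]])
labels = array([0, 0, 1, 1, 0, 1])

partial_errors = []
for train_idx, test_idx in KFold(n_splits=3).split(examples):
    # Train a "partial" classifier on N-1 subsets...
    partial_clf = KNeighborsClassifier(n_neighbors=1)
    partial_clf.fit(examples[train_idx], labels[train_idx])
    # ... and compute its error on the held-out subset.
    partial_errors += [1 - partial_clf.score(examples[test_idx], labels[test_idx])]

# The error estimate of the original classifier is the average of the partial errors.
print(mean(partial_errors))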

As for the Leave One Out method, it is a particular case of cross-validation. It involves creating as many partial classifiers as there are training examples. In each iteration, one training example assumes the role of testing example, while the rest are used to train the "partial" classifier.

This last methodology is particularly useful when the training set is small, taking into consideration that it becomes computationally expensive on big training sets.
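In scikit-learn this strategy is implemented by the LeaveOneOut splitter. A minimal sketch of how it partitions the data (illustrative array, not this notebook's data):

from sklearn.model_selection import LeaveOneOut
from numpy import array

# Four 1-feature examples; LeaveOneOut yields four splits, each holding out one example.
examples = array([[0.1], [0.3], [0.2], [0.5]])
for train_idx, test_idx in LeaveOneOut().split(examples):
    print("train:", train_idx, "| test:", test_idx)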

Fortunately there are built-in functions in scikit-learn, which will be applied in the current Jupyter Notebook.

0 - Import of the needed packages for a correct execution of the current Jupyter Notebook

In [1]:
# Python package that contains functions specialised in "Machine Learning" tasks.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut

# biosignalsnotebooks own package that supports some functionalities used in the Jupyter Notebooks.
import biosignalsnotebooks as bsnb

# Package containing a diversified set of functions for statistical processing; it also provides support for array operations.
from numpy import array

1 - Replicate the training procedure of Volume 3 of "Classification Game" Jupyter Notebook

1.1 - Load of all the extracted features from our training data

This step was done internally !!! For now, don't worry about that; just remember that a dictionary (called "features_class_dict"), containing the list of all feature values and the classes of the training examples, is available from Volume 3 of the "Classification Game" Jupyter Notebook

In [2]:
# Package dedicated to the manipulation of json files.
from json import loads

# Specification of filename and relative path.
relative_path = "/signal_samples/classification_game/features"
filename = "classification_game_features_final.json"

# Load of data inside file, storing it inside a Python dictionary.
with open(relative_path + "/" + filename) as file:
    features_class_dict = loads(file.read())

List of Dictionary keys

In [3]:
features_class_dict.keys()
Out[3]:
dict_keys(['features_list_final', 'class_labels'])

1.2 - Storage of dictionary content into separate variables

In [4]:
features_list = features_class_dict["features_list_final"]
class_training_examples = features_class_dict["class_labels"]
print(len(features_list[0]))
8
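As a quick sanity check (a small sketch using only the variables defined above), we can confirm how many training examples were loaded and which class labels occur:

# Number of training examples and the distinct class labels present in the training set.
print(len(features_list), "training examples")
print(set(class_training_examples))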

1.3 - Let"s select two sets of features. Set A will be identical to the one used on Volume 3 of "Classification Game" Jupyter Notebook , while set B is a more restricted one, formed by three features (one from each used sensor)

Set of Features A

  • $\sigma_{emg\,flexor}$
  • $zcr_{emg\,flexor}$
  • $\sigma_{emg\,flexor}^{abs}$
  • $\sigma_{emg\,adductor}$
  • $\sigma_{emg\,adductor}^{abs}$
  • $\sigma_{acc\,z}$
  • $max_{acc\,z}$
  • $m_{acc\,z}$
In [5]:
# Renaming original variable.
training_examples_a = features_list

Set of Features B (one random feature from each sensor, i.e., a set with 3 features)

  • $zcr_{emg\,flexor}$
  • $\sigma_{emg\,adductor}$
  • $m_{acc\,z}$

[$\sigma_{emg\,flexor}$, $zcr_{emg\,flexor}$, $\sigma_{emg\,flexor}^{abs}$, $\sigma_{emg\,adductor}$, $\sigma_{emg\,adductor}^{abs}$, $\sigma_{acc\,z}$, $max_{acc\,z}$, $m_{acc\,z}$]

= [False, True, False, True, False, False, False, True] (List entries that contain relevant features are flagged with "True")

In [6]:
# Access each training example and exclude meaningless entries.
# Entries that we want to keep are marked with "True" flag.
acception_labels_b = [False, True, False, True, False, False, False, True]
training_examples_b = []
for example_nbr in range(0, len(features_list)):
    training_examples_b += [list(array(features_list[example_nbr])[array(acception_labels_b)])]
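The same selection can be written more compactly with NumPy boolean indexing over the full feature matrix; this is an equivalent sketch, not the notebook's original code:

# Vectorised equivalent of the previous loop.
features_matrix = array(features_list)  # shape: (number of examples, 8)
training_examples_b = features_matrix[:, array(acception_labels_b)].tolist()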

1.4 - Two classifiers will be trained, using the features contained inside the two previous sets of features

Set of Features A

In [7]:
# k-Nearest Neighbour object initialisation.
knn_classifier_a = KNeighborsClassifier()

# Fit model to data.
knn_classifier_a.fit(training_examples_a, class_training_examples) 
Out[7]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')

Set of Features B

In [8]:
# k-Nearest Neighbour object initialisation.
knn_classifier_b = KNeighborsClassifier()

# Fit model to data.
knn_classifier_b.fit(training_examples_b, class_training_examples) 
Out[8]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')
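With both models fitted, classifying a new gesture only requires its feature vector. As a hedged illustration we reuse the first training example as a stand-in for a new acquisition (in practice this would be an unseen example):

# Illustrative predictions; note that predict expects a 2D array-like input.
print(knn_classifier_a.predict([training_examples_a[0]]))
print(knn_classifier_b.predict([training_examples_b[0]]))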

2 - Usage of "cross_val_score" function of scikit-learn package

With this function it is possible to specify a cross-validation method so that the performance of our classification system can be assessed. In the current Jupyter Notebook one of the previously described cross-validation methods will be used:

  • Leave One Out

2.1 - Classifier trained with Set of Features A

In [9]:
leave_one_out_score_a = cross_val_score(knn_classifier_a, training_examples_a, class_training_examples, scoring="accuracy", cv=LeaveOneOut())

# Average accuracy of classifier.
mean_l1o_score_a = leave_one_out_score_a.mean()

# Standard Deviation of the previous estimate.
std_l1o_score_a = leave_one_out_score_a.std()
In [10]:
from sty import fg, rs
print(fg(232,77,14) + "\033[1mAverage Accuracy of Classifier:\033[0m" + fg.rs)
print(str(mean_l1o_score_a * 100) + " %")

print(fg(98,195,238) + "\033[1mStandard Deviation:\033[0m" + fg.rs)
print("+-" + str(round(std_l1o_score_a, 1) * 100) + " %")
Average Accuracy of Classifier:
90.0 %
Standard Deviation:
+-30.0 %

2.2 - Classifier trained with Set of Features B

In [11]:
leave_one_out_score_b = cross_val_score(knn_classifier_b, training_examples_b, class_training_examples, 
                                        scoring="accuracy", cv=LeaveOneOut())

# Average accuracy of classifier.
mean_l1o_score_b = leave_one_out_score_b.mean()

# Standard Deviation of the previous estimate.
std_l1o_score_b = leave_one_out_score_b.std()
In [12]:
from sty import fg, rs
print(fg(232,77,14) + "\033[1mAverage Accuracy of Classifier:\033[0m" + fg.rs)
print(str(mean_l1o_score_b * 100) + " %")

print(fg(98,195,238) + "\033[1mStandard Deviation:\033[0m" + fg.rs)
print("+-" + str(round(std_l1o_score_b, 1) * 100) + " %")
Average Accuracy of Classifier:
60.0 %
Standard Deviation:
+-50.0 %
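The large standard deviations are a direct consequence of the Leave One Out procedure: each individual score is either 0 or 1, so the standard deviation of the scores is simply $\sqrt{p(1-p)}$, where $p$ is the average accuracy. A short verification, using only this relation:

from numpy import sqrt

# p = 0.9 gives 0.3 (the reported +-30 %); p = 0.6 gives ~0.4899,
# which the rounding step above (round(std, 1) * 100) displays as +-50 %.
for p in [0.9, 0.6]:
    print(p, "->", sqrt(p * (1 - p)))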

As you can see, different sets of features produced two classifiers with very distinct performances. We can clearly see that the first set of features (Set A) ensures a more effective training stage and, consequently, better prepares the classifier to receive and correctly classify new testing examples !

We have reached the end of the "Classification Game". This 5-volume-long journey reveals the wonderful world of Machine Learning ; however, the contents included in these Notebooks represent only a small sample of the full potential of this research area.

We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks !

In [13]:
from biosignalsnotebooks.__notebook_support__ import css_style_apply
css_style_apply()
.................... CSS Style Applied to Jupyter Notebook .........................
Out[13]: