Stone, Paper or Scissor Game - Train and Classify [Volume 3]
Difficulty Level:
Tags: train_and_classify, machine-learning, features, selection

We currently have a file containing the feature values of all training examples, generated in a previously created Jupyter Notebook.
However, there is a high risk that some of the extracted features are not useful for our classification system. Remember, a good feature is a parameter with the ability to separate the different classes of our classification system, i.e., a parameter with a characteristic range of values for each available class.
In order to ensure that the training process of our classifier happens in the most efficient way, these redundant or invariant features should be removed.
The procedure outlined in the last two paragraphs is called Feature Selection, and it is the focus of this Jupyter Notebook!

Starting Point (Setup)

List of Available Classes:

  1. "No Action" [When the hand is relaxed]
  2. "Paper" [All fingers are extended]
  3. "Stone" [All fingers are bent]
  4. "Scissor" [Forefinger and middle finger are extended and the remaining ones are bent]

[Images: Paper, Stone and Scissor hand gestures]

Acquired Data:

  • Electromyography (EMG) | 2 muscles | Adductor pollicis and Flexor digitorum superficialis
  • Accelerometer (ACC) | 1 axis | Sensor parallel to the thumb nail (Axis perpendicular)

Protocol/Feature Extraction

Extracted Features

Formal definition of parameters
☝ | Maximum Sample Value of a set of elements is equal to the last element of the sorted set

☉ | $\mu = \frac{1}{N}\sum_{i=1}^N (sample_i)$

☆ | $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N(sample_i - \mu_{signal})^2}$

☌ | $zcr = \frac{1}{N - 1}\sum_{i=1}^{N-1}bin(i)$

☇ | $\sigma_{abs} = \sqrt{\frac{1}{N}\sum_{i=1}^N(|sample_i| - \mu_{signal_{abs}})^2}$

☍ | $m = \frac{\Delta signal}{\Delta t}$

... where $N$ is the number of acquired samples (that form the signal), $sample_i$ is the value of sample number $i$, $signal_{abs}$ is the absolute signal, $\Delta signal$ is the difference between the y coordinates of two points of the regression curve and $\Delta t$ is the difference between the x (time) coordinates of the same two points.

... and

$bin(i)$ a binary function defined as:

$bin(i) = \begin{cases} 1, & \mbox{if } signal_i \times signal_{i-1} \leq 0 \\ 0, & \mbox{if } signal_i \times signal_{i-1}>0 \end{cases}$
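For a quick sanity check, the parameters defined above can be reproduced with a few lines of NumPy on a toy signal (a minimal sketch; the signal values and the 1000 Hz sampling rate below are hypothetical and serve only to illustrate the formulas):

import numpy
signal = numpy.array([0.2, -0.1, 0.4, 0.3, -0.2, 0.1])   # hypothetical samples
time = numpy.arange(len(signal)) / 1000.0                 # hypothetical time axis (1000 Hz sampling rate)

maximum = numpy.max(signal)                               # maximum sample value
mu = numpy.mean(signal)                                   # average value
sigma = numpy.std(signal)                                 # standard deviation
sigma_abs = numpy.std(numpy.abs(signal))                  # standard deviation of the absolute signal
zcr = numpy.mean(signal[1:] * signal[:-1] <= 0)           # fraction of consecutive sample pairs with a sign change
m = numpy.polyfit(time, signal, 1)[0]                     # slope of the linear regression curve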


Feature Selection

Intro
With Feature Selection we will start to use the resources contained inside an extremely useful Python package: scikit-learn

As described before, Feature Selection is intended to remove redundant or meaningless parameters, which would increase the complexity of the classifier without necessarily translating into improved performance. Without this step, the risk of overfitting to the training examples increases, making the classifier less able to categorise a new testing example.

There are different approaches to Feature Selection, such as filter methods and wrapper methods.

In the first approach (filter methods), a ranking is attributed to the features using, for example, the Pearson correlation coefficient, which evaluates the impact that the feature under analysis has on the target class of the training example, or Mutual Information, which quantifies how much information two variables share.

The least relevant features are excluded and the classifier is trained afterwards (for a deeper explanation, please visit the article by Girish Chandrashekar and Ferat Sahin at ScienceDirect).
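As a purely illustrative sketch of a filter method (not the strategy followed in this notebook), features could be ranked by their Mutual Information with the class labels; mutual_info_classif is used here only as an example of such a ranking function:

# Illustrative filter method: rank features by Mutual Information with the class labels.
from numpy import argsort
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_mutual_information(features, labels):
    scores = mutual_info_classif(features, labels)   # one score per feature (higher = more informative)
    return argsort(scores)[::-1]                     # feature indices, from most to least informative

The lowest-ranked features would then be discarded before the training stage.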

The second methodology (wrapper methods) is characterised by the fact that the selection phase includes a classification algorithm: features are excluded or selected according to the quality of the trained classifier.

There is also a third major methodology applicable to Feature Selection: the so-called embedded methods. Essentially, these methods are a combination of filter and wrapper approaches, characterised by the simultaneous execution of the Feature Selection and Training stages.

One of the most intuitive Feature Selection methods is Recursive Feature Elimination, which will be used in the current Jupyter Notebook.

Essentially, the steps of this method consist of:

  1. The original set of training examples is segmented into multiple ($K$) subsets of training examples and test examples
  2. For each one of the $K$ subsets of training/test examples:
    1. The training examples are used for training a "virtual" classifier (for example a Support Vector Machine )
    2. The test examples are given as inputs to the trained classifier and the "virtual" classifier quality is estimated
  3. At this point we can estimate the average quality of the $K$ "virtual" classifiers and determine the weight of each feature on the training stage
  4. The feature with the smallest weight is excluded
  5. Steps 1 to 4 are repeated until only one feature remains
  6. Finally, when the "feature elimination" procedure ends, the set of features that provides a "virtual" classifier with the best average quality (step 2 ) defines the relevant features to be used during our final training stage (a minimal sketch of this loop follows the list)
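The following is a minimal sketch of that elimination loop, assuming a features matrix X and a label vector y (the current notebook relies instead on the ready-made RFECV implementation of scikit-learn, presented below):

# Illustrative sketch of Recursive Feature Elimination with cross-validation.
from numpy import abs, argmin, array, mean
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def recursive_feature_elimination(X, y, n_folds=5):
    X = array(X)
    remaining = list(range(X.shape[1]))   # indices of the features still in play
    history = []                          # (feature subset, average cross-validation score) per iteration
    while remaining:
        # Steps 1 to 3 - average quality of the K "virtual" classifiers.
        svc = SVC(kernel="linear")
        score = mean(cross_val_score(svc, X[:, remaining], y, cv=n_folds, scoring="accuracy"))
        history.append((list(remaining), score))
        if len(remaining) == 1:
            break
        # Step 4 - exclude the feature with the smallest weight (sum of the absolute SVM coefficients).
        svc.fit(X[:, remaining], y)
        weights = abs(svc.coef_).sum(axis=0)
        remaining.pop(int(argmin(weights)))
    # Step 6 - smallest subset reaching the best average score.
    best_score = max(score for _, score in history)
    best_subset = min((subset for subset, score in history if score == best_score), key=len)
    return best_subset, best_score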

0 - Import of the packages needed for the correct execution of the current Jupyter Notebook

In [1]:
# Python package that contains functions specialized on "Machine Learning" tasks.
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV, RFE
from sklearn.preprocessing import normalize

# Package dedicated to the manipulation of json files.
from json import loads, dump

# Package containing a diversified set of functions for statistical processing, which also provides support for array operations.
from numpy import max, array

# biosignalsnotebooks own package that supports some functionalities used on the Jupyter Notebooks.
import biosignalsnotebooks as bsnb

1 - Loading of the dictionary created on Volume 2 of "Classification Game" Jupyter Notebook

This dictionary contains all the features extracted from our training examples.

In [2]:
# Specification of filename and relative path.
relative_path = "/signal_samples/classification_game/features"
filename = "classification_game_features.json"

# Load of data inside file storing it inside a Python dictionary.
with open(relative_path + "/" + filename) as file:
    features_dict = loads(file.read())
In [3]:
from sty import fg, rs
print(fg(98,195,238) + "\033[1mDict Keys\033[0m" + fg.rs + " define the class number")
print(fg(232,77,14) + "\033[1mDict Sub-Keys\033[0m" + fg.rs + " define the trial number\n")
print(features_dict)
Dict Keys define the class number
Dict Sub-Keys define the trial number

{'0': {'1': [0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], '2': [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], '3': [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], '4': [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], '5': [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06]}, '1': {'1': [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], '2': [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], '3': [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], '4': [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], '5': [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, -7.484701743981861e-07]}, '2': {'1': [0.03701667312138605, 0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], '2': [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], '3': [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], '4': [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 
0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], '5': [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052]}, '3': {'1': [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], '2': [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], '3': [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], '4': [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], '5': [0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]}}

2 - Restructuring of "features_dict" to a format compatible with the scikit-learn package

features_dict must be converted into a list containing a number of sub-lists equal to the number of training examples (in our case 20). In turn, each sub-list is formed by a number of entries equal to the number of extracted features (13 for our original formulation of the problem).

In [4]:
# Initialisation of a list containing our training data and another list containing the labels of each training example.
features_list = []
class_training_examples = []

# Access each feature list inside dictionary.
list_classes = features_dict.keys()
for class_i in list_classes:
    list_trials = features_dict[class_i].keys()
    for trial in list_trials:
        # Storage of the class label.
        class_training_examples += [int(class_i)]
        features_list += [features_dict[class_i][trial]]
In [5]:
print(fg(232,77,14) + "\033[1m[Number of list entries;Number of sub-list entries]:\033[0m" + fg.rs + " [" + str(len(features_list)) + "; " + str(len(features_list[0])) + "]" + u'\u2713')
print(fg(253,196,0) + "\033[1mClass of each training example:\033[0m" + fg.rs)
print(class_training_examples)
print(fg(98,195,238) + "\033[1mFeatures List:\033[0m" + fg.rs)
print(features_list)
[Number of list entries;Number of sub-list entries]: [20; 13]✓
Class of each training example:
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
Features List:
[[0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06], [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, -7.484701743981861e-07], [0.03701667312138605, 0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 
0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052], [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], [0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]]

2.1 - Normalisation of the feature values, ensuring that the training stage is not affected by scale factors

In [6]:
features_list = normalize(features_list, axis=0, norm="max") # axis=0 specifies that each feature is normalised independently from the others 
                                                             # and norm="max" defines that the normalization reference value will be the feature maximum value.
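For reference, here is a minimal sketch on a toy matrix (illustrative values only) showing that this call is equivalent to dividing each column, i.e. each feature, by its maximum absolute value:

# Equivalence check on a toy matrix.
from numpy import abs, allclose, array
from sklearn.preprocessing import normalize

toy = array([[1.0, -2.0], [3.0, 4.0]])
manual = toy / abs(toy).max(axis=0)        # each column divided by its maximum absolute value
assert allclose(normalize(toy, axis=0, norm="max"), manual)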
In [7]:
print(features_list)
[[ 0.03192537  0.00833681  0.91010092  0.025743    0.05483938  0.01536202
   0.42380594  0.04452308  0.6036909   0.01188812  0.25644942  0.
   0.04658795]
 [ 0.03044428  0.00859733  1.          0.02490244  0.04445786  0.01778342
   0.70770175  0.04455256  0.61215026  0.0133288   0.26205024  0.
   0.05911565]
 [ 0.07219342  0.0185494   0.43362888  0.05254475  0.02733095  0.01095586
   0.51767025  0.03056818  1.          0.02107818  0.42668024  0.
   0.0141937 ]
 [ 0.04933039  0.01375573  0.63089659  0.03663197  0.06907079  0.02095903
   0.6179336   0.06277207  0.47693534  0.01473523  0.24066531  0.
   0.01724182]
 [ 0.05962417  0.01354731  0.54710921  0.04173342  0.15928363  0.03584471
   0.39233048  0.11950119  0.29137013  0.02870618  0.13891718  0.
   0.07601586]
 [ 0.26192261  0.15172989  0.48178249  0.24192313  0.09170706  0.05926485
   0.73179719  0.09052537  0.67277944  0.3642578   0.67684997  0.10160185
  -0.22257799]
 [ 0.23655239  0.12526053  0.50940193  0.22984246  0.07670354  0.05485869
   0.80167495  0.07390201  0.68291497  0.56241407  1.          0.24165456
   1.        ]
 [ 0.25254593  0.1219258   0.35796982  0.1976353   0.0729089   0.04588758
   0.68971389  0.07065968  0.72280878  0.22462463  0.54972845  0.
   0.05940844]
 [ 0.24372719  0.09316382  0.38239231  0.19542512  0.0668453   0.03382026
   0.70409761  0.06300486  0.71319468  0.19704118  0.49660557  0.
   0.40933371]
 [ 0.30012036  0.12838683  0.50205182  0.28085978  0.09075016  0.05239759
   0.74863388  0.08311051  0.78562899  0.47077851  0.96096402  0.11462995
  -0.02479653]
 [ 0.55530065  0.54835348  0.39637512  0.55218941  0.10040796  0.08435218
   0.7341269   0.10481372 -0.05684571  0.9699288   0.47522064  0.54372277
  -5.87549879]
 [ 0.8409626   0.8295123   0.43583632  0.88461157  0.23198581  0.17624643
   0.82414307  0.25100821 -0.16648373  1.          0.32578072  0.68777971
  -6.19262501]
 [ 0.65053945  0.44956232  0.6014818   0.61567174  0.20032772  0.09074309
   0.77751546  0.18809984 -0.0846346   0.83114056  0.73896809  1.
  -4.69945461]
 [ 0.97318338  1.          0.57711384  1.          0.14216353  0.11170213
   0.89383353  0.13322858 -0.13497797  0.94726096  0.49083503  0.78537733
  -5.82305078]
 [ 0.69089897  0.6446436   0.61802828  0.67568986  0.23916798  0.10955859
   0.88803911  0.24182392 -0.0387809   0.64340805  0.45960625  0.72775533
  -3.4718829 ]
 [ 1.          0.41621509  0.33983669  0.73526983  1.          0.81232137
   0.91718911  1.         -0.2913793   0.71768581  0.22063815  0.59970536
  -3.38459327]
 [ 0.42604657  0.41079617  0.34194868  0.35715988  0.72017567  0.33562242
   1.          0.69707732 -0.26283649  0.71654127  0.23073659  0.51583478
  -3.8541224 ]
 [ 0.40311928  0.29324719  0.55382444  0.40686133  0.9615354   0.65433471
   0.98529774  0.97262355 -0.30842771  0.80794528  0.29531568  0.47898359
  -3.67951963]
 [ 0.48376465  0.41954981  0.42595341  0.46217607  0.86034527  1.
   0.95578738  0.88011355 -0.27877216  0.72429247  0.6293279   0.63046474
  -3.84060125]
 [ 0.36615488  0.22321801  0.36058621  0.29106651  0.66601516  0.38853604
   0.9850413   0.6368921  -0.34667302  0.75194978  0.29540054  0.34770293
  -3.13174402]]

3 - Selection of a classification algorithm to wrap in our Feature Selection methodology

A Support Vector Machine shares some principles with k-Nearest Neighbour classifiers (which we want to use in the Jupyter Notebook of volume 4), namely the Cartesian logic: each example corresponds to a point with a number of coordinates equal to the number of features analysed (13 for our original problem), that is, each feature defines a dimension of the space.
Because of this "contact point", our "wrapped" classifier will be a Support Vector Machine.
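As a side note, this geometric reading is easy to inspect on a toy example (illustrative data only): a linear SVC fitted on two-dimensional points exposes one hyperplane coefficient per feature, i.e., per dimension of the space.

# Illustrative sketch: the fitted linear SVC exposes one hyperplane coefficient per feature/dimension.
from sklearn.svm import SVC

toy_examples = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]   # 2 features per toy example
toy_labels = [0, 0, 1, 1]
toy_svc = SVC(kernel="linear").fit(toy_examples, toy_labels)
print(toy_svc.coef_, toy_svc.intercept_)                          # hyperplane normal vector and offset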

In [8]:
# Creation of a "Support Vector Classifier" supposing that  our classes are linearly separable.
svc = SVC(kernel="linear")

4 - Configuration of the Recursive Feature Elimination procedure given as an input our previously created "svc" object

Some inputs need to be given:

  • estimator - our previously created "Support Vector Classifier" object
  • step - number of features eliminated on each iteration of the recursive algorithm
  • cv - cross-validation method used to estimate the quality of the "virtual" classifiers and to divide the original training set into $K$ subsets of training/test examples. The choice results in a Stratified K-Fold strategy ("The folds are made by preserving the percentage of samples for each class")
  • scoring - criterion for qualifying a classifier. For us the best classifier will be the one that achieves the highest accuracy when classifying the test examples (a small illustration of the Stratified K-Fold division follows this list)
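As a small illustration (assuming the features_list and class_training_examples variables defined above), the Stratified K-Fold division keeps the four classes equally represented in every test fold:

# Illustrative check of the Stratified K-Fold division (class proportions preserved in each test fold).
from collections import Counter
from sklearn.model_selection import StratifiedKFold

for fold_nbr, (train_idx, test_idx) in enumerate(StratifiedKFold(5).split(features_list, class_training_examples)):
    print("Fold", fold_nbr + 1, "| test classes:", Counter(class_training_examples[i] for i in test_idx))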
In [9]:
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(5), scoring='accuracy')

5 - Execution of the Recursive Feature Elimination procedure

In [10]:
# Fit data to the model.
selector = rfecv.fit(features_list, class_training_examples)
In [11]:
print(selector)
RFECV(cv=StratifiedKFold(n_splits=5, random_state=None, shuffle=False),
   estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False),
   min_features_to_select=1, n_jobs=None, scoring='accuracy', step=1,
   verbose=0)

6 - Get the optimal number of features

It will be the smallest number of features that achieves the highest cross-validation score.

6.1 - Get the list of average scores of the virtual classifiers (1 per Recursive Feature Elimination iteration)

The first element of the list refers to the average score of the trained classifiers when only one feature is used, while the last one corresponds to the case where all features are taken into consideration.

In [12]:
# Get the list of average scores of the "virtual" classifiers.
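# Note: in recent scikit-learn releases the "grid_scores_" attribute was removed;
# there, the equivalent values are exposed through rfecv.cv_results_["mean_test_score"].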
avg_scores = rfecv.grid_scores_
In [13]:
print(avg_scores)
[0.4  0.7  0.75 0.8  0.85 0.85 0.9  0.95 0.95 0.95 0.95 0.95 0.95]

6.2 - Identification of the maximum score

In [14]:
max_score = max(avg_scores)
In [15]:
print(fg(98,195,238) + "\033[1mMaximum Average Score:\033[0m " + fg.rs + str(max_score))
Maximum Average Score: 0.95

6.3 - Identification of the smallest feature set that achieves the maximum score

In [16]:
for nbr_features in range(0, len(avg_scores)):
    if avg_scores[nbr_features] == max_score:
        optimal_nbr_features = nbr_features + 1
        break
In [17]:
print(fg(98,195,238) + "\033[1mOptimal Number of Features:\033[0m " + fg.rs + str(optimal_nbr_features))
Optimal Number of Features: 8
In [18]:
bsnb.plot([range(1, len(rfecv.grid_scores_) + 1)], [avg_scores], 
          y_axis_label="Cross validation score (nb of correct classifications)", x_axis_label="Number of features selected")

7 - Identification of the set of relevant features, taking into consideration the previously determined optimal number

The Recursive Feature Elimination procedure should be repeated with the scikit-learn "RFE" function, specifying the desired number of target features.

In [19]:
rfe = RFE(estimator=svc, step=1, n_features_to_select=optimal_nbr_features)

# Fit data to the model.
final_selector = rfe.fit(features_list, class_training_examples)

# Acceptance/Rejection label attributed to each feature.
acception_labels = final_selector.support_
In [20]:
print(fg(98,195,238) + "\033[1mRelevant Features (True):\033[0m " + fg.rs)
print(acception_labels)
Relevant Features (True): 
[ True False  True  True  True False False  True False  True  True False
  True]

Each training array has the following structure/content:
[$\sigma_{emg\,flexor}$, $max_{emg\,flexor}$, $zcr_{emg\,flexor}$, $\sigma_{emg\,flexor}^{abs}$, $\sigma_{emg\,adductor}$, $max_{emg\,adductor}$, $zcr_{emg\,adductor}$, $\sigma_{emg\,adductor}^{abs}$, $\mu_{acc\,z}$, $\sigma_{acc\,z}$, $max_{acc\,z}$, $zcr_{acc\,z}$, $m_{acc\,z}$]

So, the relevant features are:

  • $\sigma_{emg\,flexor}$
  • $zcr_{emg\,flexor}$
  • $\sigma_{emg\,flexor}^{abs}$
  • $\sigma_{emg\,adductor}$
  • $\sigma_{emg\,adductor}^{abs}$
  • $\sigma_{acc\,z}$
  • $max_{acc\,z}$
  • $m_{acc\,z}$
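This mapping can also be done programmatically. Assuming a hypothetical feature_names list ordered exactly like the training arrays above, a minimal sketch would be:

# Hypothetical feature names, ordered like the entries of each training array.
feature_names = ["std_emg_flexor", "max_emg_flexor", "zcr_emg_flexor", "std_abs_emg_flexor",
                 "std_emg_adductor", "max_emg_adductor", "zcr_emg_adductor", "std_abs_emg_adductor",
                 "mean_acc_z", "std_acc_z", "max_acc_z", "zcr_acc_z", "slope_acc_z"]

# Keep only the names flagged as relevant by the Recursive Feature Elimination procedure.
relevant_features = [name for name, accepted in zip(feature_names, acception_labels) if accepted]
print(relevant_features)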

8 - Removal of meaningless features from our "features_list" list

In [21]:
# Access each training example and exclude meaningless entries.
final_features_list = []
for example_nbr in range(0, len(features_list)):
    final_features_list += [list(array(features_list[example_nbr])[array(acception_labels)])]
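Since features_list is already a NumPy array after the normalisation step, the same filtering could also be written with boolean column indexing (an equivalent alternative, not the notebook's original formulation):

# Equivalent alternative using boolean column indexing.
final_features_list = array(features_list)[:, array(acception_labels)].tolist()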

9 - Storage of the final list of features (after Recursive Feature Elimination ) inside a .json file

In [22]:
filename = "classification_game_features_final.json"

# Generation of .json file in our previously mentioned "relative_path".
# [Generation of new file]
with open(relative_path + "/" + filename, 'w') as file:
    dump({"features_list_final": final_features_list, "class_labels": class_training_examples}, file)
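As a quick verification (and a preview of how the next volume can consume this file), the stored content can be read back with the same pattern used in section 1:

# Reload the stored file to confirm its content.
with open(relative_path + "/" + filename) as file:
    stored_features = loads(file.read())
print(stored_features.keys())   # expected: dict_keys(['features_list_final', 'class_labels'])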

We have reached the end of the third volume of the "Classification Game". After Feature Selection, all training examples are ready to be delivered to our classification algorithm in order to take part in the training process.

If you are feeling your interest increasing, please jump to the next volume!

We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!

In [23]:
from biosignalsnotebooks.__notebook_support__ import css_style_apply
css_style_apply()
.................... CSS Style Applied to Jupyter Notebook .........................
Out[23]: