|
Stone, Paper or Scissor Game - Train and Classify [Volume 3] |
Tags | train_and_classify, machine-learning, features, selection |
Currently we are in possession of a file containing the feature values for all training examples, as demonstrated in a previously created Jupyter Notebook.
However, there is a high risk that some of the extracted features are not useful for our classification system. Remember, a good feature is a parameter with the ability to separate the different classes of our classification system, i.e., a parameter with a characteristic range of values for each available class. In order to ensure that the training process of our classifier happens in the most efficient way, redundant or invariant features should be removed. The logic implicit in the previous paragraphs is called Feature Selection, which is the focus of this Jupyter Notebook!
Starting Point (Setup)
List of Available Classes:
Paper | Stone | Scissor |
Acquired Data:
Protocol/Feature Extraction
Extracted Features
Formal definition of parameters
☝ | Maximum: the maximum sample value of a set of elements is equal to the last element of the sorted set
☉ | Mean: $\mu = \frac{1}{N}\sum_{i=1}^N (sample_i)$
☆ | Standard Deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N(sample_i - \mu_{signal})^2}$
☌ | Zero-Crossing Rate: $zcr = \frac{1}{N - 1}\sum_{i=1}^{N-1}bin(i)$
☇ | Standard Deviation of the Absolute Signal: $\sigma_{abs} = \sqrt{\frac{1}{N}\sum_{i=1}^N(|sample_i| - \mu_{signal_{abs}})^2}$
☍ | Slope of the Regression Curve: $m = \frac{\Delta signal}{\Delta t}$
... where $N$ is the number of acquired samples (that form the signal), $sample_i$ is the value of sample number $i$, $signal_{abs}$ is the absolute signal, $\Delta signal$ is the difference between the y coordinates of two points of the regression curve and $\Delta t$ is the difference between the x (time) coordinates of the same two points of the regression curve.
... and
$bin(i)$ a binary function defined as:
$bin(i) = \begin{cases} 1, & \mbox{if } signal_i \times signal_{i-1} \leq 0 \\ 0, & \mbox{if } signal_i \times signal_{i-1}>0 \end{cases}$
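To make these definitions more concrete, the following short sketch (not part of the original processing chain; the variable names "signal" and "time" are illustrative assumptions) shows how the parameters above could be computed for a generic 1D signal:
# Illustrative sketch of the feature formulas defined above.
from numpy import array, mean, std, abs, polyfit
signal = array([0.1, 0.4, -0.2, -0.5, 0.3, 0.6, -0.1])  # toy signal
time = array([0, 1, 2, 3, 4, 5, 6])                      # toy time axis
maximum = signal.max()               # maximum sample value
mu = mean(signal)                    # mean
sigma = std(signal)                  # standard deviation
sigma_abs = std(abs(signal))         # standard deviation of the absolute signal
# zero-crossing rate: fraction of consecutive sample pairs with a sign change.
zcr = ((signal[1:] * signal[:-1]) <= 0).sum() / (len(signal) - 1)
# slope of the linear regression curve (delta signal / delta t).
m = polyfit(time, signal, 1)[0]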
Feature Selection
Intro
As described before, Feature Selection is intended to remove redundant or meaningless parameters, which would increase the complexity of the classifier without necessarily translating into an improved performance. Without this step, the risk of overfitting to the training examples increases, making the classifier less able to categorise a new testing example.
There are different approaches to feature selection, such as filter methods and wrapper methods.
In the first approach (filter methods), a ranking is attributed to the features, using, for example, the Pearson correlation coefficient to evaluate the impact that the feature under analysis has on the target class of the training example, or the Mutual Information parameter, which quantifies how much information two variables share.
The least relevant features are then excluded and the classifier is trained afterwards (for a deeper explanation, please visit the article of Girish Chandrashekar and Ferat Sahin at ScienceDirect).
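As a purely illustrative example of a filter approach (not the method followed later in this notebook), the features of a toy dataset could be ranked by their Mutual Information with the class label and the worst-ranked ones discarded before training:
# Illustrative filter-method ranking on synthetic data.
from numpy import argsort
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
# Toy data standing in for our feature matrix (X) and class labels (y).
X, y = make_classification(n_samples=20, n_features=13, n_informative=5, random_state=0)
# Mutual Information between each feature and the class label.
mi_scores = mutual_info_classif(X, y, random_state=0)
# Feature indices ordered from most to least informative; the tail of this
# ranking would be the first candidate for removal.
ranking = argsort(mi_scores)[::-1]
print(ranking)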
The second methodology ( wrapper methods ) is characterised by the fact that the selection phase includes a classification algorithm, and features will be excluded or selected according to the quality of the trained classifier.
There is also a third major methodology applicable to Feature Selection: the so-called embedded methods. Essentially, these methods are a combination of filter and wrapper approaches, being characterised by the simultaneous execution of the Feature Selection and Training stages.
One of the most intuitive Feature Selection methods is Recursive Feature Elimination , which will be used in the current Jupyter Notebook .
Essentially, the steps of this method consist of: (1) training the classifier with the current set of features; (2) ranking the features according to their weight/importance in the trained model; (3) removing the least relevant feature; (4) repeating the previous steps until the desired number of features is reached. A minimal sketch of this loop is presented below.
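The following sketch reproduces this loop manually on a toy dataset, using the weights of a linear Support Vector Classifier as the ranking criterion (the notebook itself relies on scikit-learn's RFECV/RFE implementation instead, and the target number of features below is an arbitrary assumption):
# Minimal manual sketch of the Recursive Feature Elimination loop.
from numpy import abs, argmin
from sklearn.datasets import make_classification
from sklearn.svm import SVC
# Toy data standing in for our 20 training examples with 13 features each.
X, y = make_classification(n_samples=20, n_features=13, n_informative=5, random_state=0)
remaining = list(range(X.shape[1]))  # indices of the features still in play
target_nbr_features = 5              # assumed target size of the feature set
while len(remaining) > target_nbr_features:
    # 1) Train the classifier with the current feature set.
    clf = SVC(kernel="linear").fit(X[:, remaining], y)
    # 2) Rank the features by the magnitude of their weights.
    importance = abs(clf.coef_).sum(axis=0)
    # 3) Remove the least relevant feature and repeat.
    remaining.pop(argmin(importance))
print(remaining)  # indices of the surviving features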
0 - Import of the needed packages for a correct execution of the current Jupyter Notebook
# Python package that contains functions specialized on "Machine Learning" tasks.
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV, RFE
from sklearn.preprocessing import normalize
# Package dedicated to the manipulation of json files.
from json import loads, dump
# Package containing a diversified set of functions for statistical processing and also providing support for array operations.
from numpy import max, array
# biosignalsnotebooks own package that supports some functionalities used on the Jupyter Notebooks.
import biosignalsnotebooks as bsnb
1 - Loading of the dictionary created on Volume 2 of "Classification Game" Jupyter Notebook
This dictionary contains all the features extracted from our training examples.
# Specification of filename and relative path.
relative_path = "/signal_samples/classification_game/features"
filename = "classification_game_features.json"
# Load of data inside file storing it inside a Python dictionary.
with open(relative_path + "/" + filename) as file:
features_dict = loads(file.read())
2 - Restructuring of "features_dict" to a compatible format of scikit-learn package
features_dict must be converted to a list, containing a number of sub-lists equal to the number of training examples (in our case 20). In its turn, each sub-list is formed by a number of entries equal to the number of extracted features (13 for our original formulation of the problem).
# Initialisation of a list containing our training data and another list containing the labels of each training example.
features_list = []
class_training_examples = []
# Access each feature list inside dictionary.
list_classes = features_dict.keys()
for class_i in list_classes:
    list_trials = features_dict[class_i].keys()
    for trial in list_trials:
        # Storage of the class label.
        class_training_examples += [int(class_i)]
        features_list += [features_dict[class_i][trial]]
2.1 - Normalisation of the features values, ensuring that the training stage is not affected by scale factors
features_list = normalize(features_list, axis=0, norm="max") # axis=0 specifies that each feature is normalised independently from the others
# and norm="max" defines that the normalization reference value will be the feature maximum value.
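As a quick sanity check (illustrative only, using a hypothetical toy matrix named demo), the call above is equivalent to dividing each feature column by its maximum absolute value across the training examples:
# Sanity check of the "max" normalisation behaviour.
from numpy import array, allclose
from sklearn.preprocessing import normalize
demo = array([[1.0, -2.0],
              [3.0, 4.0]])  # 2 toy training examples with 2 features each
manual = demo / abs(demo).max(axis=0)  # divide each column by its maximum absolute value
print(allclose(normalize(demo, axis=0, norm="max"), manual))  # True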
3 - Selection of a classification algorithm to wrap in our Feature Selection methodology
A Support Vector Machine shares some principles with k-Nearest Neighbour Classifiers (which we want to use on Jupyter Notebook [volume 4]), namely the Cartesian logic, given that each example corresponds to a point with a number $N$ of coordinates equivalent to the number of features analysed (13 for our original problem), that is, each feature defines a dimension of the space.
# Creation of a "Support Vector Classifier" supposing that our classes are linearly separable.
svc = SVC(kernel="linear")
4 - Configuration of the Recursive Feature Elimination procedure given as an input our previously created "svc" object
Some inputs need to be given: the estimator to be wrapped (our svc object), the number of features removed at each iteration (step), the cross-validation strategy (cv) and the metric used to evaluate each candidate feature set (scoring).
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(5), scoring='accuracy')
5 - Execution of the Recursive Feature Elimination procedure
# Fit data to the model.
selector = rfecv.fit(features_list, class_training_examples)
6 - Get the optimal number of features
It will be the smallest number of features that achieves the highest cross-validation score.
6.1 - Get the list of average score of each virtual classifier (1 per Recursive Feature Elimination iteration)
The first element of the list refers to the average score of the trained classifiers when only one feature is kept, while the last one corresponds to the case where all features are taken into consideration.
# Get list of average scores of the virtual classifiers (one score per feature-set size).
avg_scores = rfecv.grid_scores_
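Note that in recent scikit-learn releases the grid_scores_ attribute was deprecated and later removed; if you are running a newer version, the same averages can be obtained through cv_results_:
# Version-tolerant alternative to the assignment above.
if hasattr(rfecv, "grid_scores_"):
    avg_scores = rfecv.grid_scores_
else:
    avg_scores = rfecv.cv_results_["mean_test_score"]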
6.2 - Identification of the maximum score
max_score = max(avg_scores)
6.3 - Identification of the smallest feature set that achieve the maximum score
for nbr_features in range(0, len(avg_scores)):
    if avg_scores[nbr_features] == max_score:
        optimal_nbr_features = nbr_features + 1
        break
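As an optional cross-check, the fitted RFECV object already exposes the optimal number of features and the corresponding selection mask, which should normally coincide with the value computed above:
# Attributes stored by RFECV after fitting.
print(rfecv.n_features_)  # optimal number of features found by RFECV
print(rfecv.support_)     # boolean mask flagging the selected features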
7 - Identification of the set of relevant features, taking into consideration the previously determined optimal number
The Recursive Feature Elimination procedure should be repeated with the "RFE" scikit-learn function, specifying the desired number of target features.
rfe = RFE(estimator=svc, step=1, n_features_to_select=optimal_nbr_features)
# Fit data to the model.
final_selector = rfe.fit(features_list, class_training_examples)
# Acceptance/Rejection label attributed to each feature.
acception_labels = final_selector.support_
Each training array has the following structure/content:
[$\sigma_{emg\,flexor}$, $max_{emg\,flexor}$, $zcr_{emg\,flexor}$, $\sigma_{emg\,flexor}^{abs}$, $\sigma_{emg\,adductor}$, $max_{emg\,adductor}$, $zcr_{emg\,adductor}$, $\sigma_{emg\,adductor}^{abs}$, $\mu_{acc\,z}$, $\sigma_{acc\,z}$, $max_{acc\,z}$, $zcr_{acc\,z}$, $m_{acc\,z}$]
So, the relevant features are:
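One illustrative way of listing the surviving features by name is to pair the acceptance mask with the structure shown above (the name list below is merely a transcription of that structure, not an identifier used elsewhere in the notebook):
# Hypothetical, human-readable names following the order of the structure above.
feature_names = ["std_emg_flexor", "max_emg_flexor", "zcr_emg_flexor", "std_abs_emg_flexor",
                 "std_emg_adductor", "max_emg_adductor", "zcr_emg_adductor", "std_abs_emg_adductor",
                 "mean_acc_z", "std_acc_z", "max_acc_z", "zcr_acc_z", "slope_acc_z"]
relevant_features = [name for name, keep in zip(feature_names, acception_labels) if keep]
print(relevant_features)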
8 - Removal of meaningless features from our "features_list" list
# Access each training example and exclude meaningless entries.
final_features_list = []
for example_nbr in range(0, len(features_list)):
    final_features_list += [list(array(features_list[example_nbr])[array(acception_labels)])]
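Alternatively (illustrative note), the fitted selector can produce the same reduced feature matrix directly through its transform method:
# The fitted RFE selector keeps only the accepted columns of the feature matrix.
final_features_array = final_selector.transform(features_list)
print(final_features_array.shape)  # (number of training examples, optimal_nbr_features)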
9 - Storage of the final list of features (after Recursive Feature Elimination ) inside a .json file
filename = "classification_game_features_final.json"
# Generation of .json file in our previously mentioned "relative_path".
# [Generation of new file]
with open(relative_path + "/" + filename, 'w') as file:
    dump({"features_list_final": final_features_list, "class_labels": class_training_examples}, file)
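As an optional verification step, the stored file can be reloaded to confirm the dimensions of the final feature matrix:
# Reload the stored file and check the number of examples and of selected features.
with open(relative_path + "/" + filename) as file:
    stored_dict = loads(file.read())
print(len(stored_dict["features_list_final"]), len(stored_dict["features_list_final"][0]))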
We have reached the end of the "Classification Game" third volume. After Feature Selection, all training examples are ready to be delivered to our classification algorithm in order to participate in the training process.
If you feel your interest increasing, please jump to the next volume!
We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!