Signal Classifier - Distinguish between EMG and ECG
Tags: train_and_classify, classification, biosignals, emg, ecg
Machine learning is a branch of artificial intelligence that emerged with the increase in computational power that has accompanied the evolution of technology. It allows a computer to learn the outcome of numerous problems by exploiting the internal structure of the datasets given as input.
There are three main settings of machine learning: supervised, unsupervised and reinforcement learning. In this notebook we address a supervised classification problem, in which labelled time windows of biosignals are used to train a model that distinguishes ECG, EMG and other signal types.
1 - Import the required packages
In order to facilitate our work, we will use the biosignalsnotebooks package, which includes multiple sample signals among other features; scikit-learn, which provides high-level implementations of a large number of methods and models used in machine learning applications; and numpy, which implements mathematical functions in a convenient way.
import biosignalsnotebooks as bsnb
from numpy import mean, std, zeros, diff, sign, vstack, concatenate, ravel
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import classification_report, accuracy_score
2 - Load sample signals
The biosignalsnotebooks Python package provides a set of sample signals that we will use to train and test our model. As we want to distinguish between ECG, EMG and other types of signals, we will use multiple files.
First, we will load the ECG signal, which consists of a 20-second acquisition at a 1000 Hz sampling rate and a resolution of 16 bits.
# Relative path to the signal samples folder
path = "/signal_samples"
# Load the ECG signal
ECG = bsnb.load(path + "/ecg_20_sec_1000_Hz.h5")['CH1']
# Normalise ECG signal
ECG = ECG - mean(ECG); ECG /= std(ECG)
For the EMG signal, we will use a 28-second acquisition at a 1000 Hz sampling rate and a resolution of 16 bits.
# Load the EMG signal
EMG = bsnb.load(path + "/emg_bursts.h5")['CH3']
# Normalise EMG signal
EMG = EMG - mean(EMG); EMG /= std(EMG)
For the other signals, we could use any type that does not correspond to ECG or EMG. In this case, we will use two different types.
The first is an EEG signal acquired over 28 seconds at a 1000 Hz sampling rate and a resolution of 16 bits.
# Load other signal
EEG = bsnb.load(path + "/signal_sample_single_hub_EEG_2018_7_4.h5")['CH1']
# Normalise other signal
EEG = EEG - mean(EEG); EEG /= std(EEG)
The second is an acoustic signal acquired over 20 seconds at a 1000 Hz sampling rate and a resolution of 16 bits.
# Load another signal
ACO = bsnb.load(path + "/sync_cable_acoustic.h5")
# Get the mac address of one of the devices used in this acquisition
mac_address = list(ACO.keys())[0]
# Load the signal corresponding to the first channel of the device
ACO = ACO[mac_address]['CH1']
# Normalise another signal
ACO = ACO - mean(ACO); ACO /= std(ACO)
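Before moving on, it can help to visually confirm that each loaded signal looks as expected. The snippet below is a minimal sketch that assumes matplotlib is available in your environment (it is not imported in this notebook) and simply plots the first 5 seconds of each normalised signal.
import matplotlib.pyplot as plt
# Plot the first 5 seconds (5000 samples at 1000 Hz) of each normalised signal.
fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
for ax, (name, signal) in zip(axes, [("ECG", ECG), ("EMG", EMG), ("EEG", EEG), ("Acoustic", ACO)]):
    ax.plot(signal[:5000])
    ax.set_ylabel(name)
axes[-1].set_xlabel("Sample number")
plt.tight_layout()
plt.show()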
3 - Windowing
Machine learning pipelines usually require the extraction of features that represent each class well enough for the classifier to be able to distinguish between them. To do so, we first divide each signal into overlapping time windows from which the features will be extracted.
sampling_rate = 1000 # Hz
time_window = 1 # seconds
overlap = 0.9
ECG_windows = bsnb.windowing(ECG, sampling_rate, time_window, overlap)
EMG_windows = bsnb.windowing(EMG, sampling_rate, time_window, overlap)
EEG_windows = bsnb.windowing(EEG, sampling_rate, time_window, overlap)
ACO_windows = bsnb.windowing(ACO, sampling_rate, time_window, overlap)
# Get the number of time windows of the shortest signal in order to truncate all signals to the same number of time windows.
index = min([ECG_windows.shape[0], EMG_windows.shape[0], EEG_windows.shape[0], ACO_windows.shape[0]])
# Truncate all signals to the same number of time windows.
ECG_windows = ECG_windows[:index]
EMG_windows = EMG_windows[:index]
EEG_windows = EEG_windows[:index]
ACO_windows = ACO_windows[:index]
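For reference, the windowing operation can be reproduced with plain numpy. The helper below (simple_windowing is a hypothetical name) is a simplified sketch under the assumption that bsnb.windowing returns one row per window and that consecutive windows overlap by the given fraction; it is not the actual biosignalsnotebooks implementation.
from numpy import array
def simple_windowing(signal, sampling_rate, time_window, overlap):
    # Number of samples per window and hop between the starts of consecutive windows.
    window_size = int(time_window * sampling_rate)
    hop = int(round(window_size * (1 - overlap)))
    # Stack every complete window into a 2D array (one window per row).
    starts = range(0, len(signal) - window_size + 1, hop)
    return array([signal[start:start + window_size] for start in starts])
# Example: a 1 s window with 90 % overlap advances 100 samples at a time.
windows = simple_windowing(ECG, 1000, 1, 0.9)
print(windows.shape)  # (number_of_windows, 1000)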
4 - Feature Extraction
In this step we will define the features to extract from each time window; these features will represent each window in the input of the classifier. In this case we will use a few statistical features, but you can also extract features from the time, spectral or other domains.
# Define the functions applied to each time window in order to extract features
func = [mean, std, kurtosis, skew, bsnb.zero_crossing_rate]
ECG_features = bsnb.features_extraction(ECG_windows, func)
EMG_features = bsnb.features_extraction(EMG_windows, func)
EEG_features = bsnb.features_extraction(EEG_windows, func)
ACO_features = bsnb.features_extraction(ACO_windows, func)
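If you prefer not to rely on bsnb.features_extraction, the same kind of feature matrix can be built directly: each function in func is applied to every window and the results are stacked column-wise. The helper below (simple_features_extraction is a hypothetical name) is a sketch based on the assumption that the feature matrix has one row per window and one column per feature; it does not describe the internals of features_extraction.
from numpy import column_stack
def simple_features_extraction(windows, functions):
    # Apply each feature function to every window and stack the results as columns.
    return column_stack([[f(window) for window in windows] for f in functions])
# Example usage: the resulting shape should be (number_of_windows, number_of_features).
print(simple_features_extraction(ECG_windows, func).shape)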
5 - Generate dataset
Now that we have the data required to train our classifier, it is time to build the proper dataset. To do this, we only need to concatenate our feature arrays and assign each of them to a class.
# Get the number of samples of each type of signal - in this case, our classes.
length_ECG = ECG_features.shape[0]
length_EMG = EMG_features.shape[0]
length_EEG = EEG_features.shape[0]
length_ACO = ACO_features.shape[0]
# Build the samples array containing all samples from each signal.
samples = vstack([ECG_features, EMG_features, EEG_features, ACO_features])
# Build the classes array to assign each sample to its class.
classes = concatenate([length_ECG*['ECG'], length_EMG*['EMG'], length_EEG*['Other'], length_ACO*['Other']])
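A quick sanity check on the resulting arrays can catch mistakes before training: samples should have one row per time window and one column per feature, and classes should have one label per row of samples. The sketch below just prints these shapes and the number of windows per class.
print(samples.shape)   # (total_number_of_windows, number_of_features)
print(classes.shape)   # (total_number_of_windows,)
# Count how many windows belong to each class.
from collections import Counter
print(Counter(classes))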
6 - Build the classifier
In this step we will build our classifier using the scikit-learn Python package. We will use a Random Forest classifier.
Before training, we need to split the dataset into training and testing sets. We use the ShuffleSplit method, which generates these sets by randomly shuffling the samples, so that the windows in each set are not necessarily consecutive and none of the signal types is left out of the training or testing set.
model = ShuffleSplit(n_splits=10, train_size=.9, test_size=.1)
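To see what ShuffleSplit produces, you can inspect the index arrays of a single split; each iteration yields a random 90 % / 10 % partition of the sample indices. This check is illustrative only and is not part of the training pipeline.
# Inspect the first random split generated by the model.
train_index, test_index = next(model.split(samples))
print(len(train_index), len(test_index))  # roughly 90 % / 10 % of the samples
print(test_index[:10])                    # the test indices are not consecutive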
The next step is cross-validation, where we initialise and evaluate our random forest classifier using the scikit-learn Python package.
acc = []
for train_index, test_index in model.split(samples):
    # For each iteration, we divide our dataset into a training set and a testing set.
    samples_train, samples_test = samples[train_index], samples[test_index]
    labels_train, labels_test = classes[train_index], classes[test_index]
    # Build the random forest classifier.
    random_forest = RandomForestClassifier(n_estimators=1000, criterion='gini')
    # Train the classifier on the training set.
    random_forest = random_forest.fit(samples_train, ravel(labels_train))
    # Test the classifier on the testing set.
    results = random_forest.predict(samples_test)
    # This step is not necessary for the classification procedure, but it is important to store the
    # accuracy values so that we can calculate their mean and standard deviation and evaluate the
    # performance of the classifier.
    acc.append(accuracy_score(labels_test, results)*100)
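After the loop, the stored accuracies can be summarised by their mean and standard deviation; classification_report, imported at the beginning but not used in the loop, can also give per-class precision and recall for the last split. The lines below are a minimal sketch of such a summary.
# Summarise the cross-validation results.
print("Mean accuracy: %.2f %% +/- %.2f %%" % (mean(acc), std(acc)))
# Per-class metrics for the last train/test split of the loop.
print(classification_report(labels_test, results))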
As can be seen, in this simple example we achieved high accuracy values, which indicate that most of the time windows were correctly classified as ECG, EMG or another type of signal. These results may not hold if the signals are not as clean and of as good quality as the samples used in this Jupyter Notebook, but the procedure remains the same. Furthermore, other classifiers may be used, their hyperparameters may be optimised and other cross-validation techniques may be applied, so there is a lot to explore.
We hope that you have enjoyed this guide.
biosignalsnotebooks
is an environment in continuous expansion, so don't stop your journey and learn more with the remaining
Notebooks!