Signal Classifier - Distinguish between EMG and ECG
Tags: train_and_classify, classification, biosignals, emg, ecg

Machine learning is a branch of artificial intelligence that has grown with the increase in computational power accompanying the evolution of technology. It allows a computer to learn how to solve numerous problems by exploiting the internal structure of the datasets given as input.

There are three main settings of machine learning:

  • Unsupervised Learning - Learning the internal structure of a given dataset without prior knowledge. It is often used to find interesting features or similarities in a dataset that may not be perceptible to the human eye. For example, in retail, it may be used to group customers that exhibit similar shopping patterns in order to send them targeted advertising.
  • Semi-Supervised Learning - Learning the internal structure of a given dataset with partial knowledge about it. It is usually used when classes can be distinguished without labelling every instance, which is expensive work. For example, it may be used in anomaly detection scenarios, where normal instances usually far outnumber anomalous ones. A concrete example is aircraft failure detection, in which we know the normal functioning of the system but lack examples of anomalies because they are too rare or too expensive to obtain (e.g. an engine malfunction). In these cases, the normal instances are labelled, but the anomalous ones are not.
  • Supervised Learning - Learning the patterns in a given dataset in which every instance is labelled. For example, there are databases of ECG signals that include a high number of arrhythmias, where every heartbeat is labelled as normal or by its type of arrhythmia.

In this Jupyter Notebook, we show how to use the scikit-learn package to easily deploy machine learning models that determine the nature of a given biosignal, in this case whether it is an ECG, an EMG or another type of signal.
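
To make the difference between the unsupervised and supervised settings more concrete, the following is a minimal sketch on synthetic two-dimensional data (the data, the labels "a" and "b" and the choice of KMeans are assumptions made only for illustration, they are not part of this notebook's pipeline). The clustering model never sees the labels, while the classifier is trained with them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): two well separated groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array(["a"] * 50 + ["b"] * 50)

# Unsupervised: the model only receives X and groups similar points together.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the model receives X together with the label of every sample.
classifier = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
predictions = classifier.predict([[0.0, 0.0], [5.0, 5.0]])

print("Cluster of first and last sample:", clusters[0], clusters[-1])
print("Predicted labels:", predictions)
```

The cluster identifiers returned by the unsupervised model are arbitrary integers, whereas the supervised model answers with the label names it was trained on.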


1 - Import the required packages

To facilitate our work, we will use the biosignalsnotebooks package, which includes multiple sample signals among other features; scikit-learn, which provides high-level implementations of many methods and models used in machine learning applications; numpy, which implements mathematical functions in an easy way; and scipy, from which we import the kurtosis and skew statistics.

In [1]:
import biosignalsnotebooks as bsnb

from numpy import mean, std, zeros, diff, sign, vstack, concatenate, ravel
from scipy.stats import kurtosis, skew

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import classification_report, accuracy_score

2 - Load sample signals

The biosignalsnotebooks Python package provides a set of sample signals that we will use to train and test our model. As we want to distinguish between ECG, EMG and other types of signals, we will use multiple files.
Note: Since we will only use one channel of each signal, we select the channel in the same line where we load the signal. Furthermore, we will use z-standardisation to normalise our signals to zero mean and unit variance.
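
As a small illustration of the z-standardisation mentioned in the note, the sketch below applies it to a short array of made-up values (the helper name z_standardise and the example values are assumptions for illustration only). Each sample has the mean subtracted and is then divided by the standard deviation.

```python
import numpy as np

def z_standardise(signal):
    """Return a copy of the signal with zero mean and unit variance."""
    signal = np.asarray(signal, dtype=float)
    return (signal - signal.mean()) / signal.std()

# Made-up example values (assumption), not taken from the sample signals.
example = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
standardised = z_standardise(example)

print("Mean after standardisation:", standardised.mean())
print("Standard deviation after standardisation:", standardised.std())
```

Standardising every signal this way removes differences in offset and amplitude between acquisitions, so the classifier has to rely on the shape of the signal rather than on its scale.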

First, we will load the ECG signal, which consists of a 20-second acquisition at a 1000 Hz sampling rate and a resolution of 16 bits.

In [2]:
# Path to the signal samples folder (adjust it to your own setup)
path = "/signal_samples"
# Load the ECG signal
ECG = bsnb.load(path + "/ecg_20_sec_1000_Hz.h5")['CH1']
# Normalise ECG signal
ECG = ECG - mean(ECG); ECG /= std(ECG)
In [3]:
ECG_time = bsnb.generate_time(ECG)
bsnb.plot(ECG_time, ECG)

For the EMG signal, we will use a 28-second acquisition at a 1000 Hz sampling rate and a resolution of 16 bits.

In [4]:
# Load the EMG signal
EMG = bsnb.load(path + "/emg_bursts.h5")['CH3']
# Normalise EMG signal
EMG = EMG - mean(EMG); EMG /= std(EMG)
In [5]:
EMG_time = bsnb.generate_time(EMG)
bsnb.plot(EMG_time, EMG)

For the other signals, we could use any type that does not correspond to ECG or EMG. In this case, we will use two different types.

The first is an EEG signal acquired over 28 seconds at a 1000 Hz sampling rate and a resolution of 16 bits.

In [6]:
# Load other signal
EEG = bsnb.load(path + "/signal_sample_single_hub_EEG_2018_7_4.h5")['CH1']
# Normalise other signal
EEG = EEG - mean(EEG); EEG /= std(EEG)
In [7]:
EEG_time = bsnb.generate_time(EEG)
bsnb.plot(EEG_time, EEG)

The second is an acoustic signal acquired over 20 seconds at a 1000 Hz sampling rate and a resolution of 16 bits.

In [8]:
# Load the acoustic signal file
ACO = bsnb.load(path + "/sync_cable_acoustic.h5")
# Get the mac address of one of the devices used in this acquisition
mac_address = list(ACO.keys())[0]
# Load the signal corresponding to the first channel of the device
ACO = ACO[mac_address]['CH1']
# Normalise the acoustic signal
ACO = ACO - mean(ACO); ACO /= std(ACO)
In [9]:
ACO_time = bsnb.generate_time(ACO)
bsnb.plot(ACO_time, ACO)

3 - Windowing

Machine learning pipelines usually require extracting features that represent each class well enough for the classes to be distinguished.
To do this, the input signals are first split into windows, from which the features are then extracted. For example, we can use windows of 1 second, with or without overlap, and represent each window by a set of features such as the mean and standard deviation. We achieve this by using the windowing function of the biosignalsnotebooks Python package.
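
To illustrate the idea of overlapping windows, the following is a minimal sketch written with numpy only. It is not the implementation of the windowing function of biosignalsnotebooks (the helper name sliding_windows and its handling of the end of the signal are assumptions), so the number of windows it returns may differ from the one reported by the package.

```python
import numpy as np

def sliding_windows(signal, sampling_rate, time_window, overlap):
    """Split a 1-D signal into windows of `time_window` seconds.

    `overlap` is the fraction (0 <= overlap < 1) shared by consecutive windows.
    Samples at the end that do not fill a complete window are discarded.
    """
    signal = np.asarray(signal)
    window_size = int(sampling_rate * time_window)
    # Distance in samples between the start of two consecutive windows.
    step = max(1, int(round(window_size * (1 - overlap))))
    n_windows = (len(signal) - window_size) // step + 1
    if n_windows < 1:
        return np.empty((0, window_size))
    starts = np.arange(n_windows) * step
    return np.stack([signal[start:start + window_size] for start in starts])

# Synthetic 3-second "signal" sampled at 1000 Hz (assumption for illustration).
demo_signal = np.arange(3000)
demo_windows = sliding_windows(demo_signal, sampling_rate=1000, time_window=1, overlap=0.9)
print(demo_windows.shape)
```

With an overlap of 0.9, consecutive windows share 90% of their samples, which multiplies the number of training examples but also makes neighbouring windows very similar to each other.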

In [10]:
sampling_rate = 1000 # Hz
time_window = 1 # seconds
overlap = 0.9

ECG_windows = bsnb.windowing(ECG, sampling_rate, time_window, overlap)
EMG_windows = bsnb.windowing(EMG, sampling_rate, time_window, overlap)
EEG_windows = bsnb.windowing(EEG, sampling_rate, time_window, overlap)
ACO_windows = bsnb.windowing(ACO, sampling_rate, time_window, overlap)

# Get the number of time windows of the shortest signal, so that all signals can be truncated to the same number of time windows.
index = min([ECG_windows.shape[0], EMG_windows.shape[0], EEG_windows.shape[0], ACO_windows.shape[0]])

# Truncate all signals to the same number of time windows.
ECG_windows = ECG_windows[:index]
EMG_windows = EMG_windows[:index]
EEG_windows = EEG_windows[:index]
ACO_windows = ACO_windows[:index]
In [11]:
print("The number of time windows for each signal is " + str(ECG_windows.shape[0]) + ", while the number of data points in each time window is " + str(ECG_windows.shape[1]) + ".")
The number of time windows for each signal is 202, while the number of data points in each time window is 1000.

4 - Feature Extraction

In this step we define the features that we want to extract from each time window. These features represent each window at the input of the classifier. In this case we will use a few statistical features, but you can extract features from the time, spectral, statistical or other domains.
To extract those features from each time window, we will use the function features_extraction from the biosignalsnotebooks Python package.
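
Conceptually, feature extraction turns an array of windows into a matrix with one row per window and one column per feature. The sketch below shows this idea with numpy and scipy only. The helpers extract_features_sketch and zero_crossing_rate_sketch are hypothetical stand-ins written for illustration, and they may differ from what features_extraction and zero_crossing_rate do inside biosignalsnotebooks.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def zero_crossing_rate_sketch(window):
    """Fraction of consecutive sample pairs whose sign changes.

    Simplification: a sample that is exactly zero counts as a sign change.
    """
    signs = np.sign(window)
    return np.count_nonzero(np.diff(signs)) / len(window)

def extract_features_sketch(windows, functions):
    """Apply every function to every window: one row per window, one column per function."""
    return np.array([[function(window) for function in functions] for window in windows])

# Synthetic windows (assumption): 10 windows of 200 random samples each.
rng = np.random.default_rng(0)
demo_windows = rng.normal(size=(10, 200))

functions = [np.mean, np.std, kurtosis, skew, zero_crossing_rate_sketch]
demo_features = extract_features_sketch(demo_windows, functions)
print(demo_features.shape)
```

Each row of the resulting matrix is what the classifier receives as a single sample, so adding a new feature only requires appending another function to the list.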

In [12]:
# Defining the functions applied to each time window in order to extract features
func = [mean, std, kurtosis, skew, bsnb.zero_crossing_rate]

ECG_features = bsnb.features_extraction(ECG_windows, func)
EMG_features = bsnb.features_extraction(EMG_windows, func)
EEG_features = bsnb.features_extraction(EEG_windows, func)
ACO_features = bsnb.features_extraction(ACO_windows, func)

5 - Generate dataset

Now that we have the data required to train our classifier, it is time to build the dataset itself. To do this, we only need to stack our feature arrays and assign the samples of each array to a class.

In [13]:
# Get the number of samples of each type of signal - in this case, our classes.
length_ECG = ECG_features.shape[0]
length_EMG = EMG_features.shape[0]
length_EEG = EEG_features.shape[0]
length_ACO = ACO_features.shape[0]

# Build the samples array containing all samples from each signal.
samples = vstack([ECG_features, EMG_features, EEG_features, ACO_features])

# Build the classes array to assign each sample to its class (EEG and acoustic samples share the class 'Other').
classes = concatenate([length_ECG*['ECG'], length_EMG*['EMG'], length_EEG*['Other'], length_ACO*['Other']])

6 - Build the classifier

In this step we will build our classifier by using the scikit-learn Python package. We will use a Random Forest model, but scikit-learn provides many other classifiers.
Furthermore, in order to validate our approach, we will use cross-validation, which consists of repeatedly dividing our dataset into a train set and a test set, training the classifier on the train set and then testing it on the previously unseen test set.
For more information about cross-validation, see the notebook Stone, Paper or Scissor Game - Train and Classify [Volume 4].

We use the ShuffleSplit method, which generates these sets in random order, so the samples in each set may not be consecutive. This makes it unlikely that one of the signal types is left out of the train or test set.
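
To see what ShuffleSplit produces, the sketch below runs it on a tiny array of 10 placeholder samples (the array, the number of splits and the fixed random_state are assumptions chosen to keep the output short and reproducible). Each iteration yields two arrays of row indices, one for training and one for testing.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Placeholder dataset (assumption): 10 samples with 2 features each.
demo_samples = np.arange(20).reshape(10, 2)

splitter = ShuffleSplit(n_splits=3, train_size=0.8, test_size=0.2, random_state=0)
splits = list(splitter.split(demo_samples))

for train_index, test_index in splits:
    # The indices are shuffled, so the test samples are usually not consecutive.
    print("train:", train_index, "test:", test_index)
```

Because every split is drawn independently, the same sample can appear in the test set of more than one iteration, which is one of the ways ShuffleSplit differs from a k-fold scheme.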

In [14]:
model = ShuffleSplit(n_splits=10, train_size=.9, test_size=.1)

The next step is the cross-validation step, where we initialise and evaluate our random forest classifier using the scikit-learn Python package.

In [15]:
acc = []
for train_index, test_index in model.split(samples):
    
    # For each iteration, we divide our dataset in train and test set.
    samples_train, samples_test = samples[train_index], samples[test_index]
    labels_train, labels_test = classes[train_index], classes[test_index]
    
    # Build the random forest classifier.
    random_forest = RandomForestClassifier(n_estimators=1000, criterion='gini')
    
    # Train the classifier on the training set.
    random_forest = random_forest.fit(samples_train, ravel(labels_train))
    
    # Test the classifier on the testing set.
    results = random_forest.predict(samples_test)

    # This step is not necessary for the classification procedure, but is important to store the values 
    # of accuracy to calculate the mean and standard deviation values and evaluate the performance of the classifier.
    acc.append(accuracy_score(labels_test, results)*100)
In [16]:
print("Accuracy: ", mean(acc), "+-", std(acc), "%")
Accuracy:  97.16049382716048 +- 0.9642283550502027 %
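
Accuracy summarises the performance in a single number. A per-class view can be obtained with classification_report, which was imported in step 1. The sketch below is only an illustration on synthetic data: the blobs generated with make_blobs stand in for the feature matrix, and the class names are reused purely as labels, so the printed numbers say nothing about the real signals.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix (assumption): 3 groups, 5 features.
X, y = make_blobs(n_samples=150, centers=3, n_features=5, random_state=0)
labels = np.array(["ECG", "EMG", "Other"])[y]

X_train, X_test, labels_train, labels_test = train_test_split(
    X, labels, test_size=0.2, random_state=0
)

demo_forest = RandomForestClassifier(n_estimators=100, random_state=0)
demo_forest.fit(X_train, labels_train)
demo_results = demo_forest.predict(X_test)

# Text report with precision, recall and F1-score for every class.
print(classification_report(labels_test, demo_results))

# The same information as a dictionary, convenient for further processing.
report = classification_report(labels_test, demo_results, output_dict=True)
```

In the notebook's own loop, the same call could be applied to labels_test and results inside each iteration.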

As can be seen, in this simple example we achieved high accuracy values, which indicate that most of the time windows were correctly classified as ECG, EMG or another type of signal. These results may not hold if the signals are not as clean and of as good quality as the samples used in this Jupyter Notebook, but the procedure remains the same. Furthermore, other classifiers may be used, their hyperparameters may be optimised and other cross-validation techniques may be applied, so there is a lot to explore.
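
As one possible starting point for that exploration, the sketch below combines a hyperparameter search (GridSearchCV) with a different cross-validation scheme (StratifiedKFold), both from scikit-learn. The synthetic data and the small parameter grid are assumptions chosen so that the example runs quickly. They are not tuned recommendations for biosignals.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the feature matrix and classes (assumption).
X, y = make_blobs(n_samples=120, centers=3, n_features=5, random_state=0)

# Candidate hyperparameter values (assumption, kept small on purpose).
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 3]}

# Stratified folds keep the proportion of each class in every fold.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=folds,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean cross-validated accuracy:", search.best_score_)
```

Stratification may be worth considering for the dataset built in this notebook, since the class Other contains twice as many samples as ECG or EMG.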

We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!
