Signal Loading - Working with File Header

Each of the file formats generated by OpenSignals has metadata attached to it.

Acquisition parameters such as the sampling rate, the resolution and the used channels can be found in the file metadata. In .txt files this information is stored in the header, while in .h5 files it is stored as attributes of the hierarchical objects.

In this Jupyter Notebook, a detailed procedure for accessing file metadata (.txt and .h5) is explained, together with a simplified approach that uses a specialized biosignalsnotebooks function.

0 - Importation of the needed packages

# biosignalsnotebooks Python package with useful functions that support and complement 
# the available Notebooks
import biosignalsnotebooks as bsnb

# Module dedicated to processing the Python abstract syntax grammar
# (Abstract Syntax Trees)
from ast import literal_eval

# Package used for accessing .h5 file contents
from h5py import File

1A - Working with a .txt file (generate a dictionary that contains the header info)

1.1A - Open file (creation of a "file" object)

# Specification of the file path
# (in our case it is a relative file path, but an absolute one can also be specified)
relative_file_path = "/signal_samples/bvp_sample.txt"

# Open file
file_txt = open(relative_file_path, "r")

# Embedding of the .txt file
from IPython.display import IFrame
IFrame(src="/signal_samples/bvp_sample.txt", width="100%", height="350")

As can be seen in the previous IFrame and in the following figure, the metadata of our .txt file is placed on line 2.

1.2A - Read line 2 of the opened file through the "readlines" method of the "file_txt" object

# The "readlines" method returns a list where each entry contains a line of the .txt file
txt_data = file_txt.readlines()

# We only need the content of line 2 (entry 1 of "txt_data" list)
metadata = txt_data[1]

# Close file
file_txt.close()
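
The reading step above loads every line of the file into memory and closes the file by hand. As a minimal sketch of an alternative, and not the approach followed in this Notebook, the header line can be fetched inside a "with" block, which closes the file automatically and stops reading as soon as line 2 is reached. The helper name read_header_line and the three-line sample file are assumptions made only for illustration.

```python
import os
import tempfile


def read_header_line(path, header_line_index=1):
    """Return the raw text of one line (line 2 by default) without reading the whole file."""
    with open(path, "r") as file_txt:
        for index, line in enumerate(file_txt):
            if index == header_line_index:
                return line
    raise ValueError("The file has fewer lines than expected")


# Made-up three-line file that imitates the layout described above
# (the metadata dictionary sits on line 2).
sample_lines = [
    "# OpenSignals Text File Format\n",
    '# {"00:07:80:3B:46:61": {"sampling rate": 1000, "resolution": [16]}}\n',
    "# EndOfHeader\n",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w") as sample_file:
        sample_file.writelines(sample_lines)
    metadata = read_header_line(sample_path)

print(metadata)
```

The returned string still contains the leading "#" and the trailing newline, so it can be passed to the conversion steps of topic 1.3A unchanged.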

1.3A - Conversion of line content into a dictionary

1.3.1A - Removal of the "#" symbol at the beginning of the string, the space between "#" and "{" and the newline character "\n" at the end of the string

Essentially, we want to exclude the first two characters of the string (indexes 0 and 1) and the last one (index -1).

metadata_aux = metadata[2:-1]

print("\033[1mSplit String: \033[0m\n" + metadata_aux)

Split String: 
{"00:07:80:3B:46:61": {"sensor": ["BVP"], "device name": "00:07:80:3B:46:61", "column": ["nSeq", "DI", "CH1"], "sync interval": 2, "time": "9:33:55.606", "comments": "", "device connection": "BTH00:07:80:3B:46:61", "channels": [1], "date": "2017-1-17", "mode": 0, "digital IO": [0, 1], "firmware version": 772, "device": "biosignalsplux", "position": 0, "sampling rate": 1000, "label": ["CH1"], "resolution": [16], "special": [{}]}}

1.3.2A - Conversion of the remaining content of the original string to a dictionary format (using the ast module)

header_txt = literal_eval(metadata_aux)

from sty import fg, rs
print(str(header_txt).replace("device name", fg(98,195,238) + "\033[1mdevice name\033[0m" + fg.rs).replace("sampling rate", fg(98,195,238) + "\033[1msampling rate\033[0m" + fg.rs).replace("resolution", fg(98,195,238) + "\033[1mresolution\033[0m" + fg.rs))

{'00:07:80:3B:46:61': {'sensor': ['BVP'], 'device name': '00:07:80:3B:46:61', 'column': ['nSeq', 'DI', 'CH1'], 'sync interval': 2, 'time': '9:33:55.606', 'comments': '', 'device connection': 'BTH00:07:80:3B:46:61', 'channels': [1], 'date': '2017-1-17', 'mode': 0, 'digital IO': [0, 1], 'firmware version': 772, 'device': 'biosignalsplux', 'position': 0, 'sampling rate': 1000, 'label': ['CH1'], 'resolution': [16], 'special': [{}]}}
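
The header line printed above also looks like valid JSON (double-quoted keys, no Python-only literals), so the conversion could plausibly be done with the standard json module instead of literal_eval. The sketch below is an alternative under that assumption, not the method used in this Notebook. It strips the "#" prefix without relying on fixed character positions and loops over the device keys instead of hard-coding the MAC address. The header string is a shortened copy of the one shown above.

```python
import json

# Shortened copy of the header line shown above (only a few of its keys are kept).
metadata = '# {"00:07:80:3B:46:61": {"sensor": ["BVP"], "sampling rate": 1000, "resolution": [16]}}\n'

# Drop the leading "#" and then the surrounding whitespace (space and newline).
metadata_aux = metadata.lstrip("#").strip()

# Assumption: the header is valid JSON, as the sample above appears to be.
header_txt = json.loads(metadata_aux)

# The outer dictionary has one key per device (its MAC address).
for mac, device_header in header_txt.items():
    print(mac, device_header["sampling rate"], device_header["resolution"])
# prints: 00:07:80:3B:46:61 1000 [16]
```

If a header ever contained Python-specific literals such as True or None, json.loads would fail and literal_eval would remain the safer choice.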

1B - Working with a .h5 file (generate a dictionary that contains the header info)

1.1B - Loading of the .h5 file through the creation of an h5py object

# Specification of the file path
# (in our case it is a relative file path, but an absolute one can also be specified)
relative_file_path = "/signal_samples/ecg_20_sec_1000_Hz.h5"

# Creation of the h5py object (the file is opened explicitly in read-only mode)
file_h5 = File(relative_file_path, "r")

1.2B - Determination of the list of available keys (one per device)

available_keys = list(file_h5.keys())

print("Number of available devices: " + str(len(available_keys)) + "\nAvailable devices: " + str(available_keys))

Number of available devices: 1
Available devices: ['00:07:80:D8:A7:F9']

1.3B - Since we are working with only one device, its MAC address is stored in the first entry (index 0) of the available_keys list

mac = available_keys[0]

1.4B - Access to the first level of file hierarchy

group_lv1 = file_h5.get(mac)

print(group_lv1)

<HDF5 group "/00:07:80:D8:A7:F9" (5 members)>

1.5B - Request of the metadata linked to the current hierarchy level

attrs_lv1 = group_lv1.attrs.items()

print(attrs_lv1)

ItemsViewHDF5(<Attributes of HDF5 object at 453337904>)

1.6B - Final conversion into a dictionary

header_h5 = dict(attrs_lv1)

from sty import fg, rs
print(str(header_h5).replace("device name", fg(98,195,238) + "\033[1mdevice name\033[0m" + fg.rs).replace("sampling rate", fg(98,195,238) + "\033[1msampling rate\033[0m" + fg.rs).replace("resolution", fg(98,195,238) + "\033[1mresolution\033[0m" + fg.rs))

{'channels': array([1]), 'comments': '', 'date': b'2018-9-28', 'device': 'biosignalsplux', 'device connection': b'BTH00:07:80:D8:A7:F9', 'device name': b'00:07:80:D8:A7:F9', 'digital IO': array([0, 1]), 'duration': b'20s', 'firmware version': 773, 'keywords': b'', 'macaddress': b'00:07:80:D8:A7:F9', 'mode': 0, 'nsamples': 20400, 'resolution': array([16]), 'sampling rate': 1000, 'sync interval': 2, 'time': b'14:39:43.518'}
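
Several values in the dictionary above are byte strings (for example b'2018-9-28') or arrays, which can be inconvenient when comparing them with the plain strings and lists obtained from the .txt header. A minimal sketch of one possible normalization step follows. The function name to_plain_python is hypothetical, and the small dictionary is a hand-written stand-in for a few entries of header_h5, with lists used in place of the NumPy arrays so that the example runs without extra packages.

```python
def to_plain_python(value):
    """Convert bytes to str and array-like values to lists; leave other values untouched."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if hasattr(value, "tolist"):  # e.g. NumPy arrays and NumPy scalars
        return to_plain_python(value.tolist())
    if isinstance(value, (list, tuple)):
        return [to_plain_python(item) for item in value]
    return value


# Hand-written stand-in for a few entries of "header_h5".
header_h5 = {
    "date": b"2018-9-28",
    "device name": b"00:07:80:D8:A7:F9",
    "resolution": [16],
    "sampling rate": 1000,
}

clean_header = {key: to_plain_python(value) for key, value in header_h5.items()}
print(clean_header)
# prints: {'date': '2018-9-28', 'device name': '00:07:80:D8:A7:F9', 'resolution': [16], 'sampling rate': 1000}
```

The same comprehension could be applied to the real header_h5 dictionary, since the "tolist" branch is meant to cover its array entries.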

With a simple "key" call, it is possible to access the values highlighted in blue in topics 1.3.2A and 1.6B.

# Sampling rate of the acquisition contained inside .txt file
sr_txt = header_txt["00:07:80:3B:46:61"]["sampling rate"]
# Sampling rate of the acquisition contained inside .h5 file
sr_h5 = header_h5["sampling rate"]

print("\033[1mSampling Rate (.txt):\033[0m " + str(sr_txt) + " Hz")
print("\033[1mSampling Rate (.h5):\033[0m " + str(sr_h5) + " Hz")

Sampling Rate (.txt): 1000 Hz
Sampling Rate (.h5): 1000 Hz
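
Once these values are available, other acquisition properties can be derived from them. The sketch below is an illustration rather than part of the original procedure. It uses the sampling rate, resolution and number of samples printed in the .h5 header of topic 1.6B to estimate the sampling period, the acquisition duration, the number of distinct raw values of an n-bit channel and a time axis. The variable names are chosen only for this example.

```python
# Values copied from the .h5 header printed in topic 1.6B.
sampling_rate = 1000  # Hz
resolution = 16       # bits
nsamples = 20400

sampling_period = 1 / sampling_rate  # seconds between consecutive samples
duration = nsamples / sampling_rate  # total acquisition time in seconds
adc_levels = 2 ** resolution         # number of distinct raw values for an n-bit channel

# Time instant (in seconds) of each sample, starting at 0
time_axis = [index / sampling_rate for index in range(nsamples)]

print("Sampling period: " + str(sampling_period) + " s")
print("Duration: " + str(duration) + " s")
print("Distinct raw values: " + str(adc_levels))
```

With these header values, the acquisition lasts 20.4 s and each raw sample can take one of 65536 values.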

The procedures described above for accessing metadata from .txt and .h5 files can be carried out more easily while loading a file with the bsnb.load function, by setting the "get_header" input argument to True.

# Specification of the file path (in our case it is a relative file path, but an absolute one can also be specified)
relative_file_path = "/signal_samples/ecg_20_sec_1000_Hz.h5"

data, header = bsnb.load(relative_file_path, get_header=True)

from sty import fg, rs
print(str(header).replace("device name", fg(98,195,238) + "\033[1mdevice name\033[0m" + fg.rs).replace("sampling rate", fg(98,195,238) + "\033[1msampling rate\033[0m" + fg.rs).replace("resolution", fg(98,195,238) + "\033[1mresolution\033[0m" + fg.rs))

{'channels': array([1]), 'comments': '', 'date': b'2018-9-28', 'device': 'biosignalsplux', 'device connection': b'BTH00:07:80:D8:A7:F9', 'device name': b'00:07:80:D8:A7:F9', 'digital IO': array([0, 1]), 'firmware version': 773, 'resolution': array([16]), 'sampling rate': 1000, 'sync interval': 2, 'time': b'14:39:43.518', 'sensor': [b'ECG'], 'column labels': {1: 'channel_1'}}

As can be understood, correct and efficient processing of OpenSignals files requires access to the information contained in the header (.txt files) or stored as attributes (.h5 files), such as the acquisition sampling rate or the device MAC address.

This information (metadata) is always attached to the "real data" as an essential complement.

We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!

