evefile.boundaries.eveh5 module
Low-level Python object representation of eveH5 file contents.
This module provides a Python representation (in the form of a hierarchy of objects) of the HDF5 contents of an eveH5 file that can be mapped to the EveFile interface. Being a low-level object representation, technically speaking this module is a resource. The corresponding facade (user-facing interface) would be the evefile module.
While the Python h5py package already provides the low-level access and is used internally, the eveh5 module contains Python objects that are independent of an open HDF5 file, represent the hierarchy of HDF5 items (groups and datasets), and contain the attributes of each HDF5 item in the form of a Python dictionary. Furthermore, each object contains a reference to both the original HDF5 file and the HDF5 item, thus making reading dataset data (and attributes) on demand as simple as possible.
Another reason for having a separate module rather than directly using the h5py package in different modules is modularity. If the IO framework (h5py for the time being) is to be replaced at some point, this module will be the only place that needs to change.
Overview
A first overview of the classes implemented in this module and their hierarchy is given in the UML diagram below.
Fig. 9 Class hierarchy of the evefile.boundaries.eveh5 module.
The HDF5Item class and its children represent the individual HDF5 items on a Python level, similarly to the classes provided in the h5py package, but without requiring an open HDF5 file. Furthermore, reading actual data (dataset values) is deferred by default.
As such, the HDF5Item class hierarchy shown above is pretty generic and should work with all eveH5 versions. However, it is not meant as a generic HDF5 interface, as it does make some assumptions based on the eveH5 file structure and format.
Key aspects
Despite being a low-level interface to eveH5 HDF5 files, the eveh5 module provides a series of abstractions and special behaviour summarised below:
- Each HDF5Item object is independent, carrying the entire information necessary to obtain both attributes and data.
- Data (and attributes) are only loaded on demand, speeding up reading a file and saving resources.
- HDF5Group objects are iterable: you can iterate over the items in the group. As HDF5File inherits from HDF5Group, this is true for HDF5File objects as well.
- To speed up reading an HDF5 file, the file is opened once and closed only after reading the entire hierarchy of HDF5 items. Similarly, when using the HDF5File in other code, closing the HDF5 file in between can be prevented. In this case, call the HDF5File.close() method manually once you’re done.
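The open-once behaviour from the last point can be illustrated with a small sketch. It uses neither evefile nor h5py: FileReader is a hypothetical stand-in operating on a plain text file, and the default value of close_file shown here is an assumption, not taken from the module.

```python
import os
import tempfile


class FileReader:
    """Hypothetical stand-in mimicking the open-once behaviour."""

    def __init__(self, filename=""):
        self.filename = filename
        self.close_file = True  # assumed default: close right after reading
        self.lines = []
        self._handle = None

    def read(self):
        # Open the file once and read everything in a single pass.
        self._handle = open(self.filename, encoding="utf-8")
        self.lines = self._handle.read().splitlines()
        if self.close_file:
            self.close()

    def close(self):
        # Safe to call repeatedly; only closes a handle that is still open.
        if self._handle is not None and not self._handle.closed:
            self._handle.close()

    @property
    def is_open(self):
        return self._handle is not None and not self._handle.closed


# Demonstrate with a temporary text file instead of an HDF5 file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")

reader = FileReader(filename=tmp.name)
reader.close_file = False  # keep the file open for further access
reader.read()
open_after_read = reader.is_open
reader.close()  # manual cleanup once done
open_after_close = reader.is_open
os.remove(tmp.name)
```

The point of the flag is that the caller, not the reader, decides when the file handle is released, which avoids reopening the file for every subsequent access.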
Usage
The typical entry point is an HDF5 file (an eveH5 file, to be precise) that should be represented as a hierarchy of Python objects, with the information it contains potentially mapped to another hierarchy of classes. On the Python level, this is implemented by the HDF5File class.
Reading an HDF5 (eveH5) file is as simple as:
from evefile.boundaries import eveh5
file = eveh5.HDF5File(filename="test.h5")
file.read()
Each HDF5 item present in the root group (/) will appear as an attribute by its name in the HDF5File object. Each such item is of type HDF5Item, more precisely either an HDF5Dataset or an HDF5Group.
If a given HDF5 item is itself a group, i.e. a node in the tree and hence an HDF5Group, containing other HDF5 items, these will be accessible as chained attributes. Suppose your HDF5 file contains a dataset /c1/meta/PosCountTimer. You could access this dataset by:
file.c1.meta.PosCountTimer
The dataset itself is an HDF5Dataset object. Datasets are the leaves of the tree, i.e. they cannot contain other items. But they can (and usually do) contain data.
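If the path of an item is only known at runtime, the chained attribute access can be resolved programmatically. The following is a minimal sketch, using types.SimpleNamespace objects as a stand-in for the real hierarchy; the helper resolve_path is hypothetical and not part of evefile.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_path(root, path):
    """Follow an HDF5-style path via chained attribute access."""
    parts = [part for part in path.split("/") if part]
    return reduce(getattr, parts, root)


# Stand-in hierarchy mirroring /c1/meta/PosCountTimer from the text.
dataset = SimpleNamespace(name="/c1/meta/PosCountTimer")
meta = SimpleNamespace(name="/c1/meta", PosCountTimer=dataset)
file = SimpleNamespace(name="/", c1=SimpleNamespace(name="/c1", meta=meta))

item = resolve_path(file, "/c1/meta/PosCountTimer")
print(item.name)
```

An empty path or a bare slash resolves to the root object itself, since there are no path components to follow.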
Furthermore, you can iterate over the items as with every Python iterable:
for item in file:
    print(item.name)
This would reveal the name of each item present in the root group, be it a group or a dataset. Of course, iterating works with every HDF5Group object, not only the HDF5File object. Getting the names of all items of the c1 group translates to:
for item in file.c1:
    print(item.name)
In both cases, the item names contain the full path within the HDF5 file, with hierarchy levels separated by /. If you are only interested in the plain names without path or any leading slashes, use the dedicated method:
item_names = file.item_names()
This returns a list of the item names without path or any slashes.
Note that upon reading the file, neither attributes nor dataset values are loaded from the HDF5 file. Reading attributes, here for the root group of the file – but identical for each HDF5 item –, is as simple as:
file.get_attributes()
Afterwards, the attributes are available as the HDF5Item.attributes attribute, i.e. a Python dict with the keys corresponding to the HDF5 attribute names and the values transformed into scalar strings.
In a similar fashion, if you want to access the data of a dataset, you first need to ask for the data to be loaded from the HDF5 file. For the /c1/meta/PosCountTimer dataset mentioned above, this would translate to:
file.c1.meta.PosCountTimer.get_data()
Afterwards, you can access the data as a numpy.ndarray via the HDF5Dataset.data attribute.
Classes
The following classes are implemented in the module:
- HDF5Item
Base class for HDF5 items.
Provides the filename, name of the item, and attributes, as well as mechanisms to load the attributes on demand from the original HDF5 file.
- HDF5Dataset
Representation of an HDF5 dataset.
Datasets are the leaves of the hierarchical tree. They cannot contain other HDF5Item objects, but they contain data. Provides mechanisms to load the data on demand from the original HDF5 file.
- HDF5Group
Representation of an HDF5 group containing other HDF5 items.
Groups are the nodes of the hierarchical tree. They can (and usually will) contain other HDF5Item objects, but they cannot contain data.
- HDF5File
Representation of an HDF5 file containing other HDF5 items.
A special HDF5Group instance containing all items contained in an HDF5 (eveH5) file as a hierarchy of HDF5Item objects. Provides mechanisms to load the entire contents of an HDF5 file.
Module documentation
- class evefile.boundaries.eveh5.HDF5Item(filename='', name='')
Bases: object
Base class for HDF5 items.
HDF5 files are hierarchically structured and contain groups and datasets, with a dataset always belonging to a group (even if only the root group). Both groups and datasets can have attributes.
This class provides the basic structure for both types of HDF5 items and is subclassed accordingly.
- name
Name of the node/item within the HDF5 file
Note that this is the full name, including its “path” through the hierarchy of the HDF5 file.
- Type:
- attributes
Attributes of the HDF5 item
Attributes are not loaded by default, but need to be set by calling get_attributes() once.
The attribute values are converted to (unicode) strings.
- Type:
- Parameters:
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
Examples
Usually, you will not directly instantiate or access an HDF5Item object, but one of its subclasses. Nevertheless, it is perfectly possible:
item = HDF5Item()
Both filename and name can be set upon instantiation:
item = HDF5Item(filename="test.h5", name="/")
Here / refers to the HDF5 root group that is always present.
To set the attributes, i.e. read them from the HDF5 file, use the get_attributes() method:
item = HDF5Item(filename="test.h5", name="/")
item.get_attributes()
Note that this will raise a ValueError if either filename or name is not provided.
The idea behind obtaining the attributes this way: being independent of the HDF5 file. By directly using the h5py package, the file would always need to be open to access the attributes.
- get_attributes()
Get attributes from HDF5 item.
The attributes attribute is set accordingly, and the attribute values are converted into (unicode) strings. Note that this should be valid for all eveH5 datasets and groups, but not generally for HDF5 items.
Note
Since in practice character sets other than UTF-8 or ASCII have sometimes been used, most probably ISO8859-1, conversion will check for a UnicodeDecodeError and try iso8859 as encoding in this case. Note, however, that technically speaking, eveH5 files with such character encoding are not valid HDF5 files.
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
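The encoding fallback described in the note for get_attributes() can be sketched in plain Python. This is an illustration under assumptions, not the module's actual code: the function name decode_attribute is hypothetical, and Python's "iso-8859-1" codec is assumed to correspond to the fallback encoding mentioned in the note.

```python
def decode_attribute(value):
    """Convert a raw attribute value into a (unicode) string."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: ISO 8859-1 is the fallback. Every byte sequence
            # is valid in this encoding, so this decode cannot fail.
            return value.decode("iso-8859-1")
    # Non-bytes values (numbers, strings) are simply stringified.
    return str(value)


utf8_value = decode_attribute(b"Gr\xc3\xb6\xc3\x9fe")  # UTF-8 encoded bytes
latin_value = decode_attribute(b"Gr\xf6\xdfe")  # ISO 8859-1 encoded bytes
number_value = decode_attribute(42)
```

Because ISO 8859-1 maps every byte to a character, the fallback never raises, but it may silently produce wrong characters if the file used yet another encoding.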
- class evefile.boundaries.eveh5.HDF5Dataset(filename='', name='')
Bases: HDF5Item
Representation of an HDF5 dataset.
In HDF5, a dataset is the “leaf” of the tree, i.e. it does not have any children, only a parent (group). Datasets have both data and attributes. Data can be of arbitrary type, not only numeric, and are represented as numpy.ndarray.
- Raises:
ValueError – Raised if either filename or name are not provided and data are obtained from HDF5 file.
Examples
HDF5 datasets are HDF5 items with data. Hence, instantiating the class is identical to instantiating an HDF5Item object:
dataset = HDF5Dataset()
Both filename and name can be set upon instantiation:
dataset = HDF5Dataset(filename="test.h5", name="/test")
Here /test refers to a hypothetical HDF5 dataset test present in the root group / of the HDF5 file.
To set the attributes, i.e. read them from the HDF5 file, use the get_attributes() method:
dataset = HDF5Dataset(filename="test.h5", name="/test")
dataset.get_attributes()
Note that this will raise a ValueError if either filename or name is not provided.
To set the data, i.e. read the data of the dataset from the HDF5 file, use the get_data() method:
dataset = HDF5Dataset(filename="test.h5", name="/test")
dataset.get_data()
Several subsequent calls to the get_data() method will not read the data from the HDF5 file more than once for efficiency purposes.
The idea behind obtaining the attributes and data this way: being independent of the HDF5 file. By directly using the h5py package, the file would always need to be open to access the attributes and data.
- property data
Data of the HDF5 dataset.
Can be numeric, but generally of any type numpy supports.
Note
Data are only loaded on demand, i.e. first access, for performance reasons. However, accessing the data property automatically triggers a load as long as no data have been set before.
- Returns:
data – Data of the HDF5 dataset
- Return type:
- property dtype
Data type (NumPy dtype) of the dataset.
The dtype of a given dataset can be read without reading the data and should hence be a cheap operation. Knowing the dtype is sometimes important for deciding how to further process the given HDF5 dataset.
- Returns:
dtype – NumPy dtype object of the dataset
- Return type:
- property shape
Shape of the dataset.
The shape of a given dataset can be read without reading the data and should hence be a cheap operation. Knowing the shape is sometimes important for deciding how to further process the given HDF5 dataset.
- Returns:
shape – Shape of the dataset
- Return type:
- get_data()
Get data from HDF5 dataset.
The data attribute is set accordingly. Note that getting data means opening the actual HDF5 file. Hence, if the data attribute has nonzero length, no data are read from the HDF5 file, as it is silently assumed that a read process took place beforehand. This may be relevant particularly for larger datasets.
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
- get_attributes()
Get attributes from HDF5 item.
The attributes attribute is set accordingly, and the attribute values are converted into (unicode) strings. Note that this should be valid for all eveH5 datasets and groups, but not generally for HDF5 items.
Note
Since in practice character sets other than UTF-8 or ASCII have sometimes been used, most probably ISO8859-1, conversion will check for a UnicodeDecodeError and try iso8859 as encoding in this case. Note, however, that technically speaking, eveH5 files with such character encoding are not valid HDF5 files.
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
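The load-on-demand semantics described for the data property and the get_data() method can be sketched without an HDF5 file. LazyDataset and its loader argument are hypothetical stand-ins; in particular, this sketch checks for None, whereas the documentation above speaks of checking for nonzero length.

```python
class LazyDataset:
    """Hypothetical stand-in showing load-on-first-access semantics."""

    def __init__(self, loader):
        self._loader = loader  # callable standing in for the HDF5 read
        self._data = None
        self.read_count = 0  # only for demonstration purposes

    @property
    def data(self):
        # First access triggers the load; later accesses reuse the result.
        if self._data is None:
            self.get_data()
        return self._data

    def get_data(self):
        # Skip the (potentially expensive) read if data are already present.
        if self._data is None:
            self._data = self._loader()
            self.read_count += 1


dataset = LazyDataset(loader=lambda: [1.0, 2.0, 3.0])
first = dataset.data  # triggers the read
dataset.get_data()  # does not read a second time
second = dataset.data
```

Counting the reads makes the caching visible: regardless of how often data or get_data() is used, the loader runs only once.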
- class evefile.boundaries.eveh5.HDF5Group(filename='', name='')
Bases: HDF5Item
Representation of an HDF5 group containing other HDF5 items.
HDF5 is a hierarchical data format (hence the name), i.e. a tree consisting of groups as nodes and datasets as leaves. Datasets are represented by the HDF5Dataset class, groups by this class. Groups can contain other groups as well as datasets, and groups can have attributes, as any other HDF5 item. An HDF5 file is basically a group, and at the same time the root node. For representing entire HDF5 files, there is a special class HDF5File providing convenience methods to read files and convert them into a hierarchy of HDF5Item objects.
The class is implemented as iterable, i.e. you can iterate over all items (HDF5Item objects) as you would with every other Python iterable. See the examples section for further details.
- item
Item of a group
All items added to a collection using the method add_item() will appear as attribute of the object. As this is dynamic, however, no concrete attributes can be described here.
- Type:
Examples
HDF5 groups are HDF5 items acting as nodes, i.e. items that can contain other items. Hence, instantiating the class is identical to instantiating an HDF5Item object:
group = HDF5Group()
Both filename and name can be set upon instantiation:
group = HDF5Group(filename="test.h5", name="/test")
Here /test refers to a hypothetical HDF5 group test present in the root group / of the HDF5 file.
To set the attributes, i.e. read them from the HDF5 file, use the get_attributes() method:
group = HDF5Group(filename="test.h5", name="/test")
group.get_attributes()
Note that this will raise a ValueError if either filename or name is not provided.
To add an item to the group, you first need to have an item, and only afterwards can you add it to the group:
dataset = HDF5Dataset(filename="test.h5", name="/test/foo")
group = HDF5Group(filename="test.h5", name="/test")
group.add_item(dataset)
Items of a group are set as attributes in the object, with their name corresponding to the last part of the item name (after the last slash). Hence, you can access these attributes as any other attribute:
dataset = HDF5Dataset(filename="test.h5", name="/test/foo")
group = HDF5Group(filename="test.h5", name="/test")
group.add_item(dataset)
item = group.foo
Here, item would contain the dataset added before to the group.
Furthermore, groups are iterable, allowing for a very pythonic way of accessing the items of a group:
dataset = HDF5Dataset(filename="test.h5", name="/test/foo")
subgroup = HDF5Group(filename="test.h5", name="/test/bar")
group = HDF5Group(filename="test.h5", name="/test")
group.add_item(dataset)
group.add_item(subgroup)
for item in group:
    print(item.name)
This would print the names (here: /test/foo and /test/bar) of the items of the group.
Similarly, you can use list comprehensions to act upon all items of a group. Getting a list of the names, rather than printing them, would be:
[item.name for item in group]
Note, however, that in both cases, you get the names with the full path within the HDF5 file. If you are interested in just the names, there is a convenience method for you:
group.item_names()
In the same scenario as above, this would return a list with the names foo and bar, without path and leading slash.
- add_item(item)
Add an item to the group.
- Parameters:
item (HDF5Item) – The item to be added to the group.
The item will be accessible as attribute of the group, using the last part of the name of the item in HDF5Item.name as name for the attribute.
- item_names()
Names of the items in the group.
Of course, you could simply iterate over the object and access the name attribute of each item in a list comprehension and get the same. However, due to the internal implementation of the iterator, this convenience method should be much faster, besides being convenient. Furthermore, the name attribute always contains the entire path within the HDF5 file, while the names returned here are just the names, without path and leading /.
- Returns:
item_names – Names of the items in the group
- Return type:
- get_attributes()
Get attributes from HDF5 item.
The attributes attribute is set accordingly, and the attribute values are converted into (unicode) strings. Note that this should be valid for all eveH5 datasets and groups, but not generally for HDF5 items.
Note
Since in practice character sets other than UTF-8 or ASCII have sometimes been used, most probably ISO8859-1, conversion will check for a UnicodeDecodeError and try iso8859 as encoding in this case. Note, however, that technically speaking, eveH5 files with such character encoding are not valid HDF5 files.
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
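How add_item(), iteration, and item_names() of a group fit together can be sketched with two small stand-in classes. Item and Group are hypothetical and only mimic the documented behaviour; the internal dictionary is an assumption about one possible implementation, not a description of the actual one.

```python
class Item:
    """Hypothetical stand-in for an HDF5 item: only carries a name."""

    def __init__(self, name=""):
        self.name = name


class Group(Item):
    """Hypothetical stand-in for a group with attribute access to items."""

    def __init__(self, name=""):
        super().__init__(name=name)
        self._items = {}

    def add_item(self, item):
        # Use the part after the last slash as attribute name.
        key = item.name.rsplit("/", 1)[-1]
        self._items[key] = item
        setattr(self, key, item)

    def __iter__(self):
        # Iterating over the group yields the items themselves.
        return iter(self._items.values())

    def item_names(self):
        # Plain names without path, in insertion order.
        return list(self._items)


group = Group(name="/test")
group.add_item(Item(name="/test/foo"))
group.add_item(Group(name="/test/bar"))

full_names = [item.name for item in group]
plain_names = group.item_names()
```

In this sketch, item_names() only returns the dictionary keys and never touches the items, which is one plausible reason why such a method can be cheaper than iterating.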
- class evefile.boundaries.eveh5.HDF5File
Bases: HDF5Group
Representation of an HDF5 file containing other HDF5 items.
Technically speaking, an HDF5 file is nothing else than an HDF5Group object of the root group (/). However, the class provides convenience methods for reading an HDF5 file and converting it into a hierarchical structure of HDF5Item objects.
- read_attributes
Whether to automatically read the attributes.
Sometimes, it is convenient to automatically load the attributes of each HDF5 item when importing an HDF5 file.
Default: False
- Type:
- close_file
Whether to close the HDF5 file after read.
For performance reasons, particularly in context of the EveFile class, it may be sensible to leave the HDF5 file open and close it only afterwards.
In such case, you are responsible for closing the file yourself. Use the close() method for convenience.
- Type:
- Raises:
ValueError – Raised if filename is not provided and data are obtained from HDF5 file.
Examples
HDF5 files are basically HDF5 groups serving as root group and typically containing a hierarchy of other HDF5 items (both groups and datasets). The same holds true for the HDF5File class serving as root group and containing each and every HDF5 item contained in the file read.
To read an HDF5 file and create a hierarchy of HDF5Item objects corresponding to the HDF5 items in the file, instantiate the object and call its read() method:
file = HDF5File()
file.read("test.h5")
This will read the file test.h5 and create the corresponding hierarchy of HDF5Group and HDF5Dataset items. Note that neither attributes nor data (in case of datasets) are read. This can (and needs to) be done manually afterwards for each HDF5Item object.
Instead of providing the HDF5 file name as a parameter to the read() method, you can set it beforehand in the HDF5File object, as usual even during instantiation of the object:
file = HDF5File(filename="test.h5")
file.read()
Note that the name attribute of the HDF5File object will automatically be set to / to reflect the root node.
If you would like to read the attributes for each HDF5 item along with the item itself, tell the HDF5File object to do so:
file = HDF5File(filename="test.h5")
file.read_attributes = True
file.read()
This will not only add the items to the HDF5File object, but at the same time read and add their attributes.
- read(filename='')
Read contents of an HDF5 (eveH5) file and create hierarchy of items.
The hierarchical structure of the HDF5 file is represented as a hierarchy of HDF5Item objects, namely HDF5Group for groups (nodes) and HDF5Dataset for datasets (leaves).
Note
Only the corresponding items will be created, but neither their attributes nor data read.
- Parameters:
filename (str) – Name of the HDF5 (eveH5) file to read.
If not provided, but set as attribute filename, the latter will be used. If provided, it takes precedence over the attribute filename.
- Raises:
ValueError – Raised if filename is not provided and data are obtained from HDF5 file.
- close()
Close open HDF5 file.
For performance reasons, particularly in context of the EveFile class, it may be sensible to leave the HDF5 file open and close it only afterwards.
In such case, you are responsible for closing the file yourself. Use this method for convenience.
- add_item(item)
Add an item to the group.
- Parameters:
item (HDF5Item) – The item to be added to the group.
The item will be accessible as attribute of the group, using the last part of the name of the item in HDF5Item.name as name for the attribute.
- get_attributes()
Get attributes from HDF5 item.
The attributes attribute is set accordingly, and the attribute values are converted into (unicode) strings. Note that this should be valid for all eveH5 datasets and groups, but not generally for HDF5 items.
Note
Since in practice character sets other than UTF-8 or ASCII have sometimes been used, most probably ISO8859-1, conversion will check for a UnicodeDecodeError and try iso8859 as encoding in this case. Note, however, that technically speaking, eveH5 files with such character encoding are not valid HDF5 files.
- Raises:
ValueError – Raised if either filename or name are not provided and attributes are accessed.
- item_names()
Names of the items in the group.
Of course, you could simply iterate over the object and access the name attribute of each item in a list comprehension and get the same. However, due to the internal implementation of the iterator, this convenience method should be much faster, besides being convenient. Furthermore, the name attribute always contains the entire path within the HDF5 file, while the names returned here are just the names, without path and leading /.
- Returns:
item_names – Names of the items in the group
- Return type:
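The mapping performed by the read() method of HDF5File, from a hierarchical file layout to a tree of objects with full-path names, can be illustrated without an actual HDF5 file. In the following sketch, a nested dict stands in for the file contents; Node, build_tree, and dataset_names are hypothetical names, and the item name "other" is invented for illustration only.

```python
class Node:
    """Hypothetical stand-in for an HDF5 item (group or dataset)."""

    def __init__(self, name="", is_group=False):
        self.name = name
        self.is_group = is_group
        self.children = {}


def build_tree(spec, name="/"):
    """Recursively convert a nested dict into a tree of Node objects.

    Dict values that are dicts become groups, everything else a dataset.
    """
    node = Node(name=name, is_group=True)
    for key, value in spec.items():
        # Full path of the child, avoiding a double slash below the root.
        child_name = f"{name.rstrip('/')}/{key}"
        if isinstance(value, dict):
            child = build_tree(value, name=child_name)
        else:
            child = Node(name=child_name)
        node.children[key] = child
        setattr(node, key, child)  # enables chained attribute access
    return node


def dataset_names(node):
    """Collect the full names of all datasets below the given node."""
    names = []
    for child in node.children.values():
        if child.is_group:
            names.extend(dataset_names(child))
        else:
            names.append(child.name)
    return names


# Layout mirroring /c1/meta/PosCountTimer plus one invented root dataset.
layout = {"c1": {"meta": {"PosCountTimer": None}}, "other": None}
root = build_tree(layout)
all_datasets = dataset_names(root)
```

As in the documented classes, only the structure is created here; no data or attributes are attached to the nodes, which corresponds to deferring those reads until they are requested.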