Use cases

What does working with the evefile package feel and look like?

This page provides some first ideas of what working with the evefile package may look like. Further examples will be detailed elsewhere. For the time being, it serves as high-level user documentation.

Important

Potential users and contributors to these use cases should be clear about the scope of the evefile package. It is not meant to do any data processing and analysis, but rather to provide the main interface between the information obtained from a measurement and the actual data display, processing, and analysis. For these tasks, dedicated packages, namely radiometry and evedataviewer, are being developed.

General usage 

Before being able to work with the evefile package, you need to have it installed (in your local Python virtual environment):

pip install evefile

See Installation for further details. Once it is installed, you can import it in your code, like any other Python package:

import evefile

Having done this, you have direct access to the EveFile class that serves as the main user-facing interface of the entire package.

Note

From here on, we assume that you have imported the evefile package as shown above. All further sections on this page require this step.

Loading an eveH5 data file 

Suppose you have measured data contained in an eveH5 file named my_measurement_file.h5. Loading the contents of this file is as simple as:

my_file = evefile.EveFile(filename="my_measurement_file.h5")

Here, my_file contains all the information contained in the data file as a hierarchy of Python objects. For more details, see the documentation of the evefile module and below.

Basic information on the file loaded 

Having loaded a data file is fine, but how do you quickly check whether you have chosen the correct file and get an overview of what is contained in it? Simply call the show_info() method of the respective object:

my_file = evefile.EveFile(filename="my_measurement_file.h5")
my_file.show_info()

This will output something similar to the following:

METADATA
                       filename: file.h5
                  eveh5_version: 7
                    eve_version: 2.0
                    xml_version: 9.2
            measurement_station: Unittest
                          start: 2024-06-03 12:01:32
                            end: 2024-06-03 12:01:37
                    description:
                     simulation: False
                 preferred_axis: SimMot:01
              preferred_channel: SimChan:01
preferred_normalisation_channel: SimChan:01

LOG MESSAGES
20250812T09:06:05: Lorem ipsum

DATA
foo (SimMot:01) <AxisData>
bar (SimChan:01) <SinglePointChannelData>

SNAPSHOTS
bar (SimChan:01) <AxisData>
bazfoo (SimChan:03) <AxisData>
foo (SimMot:01) <AxisData>

MONITORS

Of course, this output contains test data and test names, hence your output of an actual measurement file will show more sensible names. For further explanation, see the documentation of the show_info() method.

Accessing individual file metadata

You’ve seen the file metadata in the METADATA block in the output of the show_info() method above. If you want to access any of the metadata fields programmatically, this is of course possible as well:

my_file.metadata.eveh5_version

This would return the eveH5 version (as a string). How to know which metadata are available? Technically, the metadata are stored as fields of a Metadata object. Hence, have a look at its documentation to see what fields are available and what their meanings are.

Accessing data 

Setting aside concepts such as metadata, snapshots, and monitors, motor axes that have been moved during a measurement and detector channels for which values have been recorded represent the primary data of any measurement contained in an eveH5 file. Of course, there are more complicated devices, such as multi-channel analysers (MCA) and cameras, but most often we deal with 1D arrays (vectors) of data.

Note

One key concept of the evefile package is to load data only on demand, not already when loading the eveH5 file. This speeds things up, and often you are not interested in all the data contained in the eveH5 file, but only in some distinct datasets (in HDF5 language), i.e. certain motor axes and detector channels.
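
The on-demand idea can be pictured with a minimal, purely hypothetical sketch. The class LazyDataset and its loader argument are not part of evefile; they only illustrate the general pattern of deferring an expensive read until the data attribute is first accessed, and reusing the result afterwards:

```python
class LazyDataset:
    """Hypothetical stand-in; not the evefile implementation."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the expensive read
        self._data = None
        self.load_count = 0    # only here to make the behaviour visible

    @property
    def data(self):
        # Read on first access only, then reuse the cached values.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


dataset = LazyDataset(loader=lambda: [1.0, 2.0, 3.0])
loads_before_access = dataset.load_count  # nothing has been read yet
first = dataset.data                      # triggers the one and only read
second = dataset.data                     # served from the cache
```

How evefile actually defers loading is described in the package documentation; the sketch only shows why creating the object can be cheap while accessing its data may take longer the first time.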

Each device (motor axis, detector channel) is represented as a dataset in the eveH5 file, and correspondingly as an instance of the Data class, to be exact an instance of one of its subclasses, in evefile. The EveFile object created upon loading an eveH5 file has a data attribute (a dict) with the unique IDs (rather than the “given” names) of the datasets as key and the Data object as value.

How to get an overview of all the available datasets within the eveH5 file you’ve just loaded? There are two possibilities: Either you use the EveFile.show_info() method shown above, or you ask the EveFile object for the data contained therein:

my_file.get_data_names()

This will return a list of “given” names.

If you know the “given” name of a dataset of interest, you can directly ask for it, using the EveFile.get_data() method:

current = my_file.get_data("Ring_1")

This would return the same dataset you could get by directly accessing the field of the EveFile.data attribute using the corresponding ID as key:

current = my_file.data["bIICurrent:Mnt1chan1"]
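
The relation between “given” names and unique IDs can be sketched without a measurement file. The following uses simple stand-in objects instead of real Data objects, and the helper find_by_name() is hypothetical (it is not how EveFile.get_data() is implemented). It only assumes that each dataset exposes its given name as metadata.name:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Data objects, keyed by unique ID as in EveFile.data.
data = {
    "SimMt:testrack01000": SimpleNamespace(
        metadata=SimpleNamespace(name="Sim_Motor1", unit="degrees")
    ),
    "bIICurrent:Mnt1chan1": SimpleNamespace(
        metadata=SimpleNamespace(name="Ring_1", unit="mA")
    ),
}


def find_by_name(datasets, name):
    """Return the first dataset whose given name matches, or None."""
    for dataset in datasets.values():
        if dataset.metadata.name == name:
            return dataset
    return None


current = find_by_name(data, "Ring_1")
same_current = data["bIICurrent:Mnt1chan1"]  # lookup by unique ID
```

Both lookups end up at the very same object; the name is merely a more readable way to find it.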

If you have a look at the documentation of the EveFile.get_data() method, you may realise that this method allows you to provide a list of names rather than a single name only. In this case, the return value will no longer be a single Data object, but a list of Data objects:

[axis, current] = my_file.get_data(["Sim_Motor1", "Ring_1"])

Note that in any case, the resulting data are objects of class Data, and in this particular case of classes AxisData and SinglePointChannelData, respectively. Why this? Because every dataset comes not only with (mostly numerical) data, but corresponding metadata as well. And data without metadata are useless. So what now? How to get more information on the individual data(sets) you’ve just extracted from the loaded eveH5 file? Carry on reading…

Getting preferred data

One concept of the eve measurement program is to (optionally) define a preferred axis and channel, and additionally a preferred normalisation channel. You can easily find out using the show_info() method of an EveFile object whether these values are set in the metadata.

If they are set, there is a convenient shortcut to just access these three datasets:

[pref_axis, pref_channel, pref_norm] = my_file.get_preferred_data()

If any of the three is missing, the corresponding value will be None.
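
As any of the three values may be missing, code working with preferred data should check for None before touching a dataset. A minimal sketch, assuming that datasets expose their given name as metadata.name; the helper describe_preferred() is hypothetical, and the stand-in objects only replace what get_preferred_data() would return for a real file:

```python
from types import SimpleNamespace


def describe_preferred(pref_axis, pref_channel, pref_norm):
    """Return one line per role, marking roles that are not set."""
    roles = {
        "axis": pref_axis,
        "channel": pref_channel,
        "normalisation channel": pref_norm,
    }
    lines = []
    for role, dataset in roles.items():
        if dataset is None:
            lines.append(f"preferred {role}: not set")
        else:
            lines.append(f"preferred {role}: {dataset.metadata.name}")
    return lines


# Stand-in objects; with a real file, pass the result of get_preferred_data().
axis = SimpleNamespace(metadata=SimpleNamespace(name="Sim_Motor1"))
channel = SimpleNamespace(metadata=SimpleNamespace(name="Ring_1"))
summary = describe_preferred(axis, channel, None)
```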

Getting information on a dataset 

Suppose you had loaded a file measurement.h5 and extracted two datasets named “Sim_Motor1” and “Ring_1” as follows:

my_file = evefile.EveFile(filename="measurement.h5")
[axis, current] = my_file.get_data(["Sim_Motor1", "Ring_1"])

Now you have two datasets available, with the variable names axis and current. To get more information on either of them, use their show_info() method:

axis.show_info()

This would result in an output similar to the following:

METADATA
       name: Sim_Motor1
       unit: degrees
         id: SimMt:testrack01000
         pv: SimMt:testrack01000
access_mode: ca
   deadband: 0.0

FIELDS
data
position_counts
set_values

The same you could do for the channel (the ring current):

current.show_info()

This would result in an output similar to the following:

METADATA
       name: Ring_1
       unit: mA
         id: bIICurrent:Mnt1chan1
         pv: bIICurrent:Mnt1.VAL
access_mode: ca

FIELDS
data
position_counts

What does all this tell you? Well: Every Data object has metadata that are represented in the block METADATA above with their fields and field contents. Furthermore, it has a series of fields, usually position_counts and data, with the latter containing the actual data and the former the position counts (the main quantisation axis of all the data of a scan). For how to access the metadata and data, keep reading.

Accessing (meta)data of a dataset 

Suppose you had loaded a file measurement.h5 and extracted two datasets named “Sim_Motor1” and “Ring_1” as before:

my_file = evefile.EveFile(filename="measurement.h5")
[axis, current] = my_file.get_data(["Sim_Motor1", "Ring_1"])

Now you have two datasets available, with the variable names axis and current. Every dataset is an instance of (a subclass of) the class Data, with metadata and data.

Metadata

Access each of the metadata fields as follows, as the metadata are an object of (a subclass of) class Metadata:

axis.metadata.unit

This would, for example, give you the unit (as string) corresponding to the axis values – quite helpful for automatically creating axis labels, for example.
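
Creating such an axis label could look like the following sketch. The helper axis_label() is hypothetical and relies only on the metadata fields name and unit shown in the show_info() output above; the stand-in objects replace real datasets so that the example runs without a measurement file:

```python
from types import SimpleNamespace


def axis_label(dataset):
    """Build a plot label such as 'Ring_1 / mA' from a dataset's metadata."""
    name = dataset.metadata.name
    unit = dataset.metadata.unit
    # Fall back to the bare name if no unit is given.
    return f"{name} / {unit}" if unit else name


axis = SimpleNamespace(metadata=SimpleNamespace(name="Sim_Motor1", unit="degrees"))
counter = SimpleNamespace(metadata=SimpleNamespace(name="Counter", unit=""))
labels = [axis_label(axis), axis_label(counter)]
```

With a dataset obtained from EveFile.get_data(), the same function should work as long as both fields are present in its metadata.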

Data

Every dataset contains data, often numeric data in form of a 1D array (vector), and all datasets except monitors have position counts as reference for the individual data entries.

Hence, to get access directly to the data, simply access the field (attribute) named data:

axis.data

This would return an array (numpy.ndarray) with the data.
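
Since the position counts serve as reference for the individual data entries, it can be handy to look up values by position. A small sketch, assuming that position_counts is accessible as an attribute in the same way as data; the helper values_by_position() is hypothetical, and plain lists stand in for the arrays of a real dataset:

```python
from types import SimpleNamespace


def values_by_position(dataset):
    """Map each position count to the value recorded at that position."""
    return {
        int(position): value
        for position, value in zip(dataset.position_counts, dataset.data)
    }


# Stand-in with plain lists; real datasets hold arrays in these fields.
axis = SimpleNamespace(position_counts=[2, 3, 5], data=[20.0, 21.0, 22.5])
lookup = values_by_position(axis)
```

Note that position 4 does not appear in the result: the axis was simply not moved at this position. This is exactly the situation that joining deals with.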

Important

While it may seem convenient to store the (numerical) data of a dataset in a separate variable, always keep the context of the Data object, as otherwise you will lose all the metadata. Remember: Data without metadata are useless.

Joining (aka “filling”) data 

For each motor axis and detector channel, in the original eveH5 file only those values appear—together with a “position” (PosCount) value—that have actually been set or measured. Hence, the number of values (i.e., the length of the data vector) will generally be different for different devices. To be able to plot arbitrary data against each other, the corresponding data vectors need to be commensurate. If this is not the case, they need to be brought to the same dimensions (i.e., “joined”, originally somewhat misleadingly termed “filled”).

To be exact, being commensurate is only a necessary, but not a sufficient criterion: not only do the shapes need to match, but the indices (in this case the positions) need to be identical as well.

For further details and background on joining, see the documentation of the joining module. And be aware that joining is far from being a trivial concept.
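
To get a feeling for the problem, consider the following purely illustrative sketch. It is not the algorithm implemented in the joining module, and the function join_on_union() is hypothetical. It assumes one simple rule set: use the union of all positions, carry the last known axis value forward, and mark missing channel values as NaN:

```python
import math


def join_on_union(axis, channel):
    """Join two {position: value} mappings on the union of their positions.

    Missing axis values are taken from the previous position, missing
    channel values become NaN. Illustration only, not the evefile algorithm.
    """
    positions = sorted(set(axis) | set(channel))
    joined_axis, joined_channel = [], []
    last_axis_value = math.nan  # no earlier value known (no snapshot here)
    for position in positions:
        last_axis_value = axis.get(position, last_axis_value)
        joined_axis.append(last_axis_value)
        joined_channel.append(channel.get(position, math.nan))
    return positions, joined_axis, joined_channel


axis = {1: 10.0, 2: 20.0, 4: 40.0}   # axis moved at positions 1, 2, 4
channel = {2: 0.5, 3: 0.7, 4: 0.9}   # channel read at positions 2, 3, 4
positions, axis_values, channel_values = join_on_union(axis, channel)
```

After joining, all three lists have the same length and share the same positions, hence the values can be plotted against each other. Other rules are equally conceivable, which is one reason why joining is not trivial.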

Without further ado, if you know the names (or alternatively the IDs) of the datasets in your eveH5 file that you need to be joined, use the method EveFile.get_joined_data() and provide both the list of names of the datasets and (optionally) the join mode:

[axis, current, lifetime] = my_file.get_joined_data(
    data=["Sim_Motor1", "Ring_1", "Lebensdauer_1"],
    mode="AxisOrChannelPositions"
)

The result, as you can see here, will be as many datasets with joined data as you asked for. Each of these datasets is an instance of a subclass of MeasureData and a copy of the original data contained in your EveFile object (the beast you access via my_file in the code examples).

Note

There are currently several different join modes implemented, and they have been renamed from the previous “fill modes”. As said above, joining is far from trivial, and everybody using this feature is strongly advised to read the documentation available in the joining module.

Exporting data to a data frame 

Important

While working with a Pandas DataFrame (pandas.DataFrame) may seem convenient, you are losing basically all the relevant metadata of the datasets. Remember: Data without metadata are useless. Hence, this method is rather a convenience method for backwards compatibility with older interfaces, and it is explicitly not recommended for extensive use.

Generally, two scenarios are possible and supported:

Export the data of a given dataset to a data frame.
Export the data of a list of datasets contained in an EveFile object to a data frame.

Both scenarios are described in more detail below.

Export data of a single dataset to a data frame

Every dataset, to be exact every object of type Data, has a method get_dataframe() that returns the data contained in the dataset as pandas.DataFrame.

A more complete example including loading an eveH5 file and retrieving datasets is given below. The key point here is the last line, calling get_dataframe() on the data object:

my_file = evefile.EveFile(filename="measurement.h5")
[axis, current] = my_file.get_data(["Sim_Motor1", "Ring_1"])

axis_df = axis.get_dataframe()

As mentioned above, the data frame will contain mostly the data, but nearly no metadata. For details of what exactly the resulting data frame looks like, consult the documentation of the get_dataframe() method of the respective subclass of Data, e.g. AxisData.get_dataframe() or SinglePointChannelData.get_dataframe().

Note

Please note that when getting a data frame for an individual dataset, no joining of data is performed before exporting the data to a pandas.DataFrame. This is different from the situation described below, where you export the data of a list of datasets to a data frame. Furthermore, in contrast to previous eveH5 interfaces, the data frames returned for more complicated channel types, such as NormalizedChannelData, AverageChannelData, and IntervalChannelData, will generally contain fewer columns, as some of the previously contained columns are scalar metadata that do not change for the individual values. Nevertheless, all these more complicated channel types will contain more than one column for data in the data frame.

Export data of a list of datasets to a data frame

While there may be some use cases for exporting the data of a single dataset to a data frame, probably the more frequent scenario is several datasets from a single eveH5 file that should be exported to a data frame for further handling.

For this purpose, the EveFile object has a get_dataframe() method as well, taking two parameters: data is a list of names (or IDs) of datasets, and mode (optionally) defines how to join data of the individual columns. From that it is already obvious that here, two things happen:

Join the data of the respective datasets.
Export the joined data to a pandas.DataFrame.

Assuming again our scenario from above, where you have loaded an eveH5 file and stored the respective object in the my_file variable, getting a data frame consisting of the data of three datasets and explicitly setting the join mode looks as follows:

df = my_file.get_dataframe(
    data=["Sim_Motor1", "Ring_1", "Lebensdauer_1"],
    mode="AxisOrChannelPositions"
)

As mentioned, data are joined prior to creating the data frame. Hence, make sure you are familiar with the concept of joining.

Note

There are currently several different join modes implemented, and they have been renamed from the previous “fill modes”. As said above, joining is far from trivial, and everybody using this feature is strongly advised to read the documentation available in the joining module.

Important

In contrast to previous interfaces, the data frame will only contain one column per dataset, and this column comes directly from the Data.data attribute. Hence, even for more complicated channel types, such as NormalizedChannelData, AverageChannelData, and IntervalChannelData, only one column will exist. If you need to get access to these additional data columns and you still want to use a pandas.DataFrame, use the Data.get_dataframe() method of the individual dataset, as described above.

There is even one special case similar to what has been done in the past using previous interfaces: Getting a data frame containing the data of all datasets contained in an eveH5 file – to be more exact, at least all data from the “main phase” of the scan (not including snapshots or monitors).

Although it is strongly discouraged to use this functionality – among other things because it violates central concepts of the interface – in its most simple (and probably most dangerous) form the call would look like:

almighty_dataframe = my_file.get_dataframe()

What are some of the problems with this approach? Here is an incomplete list:

Loss of all relevant metadata.
No join mode explicitly provided, hence depending on the defaults set in the method (that may change over time).
Despite its name, the data frame is far from “almighty” and lacks relevant information.

Hence, use entirely at your own risk – at best not at all. You have been warned… ;-)

Telemetry (I): Snapshots 

Most people using the eve measurement program are somewhat familiar with the concept of snapshots. Basically, a snapshot does what its name says: recording the current state of a list of devices, be it detector channels or motor axes. The most typical situation in a scan is two “snapshot modules” upstream of any other parts of the scan, one for detector channels and the other for motor axes. Thus, the state of all channels and axes defined in the current measurement station description is recorded. Generally, it may be sensible to record a snapshot for all these devices after the actual scan has been carried out, but this needs to be discussed by those people responsible for designing scan descriptions.

Snapshots generally serve two functions:

Provide base values for axes.

When joining data using EveFile.get_joined_data(), axes typically take their previous value at positions for which no axis value has been recorded. Snapshots are used if available.

Provide telemetry data for the setup the data were recorded with.

Snapshots regularly contain many more parameters than motor axes used and detector channels recorded. Generally, this provides a lot of telemetry data regarding the setup used for recording the data.

The first function is served by the EveFile.get_joined_data() method automatically. The second function can be served by having a look at a summary containing all snapshot data. This is the aim of the method EveFile.get_snapshots(): returning a Pandas DataFrame containing all snapshots as rows and the position counts as columns.

Getting a dataframe containing all the snapshot datasets as rows and the position counts as columns is as simple as:

my_file = evefile.EveFile(filename="measurement.h5")
snapshots = my_file.get_snapshots()

The resulting pandas.DataFrame can be displayed directly in the Python console by simply entering the variable name snapshots. The result may look similar to the following:

                             1             2
Sim_Filter_1      b'Undefined'           NaN
Sim_Filter_2      b'Undefined'           NaN
Keithley_196               NaN           NaN
Channel/0                  NaN           0.0
Sim_Motor1                20.0           NaN
SimFilter                  NaN  b'Undefined'
Ring_1                     NaN    296.249928
Lebensdauer_1              NaN      7.756106
TopupState                 NaN      b'decay'
Ring_2                     NaN    296.855205
Lebensdauer_2              NaN      7.053617
mlsRing_1                  NaN       72.9936
mlsLebensdauer_1           NaN     16.343343
mlsRing_2                  NaN           0.0
mlsLebensdauer_2           NaN           0.0
mlsRing_3                  NaN      0.250081
mlsLebensdauer_3           NaN      0.705209

Note that this dataframe only serves as a somewhat convenient overview table of the individual values recorded in the snapshots. There is no point in trying to plot data here, as some of the values are anyway non-numeric – not to mention that for an axis snapshot, the channels have NaN as value and vice versa.

As mentioned in the heading of this use case, think of the snapshots as telemetry data, providing you with an overview of the state of your setup at a given point in time.

There is another type of telemetry data discussed in the next section: monitors. See below for details.

Telemetry (II): Working with monitors 

First, a quick introduction to monitors. EPICS knows the concept of a monitor: You attach an observer to a process variable (PV) and get notified only when the value of the observed PV changes. This concept is used within the eve measurement program as well, and you can set monitors to a large list of PVs defined in your measurement station description from within the eve GUI and eventually a scan description.

From the point of view of the eve engine, the main quantisation axis is the list of position counts – one position reflecting a given state of all motor axes set and all detector channels read. However, position counts only exist for everything under control of the scan engine. Monitors by definition are not under control of the scan engine, but issue their updates independently whenever the value changes. This results in monitor datasets in the measurement files having timestamps (in milliseconds since start of the scan) instead of position counts. To relate the monitor data to the position counts, a mapping needs to be performed. Generally, this is non-trivial and should be different for motor axes and detector channels due to the design of the current scan engine. However, due to the lack of the necessary information stored in the data files, only one kind of mapping can be performed. For details, see the documentation of the timestamp_mapping module.
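
What such a mapping involves can be sketched with a few lines of plain Python. Please note that the rule used here is an assumption made for illustration only: each monitor event is assigned to the latest position whose timestamp is not later than the event, and events before the first position are assigned to the first position. The rule actually applied is documented in the timestamp_mapping module, and the function map_to_positions() as well as the example values are hypothetical:

```python
from bisect import bisect_right


def map_to_positions(position_times, monitor_events):
    """Assign each monitor event to the latest position started before it.

    position_times: sorted list of (timestamp_ms, position_count).
    monitor_events: list of (timestamp_ms, value).
    Events before the first position are assigned to the first position.
    """
    times = [time for time, _ in position_times]
    mapped = []
    for timestamp, value in monitor_events:
        index = max(bisect_right(times, timestamp) - 1, 0)
        mapped.append((position_times[index][1], value))
    return mapped


position_times = [(100, 1), (250, 2), (400, 3)]
monitor_events = [(0, "NO_ALARM"), (260, "HIHI"), (400, "NO_ALARM")]
mapped = map_to_positions(position_times, monitor_events)
```

In this example, the event at 260 ms ends up at position 2, as this position started at 250 ms and the next one only at 400 ms.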

In any case, before you can sensibly work with monitors, you first need to map their timestamps to position counts. This is handled automatically for you when you use the EveFile.get_monitors() method. However, before we get there, let’s go step by step. First, let’s load a file containing monitor data and get an overview of the file just loaded:

file = evefile.EveFile(filename="monitors.h5")
file.show_info()

The result may look as follows:

METADATA
                       filename: monitors.h5
                  eveh5_version: 7.1
                    eve_version: 2.2.0
                    xml_version: 9.2
            measurement_station: TEST
                          start: 2025-05-15 15:18:10
                            end: 2025-05-15 15:20:10
                    description: testscan containing monitors
                     simulation: False
                 preferred_axis:
              preferred_channel:
preferred_normalisation_channel:

LOG MESSAGES

DATA
Counter (Counter-mot) <AxisData>

SNAPSHOTS

MONITORS
Status (DetP5000:gw2370700.STAT) <MonitorData>
Status (DetbIICurrent:Mnt1topupState.STAT) <MonitorData>
range (K0196:gw23728range) <MonitorData>
Offset (P5000:gw2370700.AOFF) <MonitorData>
Scan (P5000:gw2370700.SCAN) <MonitorData>
range (P5000:gw23707range) <MonitorData>

As you can see already, although monitors have names, these names are not unique. Hence, you can never refer to monitors unequivocally by their (given) name, but only by their ID. This is most probably a design flaw, but that’s not our business right now.
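
Which names are ambiguous can be checked with a few lines of plain Python. The sketch below simply retypes the IDs and names from the MONITORS block above into a dict, hence it does not depend on any evefile functionality:

```python
from collections import Counter

# Monitor IDs and given names as listed in the MONITORS block above.
monitor_names = {
    "DetP5000:gw2370700.STAT": "Status",
    "DetbIICurrent:Mnt1topupState.STAT": "Status",
    "K0196:gw23728range": "range",
    "P5000:gw2370700.AOFF": "Offset",
    "P5000:gw2370700.SCAN": "Scan",
    "P5000:gw23707range": "range",
}

name_counts = Counter(monitor_names.values())
ambiguous_names = sorted(name for name, count in name_counts.items() if count > 1)
```

Here, both “Status” and “range” appear twice, while all six IDs are distinct.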

Getting a list of all monitor IDs of the given file is as simple as:

monitor_ids = list(file.monitors.keys())

To obtain an individual dataset of monitor data with their timestamps mapped to position counts, use the EveFile.get_monitors() method:

device_data = file.get_monitors(monitors="DetP5000:gw2370700.STAT")

The resulting dataset is of type DeviceData and can be used, among other things, for joining data using the EveFile.get_joined_data() method:

joined_data = file.get_joined_data(data=[file.get_data("Counter"), device_data])

Note that you are free to mix data names or IDs, monitor IDs, and actual datasets in the list provided as parameter data. However, for monitors, only IDs or actual datasets are allowed, as for monitors, the (given) names are not unique.

In analogy, you can obtain a dataframe with the monitor included:

dataframe = file.get_dataframe(data=[file.get_data("Counter"), device_data])

The same is true for this method: you are free to mix data names or IDs, monitor IDs, and actual datasets in the list provided as parameter data. Again, for monitors, only IDs or actual datasets are allowed, as their (given) names are not unique.

Similar to the EveFile.get_data() method, if you do not provide a parameter monitors, all available monitor datasets will be mapped and the corresponding DeviceData objects returned as a list:

all_device_data = file.get_monitors()

With regard to the EveFile.get_joined_data() and EveFile.get_dataframe() methods, you can automatically include all monitors as well, using the optional parameter include_monitors. In this case, the monitor datasets will first be mapped to DeviceData objects, joined and included in the dataframe:

all_joined_data = file.get_joined_data(include_monitors=True)

This will return a list of joined data, including all data and all mapped monitors. Similarly, for the “all-knowing dataframe” (beware of the many limits of the dataframe approach discussed above):

all_knowing_dataframe = file.get_dataframe(include_monitors=True)

The resulting pandas.DataFrame can be displayed directly in the Python console by simply entering the variable name all_knowing_dataframe. The result may look similar to the following:

          Counter DetP5000:gw2370700.STAT DetbIICurrent:Mnt1topupState.STAT  ... P5000:gw2370700.AOFF  P5000:gw2370700.SCAN P5000:gw23707range
position                                                                     ...
             1                 b'HIHI'                       b'NO_ALARM'  ...                  0.0           b'5 second'      b'20 -> V kO'
             2                 b'HIHI'                       b'NO_ALARM'  ...                  0.0           b'5 second'      b'20 -> V kO'
             3                 b'HIHI'                       b'NO_ALARM'  ...                  0.0           b'5 second'      b'20 -> V kO'
             4                 b'HIHI'                       b'NO_ALARM'  ...                  0.0           b'5 second'      b'20 -> V kO'
             5                 b'HIHI'                       b'NO_ALARM'  ...                  0.0           b'5 second'      b'20 -> V kO'
...           ...                     ...                               ...  ...                  ...                   ...                ...
         116                 b'HIHI'                       b'NO_ALARM'  ...                  2.0           b'5 second'      b'20 -> V kO'
         117                 b'HIHI'                       b'NO_ALARM'  ...                  2.0           b'5 second'      b'20 -> V kO'
         118                 b'HIHI'                       b'NO_ALARM'  ...                  2.0           b'5 second'      b'20 -> V kO'
         119                 b'HIHI'                       b'NO_ALARM'  ...                  2.0           b'5 second'      b'20 -> V kO'
         120                 b'HIHI'                       b'NO_ALARM'  ...                  2.0           b'5 second'      b'20 -> V kO'

As you see here, the first column is a “motor axis”, hence with the (given) name as column header, while for the monitors (all the other columns), the IDs are used as column headers. This is, as mentioned, due to the fact that the (given) names of the monitors are not unique (and in the given example, you would end up with two columns labelled “Status” and two columns labelled “range”, not knowing what device they belong to).