
data.h5py.H5SupSaver

Class · Context · Source

saver = mdnc.data.h5py.H5SupSaver(
    file_name=None, enable_read=False
)

Save a supervised dataset as an .h5 file. This class allows users to dump multiple datasets into one file handle; the data is then saved as a .h5 file. The keywords of the sets are assigned by users. It supports both context management and dictionary-style nesting. It is built on top of h5py.Group and h5py.Dataset.

The motivation of using this saver includes:

  • Provide an easier way of saving resizable datasets. All datasets created by this saver are resizable.
  • Provide convenient APIs for creating h5py.SoftLink, attributes (h5py.AttributeManager) and virtual datasets.
  • Add context nesting support for h5py.Group, which makes the code more elegant.

Arguments

Requires

Argument Type Description
file_name str A path where we save the file. If not set, the saver would not open a file.
enable_read bool When set True, open the file in a (append) mode; otherwise, use w mode. This option is used when adding data to an existing file.

Methods

config

saver.config(logver=0, **kwargs)

Configure the saver. Only explicitly given arguments are used to change the configuration of this instance.

Requires

Argument Type Description
logver int The verbose level of the outputs. When set to 0, the saver runs silently.
**kwargs Any argument that would be used for creating h5py.Dataset. The given arguments override the default values during dataset creation.
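
Example 1 below shows these keywords passed through the saver. As a point of reference, the sketch here shows how such keywords behave when forwarded to h5py's own create_dataset (h5py is the library this saver wraps; the file path and dataset name are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative path; the saver would receive keywords like these via config().
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    # shuffle / fletcher32 / compression are plain h5py.create_dataset()
    # arguments; config(**kwargs) stores them and forwards them on every
    # dataset creation.
    f.create_dataset('one', data=np.ones((25, 20), dtype='f4'),
                     shuffle=True, fletcher32=True, compression='gzip',
                     chunks=(1, 20))
with h5py.File(path, 'r') as f:
    compression = f['one'].compression   # 'gzip'
    shape = f['one'].shape               # (25, 20)
```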

get_config

cfg = saver.get_config(name=None)

Get the current configuration value by the given name.

Requires

Argument Type Description
name str The name of the required config value.

Returns

Argument Description
cfg The required config value.

open

saver.open(file_name, enable_read=None)

Open a new file. If a file has been opened before, it would be closed first. Both this method and the __init__ method (with file_name specified) support context management.

Requires

Argument Type Description
file_name str A path where we save the file.
enable_read bool When set True, open the file in a (append) mode; otherwise, use w mode. This option is used when adding data to an existing file. If not set, enable_read is inherited from the class initialization; otherwise, the stored enable_read is updated by this new value.

close

saver.close()

Close the saver.


dump

saver.dump(keyword, data, **kwargs)

Dump the dataset with a keyword into the file. The dataset is resizable, so this method can be used repeatedly. The data is always appended at the end of the current dataset.

Requires

Argument Type Description
keyword str The keyword of the dumped dataset.
data np.ndarray A new batch of data items; should be a numpy array. The shape data.shape[1:] should match that of the existing dataset.
**kwargs Any argument that would be used for creating h5py.Dataset. The given arguments override the default values and the configs set by config() during dataset creation.
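
Appending to the same keyword relies on h5py's resizable datasets. The sketch below shows that underlying mechanism directly in h5py (the path and names are illustrative, not the saver's internals verbatim):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    # First dump: create a dataset whose first axis can grow.
    batch = np.zeros((100, 20), dtype='f4')
    dset = f.create_dataset('test2', data=batch,
                            maxshape=(None, 20), chunks=True)
    # Second dump with the same keyword: resize along axis 0, then
    # write the new batch at the end.
    new_batch = np.ones((100, 20), dtype='f4')
    dset.resize(dset.shape[0] + new_batch.shape[0], axis=0)
    dset[-new_batch.shape[0]:] = new_batch
    final_shape = dset.shape   # (200, 20)
```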

set_link

saver.set_link(keyword, target, overwrite=True)

Create an h5py.SoftLink.

Requires

Argument Type Description
keyword str The keyword of the to-be-created soft link.
target str The target (pointing position) of the soft link.
overwrite bool If not True, this step is skipped when the keyword exists. Otherwise, the keyword is overwritten, even if it contains an h5py.Dataset.
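
A soft link is an h5py-level construct: an entry that resolves to another path in the file. A minimal h5py sketch of what set_link('test3', '/test1') builds on (names are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('test1', data=np.zeros((10, 4), dtype='f4'))
    # Assigning an h5py.SoftLink creates a named reference to '/test1'.
    f['test3'] = h5py.SoftLink('/test1')
with h5py.File(path, 'r') as f:
    # Accessing the link resolves to the target dataset.
    linked_shape = f['test3'].shape   # (10, 4)
```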

set_attrs

saver.set_attrs(keyword, attrs=None, **kwargs)

Set attributes for an existing data group or dataset.

Requires

Argument Type Description
keyword str The keyword where we set the attributes.
attrs dict The attributes to be set.
**kwargs More attributes, combined with attrs by dict.update().
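
The merge-then-write behavior can be sketched with plain h5py, on which this method is built (the attribute names and values below are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    dset = f.create_dataset('data', data=np.zeros((5,), dtype='f4'))
    # Combine attrs and **kwargs with dict.update(), as the method
    # describes, then write through h5py's AttributeManager.
    merged = dict({'epoch': 3})
    merged.update(note='baseline')
    dset.attrs.update(merged)
with h5py.File(path, 'r') as f:
    epoch = f['data'].attrs['epoch']
    note = f['data'].attrs['note']
```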

set_virtual_set

saver.set_virtual_set(keyword, sub_set_keys, fill_value=0.0)

Create a virtual dataset based on a list of subsets. All subsets are required to be h5py.Dataset and share the same shape (except for the first dimension, i.e. the number of samples). The subsets are stacked along a new axis=1. For example, when d1.shape=[100, 20] and d2.shape=[80, 20], the output virtual set has d.shape=[100, 2, 20]. In this case, d[80:, 1, :] is filled with fill_value.

Requires

Argument Type Description
keyword str The keyword of the created virtual dataset.
sub_set_keys (str, ) A sequence of sub-set keywords. Each sub-set should share the same shape (except for the first dimension).
fill_value float The value used for filling the blank area in the virtual dataset.
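
The [100, 2, 20] example above can be sketched with h5py's own virtual-dataset API, which this method builds on (dataset names and values here are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('d1', data=np.ones((100, 20), dtype='f4'))
    f.create_dataset('d2', data=np.full((80, 20), 2.0, dtype='f4'))
    # The layout is sized by the longest subset; the two subsets are
    # mapped into slots along the new axis=1, and unmapped cells take
    # fillvalue.
    layout = h5py.VirtualLayout(shape=(100, 2, 20), dtype='f4')
    layout[:, 0, :] = h5py.VirtualSource(f['d1'])
    layout[:80, 1, :] = h5py.VirtualSource(f['d2'])
    f.create_virtual_dataset('d', layout, fillvalue=0.0)
with h5py.File(path, 'r') as f:
    d = f['d'][()]
```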

Properties

attrs

attrs = saver.attrs  # Return the h5py.AttributeManager
saver.attrs = dict(...)  # Use a dictionary to update attrs.

Supports using a dictionary to update the attributes of the current h5py object. The returned attrs is an h5py.AttributeManager.

Examples

Example 1
import os
import numpy as np
import mdnc

root_folder = 'alpha-test'
os.makedirs(root_folder, exist_ok=True)

if __name__ == '__main__':
    # Perform test.
    with mdnc.data.h5py.H5SupSaver(os.path.join(root_folder, 'test_h5supsaver'), enable_read=False) as s:
        s.config(logver=1, shuffle=True, fletcher32=True, compression='gzip')
        s.dump('one', np.ones([25, 20]), chunks=(1, 20))
        s.dump('zero', np.zeros([25, 10]), chunks=(1, 10))
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Dump one into the file. The data shape is (25, 20).
data.h5py: Dump zero into the file. The data shape is (25, 10).
Example 2
import os
import numpy as np
import mdnc

root_folder = 'alpha-test'
os.makedirs(root_folder, exist_ok=True)

if __name__ == '__main__':
    # Perform test.
    saver = mdnc.data.h5py.H5SupSaver(enable_read=False)
    saver.config(logver=1, shuffle=True, fletcher32=True, compression='gzip')
    with saver.open(os.path.join(root_folder, 'test_h5supsaver')) as s:
        s.dump('test1', np.zeros([100, 20]))
        gb = s['group1']
        with gb['group2'] as g:
            g.dump('test2', np.zeros([100, 20]))
            g.dump('test2', np.ones([100, 20]))
            g.attrs = {'new': 1}
            g.set_link('test3', '/test1')
        print('data.h5py: Check open: s["group1"]={0}, s["group1/group2"]={1}'.format(gb.is_open, g.is_open))
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Open a new file: alpha-test\test_h5supsaver.h5
data.h5py: Dump test1 into the file. The data shape is (100, 20).
data.h5py: Dump test2 into the file. The data shape is (100, 20).
data.h5py: Dump 100 data samples into the existed dataset /group1/group2/test2. The data shape is (200, 20) now.
data.h5py: Create a soft link "test3", pointting to "/test1".
data.h5py: Check open: s["group1"]=True, s["group1/group2"]=False

Last update: March 14, 2021
