
data.h5py.H5SupSaver

Class · Context · Source

saver = mdnc.data.h5py.H5SupSaver(
    file_name=None, enable_read=False
)

Save a supervised dataset as an .h5 file. This class allows users to dump multiple datasets into one file handle; the data is then saved as a .h5 file. The keywords of the sets are assigned by users. It supports both context management and dictionary-style nesting. It is built on top of h5py.Group and h5py.Dataset.

The motivation of using this saver includes:

  • Provide an easier way of saving resizable datasets. All datasets created by this saver are resizable.
  • Provide convenient APIs for creating h5py.SoftLink, attributes (h5py.AttributeManager) and virtual datasets.
  • Add context nesting support for h5py.Group, which makes the code more elegant.

Arguments

Requires

Argument Type Description
file_name str A path where we save the file. If not set, the saver would not open a file.
enable_read bool When set True, open the file in a (append) mode; otherwise, use w mode. This option is used when adding data to an existing file.

Methods

config

saver.config(logver=0, **kwargs)

Configure the saver. Only explicitly given arguments are used to change the configuration of this instance.

Requires

Argument Type Description
logver int The verbose level of the outputs. When set to 0, the saver runs silently.
**kwargs Any argument that would be used for creating h5py.Dataset. The given arguments override the default values during dataset creation.
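
Example 1 below shows these keywords passed through the saver. As a point of reference, the sketch here shows how such keywords behave when forwarded to h5py's own create_dataset (h5py is the library this saver wraps; the file path and dataset name are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative path; the saver would receive keywords like these via config().
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    # shuffle / fletcher32 / compression are plain h5py.create_dataset()
    # arguments; config(**kwargs) stores them and forwards them on every
    # dataset creation.
    f.create_dataset('one', data=np.ones((25, 20), dtype='f4'),
                     shuffle=True, fletcher32=True, compression='gzip',
                     chunks=(1, 20))
with h5py.File(path, 'r') as f:
    compression = f['one'].compression   # 'gzip'
    shape = f['one'].shape               # (25, 20)
```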

get_config

cfg = saver.get_config(name=None)

Get the current configuration value by the given name.

Requires

Argument Type Description
name str The name of the required config value.

Returns

Argument Description
cfg The required config value.

open

saver.open(file_name, enable_read=None)

Open a new file. If a file has been opened before, it would be closed first. Both this method and the __init__ method (with file_name specified) support context management.

Requires

Argument Type Description
file_name str A path where we save the file.
enable_read bool When set True, open the file in a (append) mode; otherwise, use w mode. This option is used when adding data to an existing file. If not set, enable_read is inherited from the class initialization; otherwise, the stored enable_read is updated by this new value.

close

saver.close()

Close the saver.


dump

saver.dump(keyword, data, **kwargs)

Dump the dataset with a keyword into the file. The dataset is resizable, so this method can be used repeatedly. The data is always appended at the end of the current dataset.

Requires

Argument Type Description
keyword str The keyword of the dumped dataset.
data np.ndarray A new batch of data items; should be a numpy array. The shape data.shape[1:] should match that of the existing dataset.
**kwargs Any argument that would be used for creating h5py.Dataset. The given arguments override the default values and the configs set by config() during dataset creation.
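
Appending to the same keyword relies on h5py's resizable datasets. The sketch below shows that underlying mechanism directly in h5py (the path and names are illustrative, not the saver's internals verbatim):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    # First dump: create a dataset whose first axis can grow.
    batch = np.zeros((100, 20), dtype='f4')
    dset = f.create_dataset('test2', data=batch,
                            maxshape=(None, 20), chunks=True)
    # Second dump with the same keyword: resize along axis 0, then
    # write the new batch at the end.
    new_batch = np.ones((100, 20), dtype='f4')
    dset.resize(dset.shape[0] + new_batch.shape[0], axis=0)
    dset[-new_batch.shape[0]:] = new_batch
    final_shape = dset.shape   # (200, 20)
```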

set_link

saver.set_link(keyword, target, overwrite=True)

Create an h5py.SoftLink.

Requires

Argument Type Description
keyword str The keyword of the to-be-created soft link.
target str The target (pointing position) of the soft link.
overwrite bool If not True, this step is skipped when the keyword exists. Otherwise, the keyword is overwritten, even if it contains an h5py.Dataset.
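
A soft link is an h5py-level construct: an entry that resolves to another path in the file. A minimal h5py sketch of what set_link('test3', '/test1') builds on (names are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('test1', data=np.zeros((10, 4), dtype='f4'))
    # Assigning an h5py.SoftLink creates a named reference to '/test1'.
    f['test3'] = h5py.SoftLink('/test1')
with h5py.File(path, 'r') as f:
    # Accessing the link resolves to the target dataset.
    linked_shape = f['test3'].shape   # (10, 4)
```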

set_attrs

saver.set_attrs(keyword, attrs=None, **kwargs)

Set attributes for an existing data group or dataset.

Requires

Argument Type Description
keyword str The keyword where we set the attributes.
attrs dict The attributes to be set.
**kwargs More attributes, combined with attrs by dict.update().
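
The merge-then-write behavior can be sketched with plain h5py, on which this method is built (the attribute names and values below are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    dset = f.create_dataset('data', data=np.zeros((5,), dtype='f4'))
    # Combine attrs and **kwargs with dict.update(), as the method
    # describes, then write through h5py's AttributeManager.
    merged = dict({'epoch': 3})
    merged.update(note='baseline')
    dset.attrs.update(merged)
with h5py.File(path, 'r') as f:
    epoch = f['data'].attrs['epoch']
    note = f['data'].attrs['note']
```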

set_virtual_set

saver.set_virtual_set(keyword, sub_set_keys, fill_value=0.0)

Create a virtual dataset based on a list of subsets. All subsets are required to be h5py.Dataset and share the same shape (except for the first dimension, i.e. the number of samples). The subsets are stacked along a new axis=1. For example, when d1.shape=[100, 20] and d2.shape=[80, 20], the output virtual set has d.shape=[100, 2, 20]. In this case, d[80:, 1, :] is filled with fill_value.

Requires

Argument Type Description
keyword str The keyword of the created virtual dataset.
sub_set_keys (str, ) A sequence of sub-set keywords. Each sub-set should share the same shape (except for the first dimension).
fill_value float The value used for filling the blank area in the virtual dataset.
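
The [100, 2, 20] example above can be sketched with h5py's own virtual-dataset API, which this method builds on (dataset names and values here are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('d1', data=np.ones((100, 20), dtype='f4'))
    f.create_dataset('d2', data=np.full((80, 20), 2.0, dtype='f4'))
    # The layout is sized by the longest subset; the two subsets are
    # mapped into slots along the new axis=1, and unmapped cells take
    # fillvalue.
    layout = h5py.VirtualLayout(shape=(100, 2, 20), dtype='f4')
    layout[:, 0, :] = h5py.VirtualSource(f['d1'])
    layout[:80, 1, :] = h5py.VirtualSource(f['d2'])
    f.create_virtual_dataset('d', layout, fillvalue=0.0)
with h5py.File(path, 'r') as f:
    d = f['d'][()]
```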

Properties

attrs

attrs = saver.attrs  # Return the h5py.AttributeManager
saver.attrs = dict(...)  # Use a dictionary to update attrs.

Supports using a dictionary to update the attributes of the current h5py object. The returned attrs is an h5py.AttributeManager.

Examples

Example 1
import os
import numpy as np
import mdnc

root_folder = 'alpha-test'
os.makedirs(root_folder, exist_ok=True)

if __name__ == '__main__':
    # Perform test.
    with mdnc.data.h5py.H5SupSaver(os.path.join(root_folder, 'test_h5supsaver'), enable_read=False) as s:
        s.config(logver=1, shuffle=True, fletcher32=True, compression='gzip')
        s.dump('one', np.ones([25, 20]), chunks=(1, 20))
        s.dump('zero', np.zeros([25, 10]), chunks=(1, 10))
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Dump one into the file. The data shape is (25, 20).
data.h5py: Dump zero into the file. The data shape is (25, 10).
Example 2
import os
import numpy as np
import mdnc

root_folder = 'alpha-test'
os.makedirs(root_folder, exist_ok=True)

if __name__ == '__main__':
    # Perform test.
    saver = mdnc.data.h5py.H5SupSaver(enable_read=False)
    saver.config(logver=1, shuffle=True, fletcher32=True, compression='gzip')
    with saver.open(os.path.join(root_folder, 'test_h5supsaver')) as s:
        s.dump('test1', np.zeros([100, 20]))
        gb = s['group1']
        with gb['group2'] as g:
            g.dump('test2', np.zeros([100, 20]))
            g.dump('test2', np.ones([100, 20]))
            g.attrs = {'new': 1}
            g.set_link('test3', '/test1')
        print('data.h5py: Check open: s["group1"]={0}, s["group1/group2"]={1}'.format(gb.is_open, g.is_open))
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Open a new file: alpha-test\test_h5supsaver.h5
data.h5py: Dump test1 into the file. The data shape is (100, 20).
data.h5py: Dump test2 into the file. The data shape is (100, 20).
data.h5py: Dump 100 data samples into the existed dataset /group1/group2/test2. The data shape is (200, 20) now.
data.h5py: Create a soft link "test3", pointting to "/test1".
data.h5py: Check open: s["group1"]=True, s["group1/group2"]=False

Last update: March 14, 2021
