data.h5py.H5SupSaver¶
Class · Context · Source
saver = mdnc.data.h5py.H5SupSaver(
file_name=None, enable_read=False
)
Save a supervised dataset as an .h5 file. This class allows users to dump multiple datasets into one file handle and saves them as a single .h5 file. The keywords of the datasets should be assigned by users. It supports both context management and dictionary-style nesting, and is built on top of h5py.Group and h5py.Dataset.
The motivations for using this saver include:
- Provide an easier way to save resizable datasets. All datasets created by this saver are resizable.
- Provide convenient APIs for creating h5py.SoftLink, attributes (h5py.AttributeManager) and virtual datasets.
- Add context-nesting support for h5py.Group, which makes the code more elegant.
Arguments¶
Requires
Argument | Type | Description |
---|---|---|
file_name | str | A path where we save the file. If not set, the saver would not open a file. |
enable_read | bool | When set True, open the file in a (append) mode. Otherwise, use w mode. This option is used when adding data to an existing file. |
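The mode selection described above can be sketched in plain Python. This helper is illustrative only and is not part of mdnc; it just shows the documented mapping from enable_read to an h5py file mode:

```python
def h5_file_mode(enable_read: bool) -> str:
    """Map enable_read to an h5py file mode (illustrative sketch).

    'a': read/write, keep existing data (used when appending to a file).
    'w': create a new file, truncating any existing one.
    """
    return 'a' if enable_read else 'w'

print(h5_file_mode(True))   # appending to an existing file
print(h5_file_mode(False))  # writing a fresh file
```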
Methods¶
config
¶
saver.config(logver=0, **kwargs)
Configure the saver. Only the explicitly given arguments change the configuration of this instance.
Requires
Argument | Type | Description |
---|---|---|
logver | int | The verbose level of the outputs. When set to 0, the saver runs silently. |
**kwargs |  | Any argument that would be used for creating h5py.Dataset. The given arguments override the default values during dataset creation. |
get_config
¶
cfg = saver.get_config(name=None)
Get the current configuration value by the given name.
Requires
Argument | Type | Description |
---|---|---|
name | str | The name of the required config value. |
Returns
Argument | Description |
---|---|
cfg | The required config value. |
open
¶
saver.open(file_name, enable_read=None)
Open a new file. If a file is already open, it is closed first. Both this method and the __init__ method (with file_name specified) support context management.
Requires
Argument | Type | Description |
---|---|---|
file_name | str | A path where we save the file. |
enable_read | bool | When set True, open the file in a (append) mode. Otherwise, use w mode. This option is used when adding data to an existing file. If not set, enable_read is inherited from the class initialization. Otherwise, the class-level enable_read is updated by this new value. |
close
¶
saver.close()
Close the saver.
dump
¶
saver.dump(keyword, data, **kwargs)
Dump the dataset with a keyword into the file. The dataset is resizable, so this method can be called repeatedly. The new data is always appended to the end of the current dataset.
Requires
Argument | Type | Description |
---|---|---|
keyword | str | The keyword of the dumped dataset. |
data | np.ndarray | A new batch of data items; should be a numpy array. The shape data.shape[1:] should match that of the existing dataset. |
**kwargs |  | Any argument that would be used for creating h5py.Dataset. The given arguments override the default values and the configs set by config() during dataset creation. |
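Since dump() appends along the first axis, the resulting shape bookkeeping works like this sketch. The helper name is ours, not part of mdnc; it only illustrates the documented resize-and-append rule:

```python
def appended_shape(current_shape, batch_shape):
    # dump() resizes the dataset along axis 0 and writes the new batch
    # at the end, so all trailing axes must match exactly.
    if current_shape[1:] != batch_shape[1:]:
        raise ValueError('batch axes {0} do not match dataset axes {1}'.format(
            batch_shape[1:], current_shape[1:]))
    return (current_shape[0] + batch_shape[0],) + current_shape[1:]

# Two successive dumps of (100, 20) batches yield a (200, 20) dataset,
# matching the console log shown in Example 2.
print(appended_shape((100, 20), (100, 20)))  # -> (200, 20)
```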
set_link
¶
saver.set_link(keyword, target, overwrite=True)
Create an h5py.SoftLink.
Requires
Argument | Type | Description |
---|---|---|
keyword | str | The keyword of the soft link to be created. |
target | str | The reference (pointing target) of the soft link. |
overwrite | bool | If not True, skip this step when the keyword already exists. Otherwise, the keyword is overwritten, even if it refers to an h5py.Dataset. |
set_attrs
¶
saver.set_attrs(keyword, attrs=None, **kwargs)
Set attributes for an existing data group or dataset.
Requires
Argument | Type | Description |
---|---|---|
keyword | str | The keyword where we set the attributes. |
attrs | dict | The attributes to be set. |
**kwargs |  | More attributes, combined with attrs by dict.update(). |
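The combination of attrs and **kwargs follows plain dict.update() semantics, so keyword arguments win on overlapping keys. A minimal sketch (the helper is illustrative, not mdnc's implementation):

```python
def merge_attrs(attrs=None, **kwargs):
    # Start from the explicit dictionary, then let keyword arguments
    # override any overlapping keys, mirroring dict.update().
    merged = dict(attrs or {})
    merged.update(kwargs)
    return merged

print(merge_attrs({'title': 'set', 'version': 1}, version=2, author='me'))
# -> {'title': 'set', 'version': 2, 'author': 'me'}
```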
set_virtual_set
¶
saver.set_virtual_set(keyword, sub_set_keys, fill_value=0.0)
Create a virtual dataset based on a list of subsets. All subsets are required to be h5py.Dataset and need to share the same shape (except for the first dimension, i.e. the sample number). The subsets are concatenated along axis=1. For example, when d1.shape=[100, 20] and d2.shape=[80, 20], the output virtual set has d.shape=[100, 2, 20]. In this case, d[80:, 1, :] is filled with fill_value.
Requires
Argument | Type | Description |
---|---|---|
keyword | str | The keyword of the created virtual dataset. |
sub_set_keys | (str, ) | A sequence of sub-set keywords. Each sub-set should share the same shape (except for the first dimension). |
fill_value | float | The value used for filling the blank area in the virtual dataset. |
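The shape rule above (d1.shape=[100, 20] and d2.shape=[80, 20] giving d.shape=[100, 2, 20]) can be written out in pure Python. The helper is our illustration of the documented rule, not part of mdnc:

```python
def virtual_set_shape(sub_shapes):
    # All subsets must share the same trailing axes. They are stacked
    # along a new axis=1, and axis 0 is padded (with fill_value in the
    # real saver) up to the largest sample number.
    tails = {s[1:] for s in sub_shapes}
    if len(tails) != 1:
        raise ValueError('subsets must share the same trailing axes')
    n_samples = max(s[0] for s in sub_shapes)
    return (n_samples, len(sub_shapes)) + tails.pop()

print(virtual_set_shape([(100, 20), (80, 20)]))  # -> (100, 2, 20)
```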
Properties¶
attrs
¶
attrs = saver.attrs # Return the h5py.AttributeManager
saver.attrs = dict(...) # Use a dictionary to update attrs.
Supports using a dictionary to update the attributes of the current h5py object. The returned attrs is an h5py.AttributeManager.
Examples¶
Example 1
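The code listing for this example did not survive extraction; only the console output below remains. The following is a hedged reconstruction consistent with that output, using only the APIs documented on this page. The file name, logver value, and exact data values are assumptions:

```python
import numpy as np
import mdnc

# Assumed reconstruction: create a new file and dump two datasets.
with mdnc.data.h5py.H5SupSaver('test_h5supsaver', enable_read=False) as saver:
    saver.config(logver=1)                  # enable progress messages (assumed value)
    saver.dump('one', np.ones([25, 20]))    # dataset "one", shape (25, 20)
    saver.dump('zero', np.zeros([25, 10]))  # dataset "zero", shape (25, 10)
```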
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Dump one into the file. The data shape is (25, 20).
data.h5py: Dump zero into the file. The data shape is (25, 10).
Example 2
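The code listing for this example was also lost in extraction. Below is a hedged reconstruction inferred from the console output that follows. The group-indexing syntax is an assumption based on the dictionary-style nesting described in the introduction, the data values are placeholders, and the final "Check open" step is omitted because its exact API is not shown on this page:

```python
import numpy as np
import mdnc

# Assumed reconstruction matching the log below.
saver = mdnc.data.h5py.H5SupSaver(enable_read=False)
saver.config(logver=1)                       # assumed verbose level
with saver.open('alpha-test/test_h5supsaver') as s:
    s.dump('test1', np.zeros([100, 20]))     # dataset "/test1", shape (100, 20)
    # Dictionary-style nesting: address a nested group by its path.
    gb = s['group1/group2']
    gb.dump('test2', np.zeros([100, 20]))    # "/group1/group2/test2", (100, 20)
    gb.dump('test2', np.ones([100, 20]))     # appended: shape becomes (200, 20)
    s.set_link('test3', '/test1')            # soft link "test3" -> "/test1"
```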
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Open a new file: alpha-test\test_h5supsaver.h5
data.h5py: Dump test1 into the file. The data shape is (100, 20).
data.h5py: Dump test2 into the file. The data shape is (100, 20).
data.h5py: Dump 100 data samples into the existed dataset /group1/group2/test2. The data shape is (200, 20) now.
data.h5py: Create a soft link "test3", pointting to "/test1".
data.h5py: Check open: s["group1"]=True, s["group1/group2"]=False