data.h5py.H5SeqConverter¶
Class · Context · Source
```python
converter = mdnc.data.h5py.H5SeqConverter(
    file_in_name=None, file_out_name=None
)
```
Convert any supervised .h5 data file into its sequence version. This class allows users to select keywords and convert the corresponding datasets into a sequence format; those datasets would be saved as continuous segments. It could serve as a random splitter for preparing the training data of an LSTM.
The following figure shows how the data get converted: the converted dataset would be cut into several segments with random lengths.
The converted files should only be loaded by mdnc.data.h5py.H5CParser.
Warning

- During the conversion, attributes would be lost, and links and virtual datasets would be treated as plain h5py.Dataset objects.
- Although this class supports the context protocol, it does not support dictionary-style APIs like h5py.Group.
Arguments¶
Requires

| Argument | Type | Description |
| --- | --- | --- |
| `file_in_name` | `str` | The path where the non-sequence formatted file is read. If not set, the dataset would not be opened. |
| `file_out_name` | `str` | The path of the output data file. If not set, it would be configured as `file_in_name + '_seq'`. |
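A minimal usage sketch follows, assuming an existing input file at a placeholder path (whether the `.h5` extension should be included in the path is not shown on this page, so treat the path as illustrative):

```python
import mdnc

# Minimal sketch: 'train_data' is a placeholder path to an existing
# supervised .h5 file. Because file_out_name is omitted, the output is
# written next to it with the '_seq' suffix.
with mdnc.data.h5py.H5SeqConverter(file_in_name='train_data') as converter:
    converter.convert('data_to_sequence')  # keyword taken from the examples below
```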
Methods¶
config¶

```python
converter.config(logver=0, set_shuffle=False, seq_len=10, seq_len_max=20, random_seed=2048, **kwargs)
```
Make configuration for the converter. Only the explicitly given arguments would be used to change the configuration of this instance.
Requires

| Argument | Type | Description |
| --- | --- | --- |
| `logver` | `int` | The verbosity level of the outputs. When set to `0`, the converter runs silently. |
| `set_shuffle` | `bool` | Whether to shuffle the order of the segments during the conversion. |
| `seq_len` | `int` | The lower bound of the random segment length. |
| `seq_len_max` | `int` | The upper bound of the random segment length. |
| `random_seed` | `int` | The random seed used by this instance. |
| `**kwargs` | | Any argument that would be used for creating `h5py.Dataset`. The given arguments override the default values during the dataset creation. |
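As a sketch, a configuration call that reproduces the dataset-creation options printed in the examples below (float32 storage, shuffle filter, fletcher32 checksums, gzip compression); the specific values are illustrative, not mandatory:

```python
import numpy as np

# Segment lengths are drawn between seq_len and seq_len_max. The extra
# keywords are forwarded to h5py.Dataset creation and override the
# defaults (these particular values match the configuration logged in
# the examples below).
converter.config(logver=1, set_shuffle=True,
                 seq_len=10, seq_len_max=20, random_seed=2048,
                 dtype=np.float32, shuffle=True,
                 fletcher32=True, compression='gzip')
```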
convert¶

```python
converter.convert(keyword, **kwargs)
```
Convert the h5py.Dataset given by keyword into a segmented dataset and save it to the output file. Note that before the conversion, the data should be arranged continuously along the batch axis.
If a keyword has already been converted or copied, it should not be converted or copied again.
Requires

| Argument | Type | Description |
| --- | --- | --- |
| `keyword` | `str` | The keyword of the dataset that would be converted into a segmented dataset. |
| `**kwargs` | | Any argument that would be used for creating `h5py.Dataset`. The given arguments override the default values and the configurations set by `config()` during the dataset creation. |
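A one-line sketch; `'data_to_sequence'` is the keyword used in the examples below, and the per-call compression override is illustrative:

```python
# Split the dataset 'data_to_sequence' into random-length segments in
# the output file; compression here overrides the value set by config().
converter.convert('data_to_sequence', compression='gzip')
```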
copy¶

```python
converter.copy(keyword, **kwargs)
```
Copy the h5py.Dataset given by keyword into the output file without segmentation.
If a keyword has already been converted or copied, it should not be converted or copied again.
Requires

| Argument | Type | Description |
| --- | --- | --- |
| `keyword` | `str` | The keyword of the dataset that would be copied into the output file. |
| `**kwargs` | | Any argument that would be used for creating `h5py.Dataset`. The given arguments override the default values and the configurations set by `config()` during the dataset creation. |
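Likewise, a one-line sketch using the keyword from the examples below:

```python
# Copy 'data_only_copied' into the output file unchanged (no segmentation).
converter.copy('data_only_copied')
```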
open¶

```python
converter.open(file_in_name, file_out_name=None)
```
Open a new file. If a file has been opened before, it would be closed first. Both this method and the __init__ method (when file_in_name is specified) support context management.
Requires

| Argument | Type | Description |
| --- | --- | --- |
| `file_in_name` | `str` | The path where the non-sequence formatted file is read. |
| `file_out_name` | `str` | The path of the output data file. If not set, it would be configured as `file_in_name + '_seq'`. |
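A short sketch of reusing one converter for several files; `'file_a'` and `'file_b'` are placeholder paths:

```python
# Reuse one converter for several input files; opening a new file
# closes the previously opened one.
converter = mdnc.data.h5py.H5SeqConverter()
converter.open('file_a')
converter.convert('data_to_sequence')
converter.open('file_b')   # 'file_a' and its output are closed here
converter.convert('data_to_sequence')
converter.close()
```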
close¶

```python
converter.close()
```
Close the converter.
Examples¶
Example 1
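The code of this example did not survive extraction; the sketch below is a hedged reconstruction that matches the logged output. The input path is a placeholder, and the original example verified the test data through mdnc.data.webtools beforehand (that step is not reconstructed here).

```python
import numpy as np
import mdnc

# Hedged reconstruction of Example 1 (the original snippet was lost).
# The input path is a placeholder; the output file is created
# automatically with the '_seq' suffix.
with mdnc.data.h5py.H5SeqConverter('alpha-test/test_data_h5seqconverter1') as cvt:
    cvt.config(logver=1, set_shuffle=False,
               dtype=np.float32, shuffle=True,
               fletcher32=True, compression='gzip')
    cvt.convert('data_to_sequence')   # split into random-length segments
    cvt.copy('data_only_copied')      # copied without modification
```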
data.webtools: All required datasets are available.
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Convert data_to_sequence into the output file. The original data shape is (1000,), splitted into 64 parts.
data.h5py: Copy data_only_copied into the output file. The data shape is (1000,).
Example 2
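Again, the original snippet is missing; this is a hedged reconstruction consistent with the logged output, in which a single converter processes two test files via open(). The paths are taken from the log, and the assumption that the '.h5' extension is appended automatically is inferred from the logged output-file names.

```python
import numpy as np
import mdnc

# Hedged reconstruction of Example 2: one converter is reused for two
# test files. Each open() closes the previously opened file pair.
converter = mdnc.data.h5py.H5SeqConverter()
converter.config(logver=1, dtype=np.float32, shuffle=True,
                 fletcher32=True, compression='gzip')
for name in ('alpha-test/test_data_h5seqconverter1',
             'alpha-test/test_data_h5seqconverter2'):
    converter.open(name)
    converter.convert('data_to_sequence')
    converter.copy('data_only_copied')
converter.close()
```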
data.webtools: All required datasets are available.
data.h5py: Current configuration is: {'dtype': <class 'numpy.float32'>, 'shuffle': True, 'fletcher32': True, 'compression': 'gzip'}
data.h5py: Open a new read file: alpha-test\test_data_h5seqconverter1.h5
data.h5py: Open a new output file: alpha-test\test_data_h5seqconverter1_seq.h5
data.h5py: Convert data_to_sequence into the output file. The original data shape is (1000,), splitted into 64 parts.
data.h5py: Copy data_only_copied into the output file. The data shape is (1000,).
data.h5py: Open a new read file: alpha-test\test_data_h5seqconverter2.h5
data.h5py: Open a new output file: alpha-test\test_data_h5seqconverter2_seq.h5
data.h5py: Convert data_to_sequence into the output file. The original data shape is (1000,), splitted into 64 parts.
data.h5py: Copy data_only_copied into the output file. The data shape is (1000,).