
data.preprocs.ProcMerge

Class · Source

proc = mdnc.data.preprocs.ProcMerge(
    procs=None, num_procs=None, parent=None
)

Merge manager. This processor is inhomogeneous and is designed to merge different processors more efficiently. For example,

p = ProcMerge([Proc1(...), Proc2(...)])

This would apply Proc1 to the first argument and Proc2 to the second argument. It is equivalent to

p = Proc1(..., inds=0, parent=Proc2(..., inds=1))

This class should not be used if any sub-processor returns a different number of variables than it receives (an "out-arg changed" processor). The one exception is the parent of this class, which may be an out-arg changed processor.

This API is a more intuitive way to chain several processors together. It makes your code more readable and reduces the stack depth of the processors.
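The dispatch rule described above (the nth argument goes to the nth sub-processor) can be illustrated without mdnc. The following is a minimal pure-Python sketch; ToyScale and ToyMerge are hypothetical names, not part of mdnc, and the sketch ignores parents and shared sub-processors.

```python
# Hypothetical toy classes for illustration only; they are not part of mdnc.
class ToyScale:
    """Multiply every input by a constant factor (a homogeneous processor)."""

    def __init__(self, factor):
        self.factor = factor

    def preprocess(self, *xs):
        return tuple(x * self.factor for x in xs)

    def postprocess(self, *ys):
        return tuple(y / self.factor for y in ys)


class ToyMerge:
    """Send the nth argument to the nth sub-processor."""

    def __init__(self, procs):
        self.procs = list(procs)

    def preprocess(self, *xs):
        if len(xs) != len(self.procs):
            raise ValueError("expected one argument per sub-processor")
        # Each sub-processor sees exactly one argument and returns one value.
        return tuple(p.preprocess(x)[0] for p, x in zip(self.procs, xs))

    def postprocess(self, *ys):
        if len(ys) != len(self.procs):
            raise ValueError("expected one argument per sub-processor")
        return tuple(p.postprocess(y)[0] for p, y in zip(self.procs, ys))


merge = ToyMerge([ToyScale(2.0), ToyScale(10.0)])
y1, y2 = merge.preprocess(1.5, 3.0)
print(y1, y2)  # 3.0 30.0
x1, x2 = merge.postprocess(y1, y2)
print(x1, x2)  # 1.5 3.0
```

Writing one ToyMerge instead of nesting each processor as the parent of the next is what keeps the stack shallow: every sub-processor sits at the same level.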

Arguments

Requires

Argument Type Description
procs (ProcAbstract, ) A sequence of processors. Each processor is derived from mdnc.data.preprocs.ProcAbstract. Used to initialize this merge processor.
num_procs int The number of input arguments of this processor. If not set, it is inferred from the length of procs. At least one of procs or num_procs must be specified; both can be specified together.
parent ProcAbstract An instance derived from mdnc.data.preprocs.ProcAbstract. This instance is used as the parent of the current instance.
Warning

If both num_procs and procs are specified, num_procs should be greater than or equal to len(procs).
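The argument rule above can be sketched as a small validation helper. resolve_num_procs is a hypothetical name, not part of mdnc, and the sketch assumes that num_procs == len(procs) is allowed, since num_procs defaults to len(procs) when it is omitted.

```python
def resolve_num_procs(procs=None, num_procs=None):
    """Hypothetical helper mirroring the documented ProcMerge argument rule."""
    if procs is None and num_procs is None:
        raise ValueError("at least one of procs or num_procs must be specified")
    n_given = 0 if procs is None else len(procs)
    if num_procs is None:
        # Infer the number of inputs from the length of procs.
        return n_given
    if num_procs < n_given:
        raise ValueError("num_procs must not be less than len(procs)")
    return num_procs


print(resolve_num_procs(procs=["p1", "p2"]))               # 2
print(resolve_num_procs(num_procs=3))                      # 3
print(resolve_num_procs(procs=["p1", "p2"], num_procs=3))  # 3
```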

Methods

preprocess

y_1, y_2, ... = proc.preprocess(x_1, x_2, ...)

The preprocess function. The nth variable is sent to the nth processor configured for proc.

If parent exists, the input of this function comes from the output of parent.preprocess(). Otherwise, the input comes directly from the input variables.

Requires

Argument Type Description
(x, ) np.ndarray A sequence of variables. Each variable comes from the parent's outputs (if parent exists). The output of this method is passed as the input of the next processor (if this processor is used as a parent).

Returns

Argument Description
(y, ) A sequence of np.ndarray, the final preprocessed data.

postprocess

x_1, x_2, ... = proc.postprocess(y_1, y_2, ...)

The postprocess function. The nth variable would be sent to the nth processor configured for proc.

If parent exists, the output of this function would be passed as the input of parent.postprocess(). Otherwise, the output would be returned to users directly.

Requires

Argument Type Description
(y, ) np.ndarray A sequence of variables. Each variable comes from the next processor's outputs (if this processor is used as a parent). The output of this method is passed as the input of the parent's method (if parent exists).

Returns

Argument Description
(x, ) A sequence of np.ndarray, the final postprocessed data.
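The call order implied by the preprocess and postprocess descriptions (the parent runs first in preprocess, last in postprocess) can be traced with a toy chain. ToyProc is a hypothetical pure-Python stand-in that adds an offset; it is not the mdnc ProcAbstract API, and the log argument exists only to record the order of calls.

```python
class ToyProc:
    """Hypothetical chained processor that adds an offset; not the mdnc API."""

    def __init__(self, name, offset, parent=None):
        self.name = name
        self.offset = offset
        self.parent = parent

    def preprocess(self, *xs, log):
        if self.parent is not None:
            # The parent runs first; its outputs become this processor's inputs.
            xs = self.parent.preprocess(*xs, log=log)
        log.append("pre:" + self.name)
        return tuple(x + self.offset for x in xs)

    def postprocess(self, *ys, log):
        # This processor inverts its own step first ...
        log.append("post:" + self.name)
        ys = tuple(y - self.offset for y in ys)
        if self.parent is not None:
            # ... and then hands the result to the parent's postprocess().
            ys = self.parent.postprocess(*ys, log=log)
        return ys


log = []
child = ToyProc("child", 10, parent=ToyProc("parent", 1))
y = child.preprocess(5, log=log)
x = child.postprocess(*y, log=log)
print(y, x)  # (16,) (5,)
print(log)   # ['pre:parent', 'pre:child', 'post:child', 'post:parent']
```

Reversing the order in postprocess is what makes the chain invertible: each step is undone in the opposite order to how it was applied.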

Operators

__getitem__

proc_i = proc[idx]

Get the ith sub-processor.

Warning

If one sub-processor manages multiple indices, the same sub-processor is returned for all of those indices. For example,

proc_m = Proc2(...)
proc = ProcMerge([Proc1(...), proc_m, proc_m])
proc_1 = proc[1]
proc_2 = proc[2]
print(proc_m is proc_1, proc_m is proc_2)  # True True

This behavior matters if proc_m is an inhomogeneous processor. Although you get proc_2 by proc[2], you still need to place your variable as the 2nd input when using proc_2.
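One way to picture why the position matters is to group merge indices by sub-processor object. The sketch below assumes (this is not confirmed by the mdnc source) that a shared sub-processor is called once with all of its variables, ordered by their merge index. group_by_processor, merged_preprocess, Identity and NegateFirst are hypothetical names.

```python
class Identity:
    """Hypothetical homogeneous processor that returns its inputs unchanged."""

    def preprocess(self, *xs):
        return xs


class NegateFirst:
    """Hypothetical inhomogeneous processor: only its first input is negated."""

    def preprocess(self, *xs):
        return (-xs[0],) + xs[1:]


def group_by_processor(slots):
    """Map each distinct sub-processor to the merge indices it manages, in order."""
    groups, order = {}, []
    for idx, proc in enumerate(slots):
        key = id(proc)  # group by object identity, not by equality
        if key not in groups:
            groups[key] = (proc, [])
            order.append(key)
        groups[key][1].append(idx)
    return [groups[k] for k in order]


def merged_preprocess(slots, *xs):
    outs = [None] * len(xs)
    for proc, inds in group_by_processor(slots):
        # A shared processor receives its variables as its 1st, 2nd, ... inputs.
        results = proc.preprocess(*(xs[i] for i in inds))
        for i, r in zip(inds, results):
            outs[i] = r
    return tuple(outs)


proc_m = NegateFirst()
slots = [Identity(), proc_m, proc_m]
print(slots[1] is slots[2])               # True
print(merged_preprocess(slots, 1, 2, 3))  # (1, -2, 3)
```

Under this assumption, merge index 1 is proc_m's 1st input and merge index 2 is its 2nd input, so only index 1 is negated even though slots[2] is the same object.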

Requires

Argument Type Description
idx int The index of the sub-processor.

Returns

Argument Description
proc_i An instance derived from ProcAbstract, the ith sub-processor.

__setitem__

proc[idx] = proc_i
Info

This method supports multiple assignment, for example:

proc = ProcMerge(num_procs=3)
proc[:] = Proc1(...)
proc[1:] = Proc2(...)

This would be equivalent to

proc_m = Proc2(...)
proc = ProcMerge([Proc1(...), proc_m, proc_m])
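Assuming idx follows ordinary Python indexing rules (int, slice, or a tuple of ints), the indices touched by an assignment can be enumerated with the built-in range, which resolves slices and negative indices. covered_indices is a hypothetical helper, not part of mdnc.

```python
def covered_indices(idx, num_procs):
    """Return the merge indices an idx passed to __setitem__ would overwrite.

    Sketch only: assumes int, slice and tuple-of-int behave like list indexing.
    """
    positions = range(num_procs)
    if isinstance(idx, slice):
        return list(positions[idx])
    if isinstance(idx, tuple):
        return [positions[i] for i in idx]
    return [positions[idx]]


print(covered_indices(slice(None), 3))     # [0, 1, 2]  (proc[:])
print(covered_indices(slice(1, 2), 3))     # [1]        (proc[1:2])
print(covered_indices(slice(1, None), 3))  # [1, 2]     (proc[1:])
print(covered_indices((0, 2), 3))          # [0, 2]
print(covered_indices(-1, 3))              # [2]
```

Because slices are half-open, proc[1:2] covers index 1 only; a slice must reach index 2 (for example 1: or 1:3) to cover both of the last two inputs.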

Requires

Argument Type Description
idx int or slice or tuple The indices that would be overwritten by the argument proc_i.
proc_i ProcAbstract An instance derived from ProcAbstract. This sub-processor is used for overwriting one or more indices.

Properties

num_procs

proc.num_procs

The number of sub-processors of this class. If one sub-processor manages multiple indices, it is counted once for each index.
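The counting rule can be illustrated with a plain list of slots, assuming (for illustration only) that the merge keeps one slot per input index. Placeholder is a hypothetical stand-in for a sub-processor.

```python
class Placeholder:
    """Hypothetical stand-in for a sub-processor object."""


shared = Placeholder()
slots = [Placeholder(), shared, shared]

num_procs = len(slots)                      # every index counts, even if shared
num_distinct = len({id(p) for p in slots})  # distinct sub-processor objects
print(num_procs, num_distinct)  # 3 2
```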


parent

proc.parent

The parent processor of this instance. The parent is also derived from ProcAbstract. If the parent does not exist, this property returns None.


has_ind

proc.has_ind

A bool flag showing whether this processor and all of its parent processors have inds configured. In this case, the arguments of preprocess() and postprocess() do not share the same operation. Such processors are called "inhomogeneous processors".

Certainly, proc.has_ind is always True for this class.

Examples

There are many ways to use this class. For example,

Example 1
import numpy as np
import mdnc

proc = mdnc.data.preprocs.ProcMerge([mdnc.data.preprocs.ProcScaler(), mdnc.data.preprocs.ProcNSTScaler(dim=1)])
random_rng = np.random.default_rng()
x, y = random_rng.normal(loc=-1.0, scale=0.1, size=[5, 3]), random_rng.normal(loc=1.0, scale=3.0, size=[4, 2])
x_, y_ = proc.preprocess(x, y)
xr, yr = proc.postprocess(x_, y_)
print('Processed shape:', x_.shape, y_.shape)
print('Processed mean:', np.mean(x_), np.mean(y_))
print('Processed range:', np.amax(np.abs(x_)), np.amax(np.abs(y_)))
print('Inverse error:', np.amax(np.abs(x - xr)), np.amax(np.abs(y - yr)))
Processed shape: (5, 3) (4, 2)
Processed mean: 4.440892098500626e-16 2.7755575615628914e-17
Processed range: 1.0 1.0
Inverse error: 0.0 0.0
Example 2
import numpy as np
import mdnc

proc1 = mdnc.data.preprocs.ProcScaler()
proc2 = mdnc.data.preprocs.ProcNSTScaler(dim=1, inds=0, parent=mdnc.data.preprocs.ProcScaler(inds=1))
proc = mdnc.data.preprocs.ProcMerge(num_procs=3)
proc[0] = proc1
proc[1:] = proc2
random_rng = np.random.default_rng()
x, y, z = random_rng.normal(loc=-1.0, scale=0.1, size=[5, 3]), random_rng.normal(loc=1.0, scale=3.0, size=[4, 2]), random_rng.normal(loc=1.0, scale=3.0, size=[4, 2])
x_, y_, z_ = proc.preprocess(x, y, z)
xr, yr, zr = proc.postprocess(x_, y_, z_)
print('Processed shape:', x_.shape, y_.shape, z_.shape)
print('Processed mean:', np.mean(x_), np.mean(y_), np.mean(z_))
print('Processed range:', np.amax(np.abs(x_)), np.amax(np.abs(y_)), np.amax(np.abs(z_)))
print('Inverse error:', np.amax(np.abs(x - xr)), np.amax(np.abs(y - yr)), np.amax(np.abs(z - zr)))
Processed shape: (5, 3) (4, 2) (4, 2)
Processed mean: -1.7763568394002506e-16 -1.8041124150158794e-16 -1.314226505400029e-14
Processed range: 1.0 1.0 1.0
Inverse error: 0.0 1.1102230246251565e-16 0.0

This class can also be used to merge customized processors. However, each customized processor must return the same number of variables as it receives, for example,

Example 3
import numpy as np
import mdnc

class ProcDerived(mdnc.data.preprocs.ProcAbstract):
    def __init__(self, a, parent=None):
        super().__init__(parent=parent, _disable_inds=True)
        self.a = a

    def preprocess(self, x, y):
        return self.a * x, (2 * self.a) * y

    def postprocess(self, x, y):
        return x / self.a, y / (2 * self.a)

proc1 = mdnc.data.preprocs.ProcScaler()
proc2 = mdnc.data.preprocs.ProcNSTScaler(dim=1, parent=ProcDerived(a=2.0))
proc = mdnc.data.preprocs.ProcMerge(num_procs=3)
proc[0] = proc1
proc[1:] = proc2
random_rng = np.random.default_rng()
x, y, z = random_rng.normal(loc=-1.0, scale=0.1, size=[5, 3]), random_rng.normal(loc=1.0, scale=3.0, size=[4, 2]), random_rng.normal(loc=1.0, scale=3.0, size=[4, 2])
x_, y_, z_ = proc.preprocess(x, y, z)
xr, yr, zr = proc.postprocess(x_, y_, z_)
print('Processed shape:', x_.shape, y_.shape, z_.shape)
print('Processed mean:', np.mean(x_), np.mean(y_), np.mean(z_))
print('Processed range:', np.amax(np.abs(x_)), np.amax(np.abs(y_)), np.amax(np.abs(z_)))
print('Inverse error:', np.amax(np.abs(x - xr)), np.amax(np.abs(y - yr)), np.amax(np.abs(z - zr)))
Processed shape: (5, 3) (4, 2) (4, 2)
Processed mean: -1.7763568394002506e-16 0.0 -5.273559366969494e-16
Processed range: 1.0 1.0 1.0
Inverse error: 0.0 2.220446049250313e-16 0.0
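Before merging a customized processor, its input/output count can be checked with a quick round trip. The sketch below is pure Python and does not need mdnc: ToyDerived mirrors the arithmetic of ProcDerived from Example 3 without deriving from ProcAbstract, and check_same_arg_count and SumPair are hypothetical names.

```python
class ToyDerived:
    """Pure-Python stand-in for ProcDerived in Example 3 (no mdnc needed)."""

    def __init__(self, a):
        self.a = a

    def preprocess(self, x, y):
        return self.a * x, (2 * self.a) * y

    def postprocess(self, x, y):
        return x / self.a, y / (2 * self.a)


class SumPair:
    """An out-arg changed processor: two inputs, one output."""

    def preprocess(self, x, y):
        return (x + y,)

    def postprocess(self, s):
        return (s,)


def check_same_arg_count(proc, *args):
    """Hypothetical check that both directions keep the number of variables."""
    outs = proc.preprocess(*args)
    if len(outs) != len(args):
        raise ValueError("preprocess() changed the number of variables")
    back = proc.postprocess(*outs)
    if len(back) != len(args):
        raise ValueError("postprocess() changed the number of variables")
    return back


print(check_same_arg_count(ToyDerived(a=2.0), 1.0, 3.0))  # (1.0, 3.0)
try:
    check_same_arg_count(SumPair(), 1.0, 3.0)
except ValueError as err:
    print(err)  # preprocess() changed the number of variables
```

A processor like SumPair should not be used as a sub-processor of this class, although the documentation above allows such a processor as the parent of the merge.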

Last update: March 14, 2021
