data.preprocs.ProcScaler¶

Class · Source

proc = mdnc.data.preprocs.ProcScaler(
    shift=None, scale=None, axis=-1, inds=None, parent=None
)

This is a homogeneous processor. It accepts two variables shift (\(\mu\)), scale (\(\sigma\)) to perform the following normalization:

\begin{align} \mathbf{y}_n = \frac{1}{\sigma} ( \mathbf{x}_n - \mu ), \end{align}

where \(\mathbf{x}_n\) and \(\mathbf{y}_n\) are the i^th input argument and the corresponding output argument respectively.

If not setting \(\mu\), would use the mean value of the input mini-batch to shift the argument, i.e. \(\mu_n = \overline{\mathbf{x}}_n\);
If not setting \(\sigma\), would use the max-abs value of the input mini-batch to scale the argument, i.e. \(\sigma_n = \max |\mathbf{x}_n - \mu_n|\).

The above two caulation is estimated on mini-batches. This configuration may cause unstable issues when the input mini-batches are not i.i.d.. Therefore, we recommend users to always set shift and scale manually.

Arguments¶

Requries

Argument	Type	Description
`shift`	`int` or `np.ndarray`	The \(\mu\) variable used for shifting the mean value of mini-batches. This value could be an `np.ndarray` supporting broadcasting. If set `None`, would shift the mean value of each mini-batch to 0.
`scale`	`int` or `np.ndarray`	The \(\sigma\) variable used for shifting the mean value of mini-batches. This value could be an `np.ndarray` supporting broadcasting. If set `None`, would scale the max-abs value of each mini-batch to 1.
`axis`	`int` or `(int, )`	The axis used for calculating the normalization parameters. If given a sequence, would calculate the paramters among higher-dimensional data. Only used when `shift` or `scale` is not `None`.
`inds`	`int` or `(int, )`	Index or indicies of variables where the user implemented methods would be broadcasted. The variables not listed in this argument would be passed to the output without any processing. If set `None`, methods would be broadcasted to all variables.
`parent`	`ProcAbstract`	Another instance derived from `mdnc.data.preprocs.ProcAbstract`. The output of `parent.preprocess()` would be used as the input of `self.preprocess()`. The input of `self.postprocess()` would be used as the input of `parent.preprocess()`.

Methods¶

`preprocess`¶

y_1, y_2, ... = proc.preprocess(x_1, x_2, ...)

The preprocess function. Calculate the re-scaled values from the input variables.

If parent exists, the input of this function comes from the output of parent.preprocess(). Otherwise, the input would comes from the input varibable directly.

Requries

Argument	Type	Description
`(x, )`	`np.ndarray`	A sequence of variables. Each variable comes from the parent's outputs (if parent exists). The output of this method would be passed as the input of the next processor (if this processor is used as parent).

Returns

Argument	Description
`(y, )`	A sequence of `np.ndarray`, the final preprocessed data.

`postprocess`¶

x_1, x_2, ... = proc.postprocess(y_1, y_2, ...)

The postprocess function. The inverse operator of the scaling.

If parent exists, the output of this function would be passed as the input of parent.postprocess(). Otherwise, the output would be returned to users directly.

Requries

Argument	Type	Description
`(y, )`	`np.ndarray`	A sequence of variables. Each variable comes from the next processors's outputs (if parent exists). The output of this method would be passed as the input of the parent's method.

Returns

Argument	Description
`(x, )`	A sequence of `np.ndarray`, the final postprocessed data.

Properties¶

`shift`¶

proc.shift

The shifting value \(\mu\).

`scale`¶

proc.scale

The scaling value \(\sigma\).

`parent`¶

proc.parent

The parent processor of this instance. The processor is also a derived class of ProcAbstract. If the parent does not exist, would return None.

`has_ind`¶

proc.has_ind

A bool flag, showing whether this processor and its all parent processors have inds configured or initialized with _disable_inds. In this case, the arguments of preprocess() and postprocess() would not share the same operation. We call such kind of processors "Inhomogeneous processors".

Examples¶

The processor need to be derived. We have two ways to implement the derivation, see the following examples.

Example 1

Codes

import numpy as np
import matplotlib.pyplot as plt
import mdnc

proc = mdnc.data.preprocs.ProcScaler(shift=1.0, scale=3.0)
random_rng = np.random.default_rng()
x, y = random_rng.normal(loc=1.0, scale=3.0, size=[5, 4]), random_rng.normal(loc=1.0, scale=6.0, size=[7, 5])
x_, y_ = proc.preprocess(x, y)
xr, yr = proc.postprocess(x_, y_)
print('Processed shape:', x_.shape, y_.shape)
print('Processed mean:', np.mean(x_), np.mean(y_))
print('Processed std:', np.std(x_), np.std(y_))
print('Inverse error:', np.amax(np.abs(x - xr)), np.amax(np.abs(y - yr)))

with mdnc.utils.draw.setFigure(font_size=12):
    fig, axs = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(12, 5))
    axs[0].plot(x_[0])
    axs[1].plot(xr[0])
    axs[2].plot(x[0])
    axs[0].set_ylabel('Preprocessing')
    axs[1].set_ylabel('Inversed\npreprocessing')
    axs[2].set_ylabel('Raw\ndata')
    plt.tight_layout()
    plt.show()

Output

Processed shape: (5, 4) (7, 5)
Processed mean: 0.3419675618982352 -0.6442542139897679
Processed std: 1.1930992341525517 2.3083982027807157
Inverse error: 0.0 0.0

Example 2

Codes

import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
import mdnc

random_rng = np.random.default_rng()
x = random_rng.normal(loc=1.0, scale=3.0, size=[100, 5])
rsc = preprocessing.RobustScaler()
rsc.fit(x)
proc = mdnc.data.preprocs.ProcScaler(shift=np.expand_dims(rsc.center_, axis=0), scale=np.expand_dims(rsc.scale_, axis=0))
x_ = proc.preprocess(x)
x_sl = rsc.transform(x)
x_r = proc.postprocess(x_)
x_r_sl = rsc.inverse_transform(x_sl)
print('Processed error:', np.amax(np.abs(x_ - x_sl)))
print('Inverse error:', np.amax(np.abs(x_r - x_r_sl)))

with mdnc.utils.draw.setFigure(font_size=12):
    fig, axs = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(12, 5))
    axs[0].plot(x_[0], label='ProcScaler')
    axs[0].plot(x_sl[0], label='RobustScaler')
    axs[0].legend(loc='lower right')
    axs[1].plot(x_r[0], label='ProcScaler')
    axs[1].plot(x_r_sl[0], label='RobustScaler')
    axs[1].legend(loc='lower right')
    axs[2].plot(x[0])
    axs[0].set_ylabel('Preprocessing')
    axs[1].set_ylabel('Inversed\npreprocessing')
    axs[2].set_ylabel('Raw\ndata')
    plt.tight_layout()
    plt.show()

Output

Processed error: 0.0
Inverse error: 0.0

Last update: March 14, 2021

data.preprocs.ProcScaler¶

Arguments¶

Methods¶

preprocess¶

postprocess¶

Properties¶

shift¶

scale¶

parent¶

has_ind¶

Examples¶

Comments

`preprocess`¶

`postprocess`¶

`shift`¶

`scale`¶

`parent`¶

`has_ind`¶