In this article, we discuss the tricks behind the training and testing phases of dictionary learning (sparse coding). The original work can be found here. As extra reading material, we suggest Convex Optimization for understanding how to apply the Lagrangian method, and the Matrix Cookbook for results on computing gradients with respect to matrices.
The PDF version of this article can be found here:
$(2)$ gives the analytical solution of the Lasso problem implicitly. To find the best $\boldsymbol{\alpha}_k$, we still need to apply a trick. Denoting $\boldsymbol{\theta}_k = \mathrm{sign}(\boldsymbol{\alpha}_k)$, we can rewrite $(2)$ as
$(4)$ gives a fast solution for $(2)$. Considering the $i^{\mathrm{th}}$ element of $\mathbf{y}_k$, i.e. $y_{ki}$, we find that when $y_{ki} > \lambda$, $\alpha_{ki} > 0$; when $y_{ki} < -\lambda$, $\alpha_{ki} < 0$; and when $y_{ki} \in [-\lambda,~\lambda]$, $\alpha_{ki} = 0$. After determining $\boldsymbol{\theta}_k$ this way, it is easy to obtain $\boldsymbol{\alpha}_k$ from $(3)$ directly.
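The sign rule above is exactly the elementwise soft-thresholding operator. A minimal sketch, assuming $\mathbf{y}_k$ has already been computed (the function name is ours, not from the original work):

```python
import numpy as np

def soft_threshold(y, lam):
    """Elementwise soft-thresholding.

    Implements the sign rule described above:
    y_i > lam   -> positive output (shrunk by lam),
    y_i < -lam  -> negative output (shrunk by lam),
    |y_i| <= lam -> exactly zero.
    """
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```

For example, `soft_threshold(np.array([2.0, -2.0, 0.5]), 1.0)` zeroes out the entry inside $[-\lambda, \lambda]$ and shrinks the others toward zero.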
Learn the dictionary
If we rewrite the coding as a matrix as below,
\begin{equation}
\mathbf{A} = \left[
\begin{array}{c c c c}
\boldsymbol{\alpha}_1 & \boldsymbol{\alpha}_2 & \cdots & \boldsymbol{\alpha}_N
\end{array}
\right],
\end{equation}
then we can rewrite the dictionary learning problem with the Frobenius norm,
Training the dictionary requires updating $\mathbf{D}$ and $\mathbf{A}$ alternately. The method for obtaining $\mathbf{A}$ has been discussed above, so in the following we discuss how to update $\mathbf{D}$. Note that the only trainable variable in $(6)$ is $\mathbf{D}$, which means $(1)$ and $(6)$ need to be solved alternately.
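The alternating scheme can be sketched as follows. This is a simplified stand-in, not the referenced algorithm: the sparse-coding step uses a few ISTA iterations instead of the sign-based analytical solver, and the dictionary step uses least squares with column renormalization instead of the Lagrange-dual update discussed in the article; all names are ours.

```python
import numpy as np

def learn_dictionary(X, n_atoms, lam=0.1, n_iter=20, seed=0):
    """Alternating minimization sketch for
        min_{D,A} 0.5 * ||X - D A||_F^2 + lam * ||A||_1,
    with unit-norm dictionary atoms (columns of D).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
    A = np.zeros((n_atoms, n))
    for _ in range(n_iter):
        # Sparse coding: a few ISTA steps on A with D fixed.
        L = np.linalg.norm(D.T @ D, 2)      # Lipschitz constant of the gradient
        for _ in range(10):
            grad = D.T @ (D @ A - X)
            Z = A - grad / L
            A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
        # Dictionary update: least squares, then renormalize atoms,
        # rescaling the rows of A so the product D @ A is unchanged.
        D = X @ np.linalg.pinv(A)
        norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        D /= norms
        A *= norms[:, None]
    return D, A
```

The renormalization keeps every atom on the unit sphere, which plays the role of the norm constraints handled by the Lagrange multipliers in the derivation that follows.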
Solving the dictionary-training problem requires the Lagrange multiplier method. Denoting the multipliers as $\mu_j$, we can incorporate the constraints into the problem,
Since $\mathbf{D}$ has been expressed in terms of $\boldsymbol{\Lambda}$, minimizing $\mathcal{L}(\cdot)$ can be carried out over $\boldsymbol{\Lambda}$ alone. Hence we have
$(15)$ is quadratic, hence convex, and we can find the analytical solution for $\boldsymbol{\Lambda}$. Substituting $\boldsymbol{\Lambda}$ back into $(11)$, we obtain $\mathbf{D}$.
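Once the optimal multipliers $\mu_j$ are known, the dictionary follows in closed form. A minimal sketch, assuming the Lagrange-dual recovery $\mathbf{D} = \mathbf{X}\mathbf{A}^{\top}\left(\mathbf{A}\mathbf{A}^{\top} + \boldsymbol{\Lambda}\right)^{-1}$ with $\boldsymbol{\Lambda} = \operatorname{diag}(\mu_1, \ldots, \mu_m)$, which matches the form used in the referenced work (the function name is hypothetical):

```python
import numpy as np

def dictionary_from_multipliers(X, A, mu):
    """Recover D from the optimal Lagrange multipliers mu.

    Assumes D = X A^T (A A^T + diag(mu))^{-1}; we solve the
    (symmetric) linear system instead of forming the inverse.
    """
    G = A @ A.T + np.diag(mu)
    # D^T = G^{-1} A X^T, so D = solve(G, A X^T)^T.
    return np.linalg.solve(G, A @ X.T).T
```

With $\mu_j = 0$ this reduces to the unconstrained least-squares dictionary $\mathbf{X}\mathbf{A}^{\top}(\mathbf{A}\mathbf{A}^{\top})^{-1}$; positive multipliers shrink the atoms whose norm constraints are active.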
An interesting observation is that if we substitute $(11)$ into $(15)$, we obtain