Special Notes on Feb. 28, 2019

Date: Feb 28, 2019
Last Updated: Feb 28, 2019
Categories:
Notes Theory
Tags:
algorithm inverse-problem optimization

Introduction

In this article, we discuss the non-negative constrained least length problem. Due to time limitations, we only discuss a simplified form of this problem. The solution is derived by applying the non-negative least squares (NNLS) algorithm.

KT conditions

See the Karush–Kuhn–Tucker conditions article on Wikipedia for background.

To solve this problem, we first need to introduce the Kuhn-Tucker (KT) theorem (or KT conditions). We use $\mathbf{a} \succ \mathbf{b}$ to represent $\forall~i,~a_i > b_i$. Consider the problem

\begin{equation} \begin{aligned} \hat{\mathbf{x}} = &~\arg \min\limits_{\mathbf{x}} E(\mathbf{x}), \\ E(\mathbf{x}) = &~\left\lVert \mathbf{y} - \mathbf{A} \mathbf{x} \right\rVert^2_2, \\[5pt] \mathrm{s.t.}~& \mathbf{Hx} \succeq \mathbf{h}, \end{aligned} \end{equation}

where $\mathbf{Hx} \succeq \mathbf{h}$ defines a group of constraints that

\begin{equation} \begin{aligned} \begin{bmatrix} \mathbf{h}^T_1 \\ \mathbf{h}^T_2 \\ \mathbf{h}^T_3 \\ \vdots \\ \mathbf{h}^T_P \end{bmatrix} \mathbf{x} \succeq \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_P \end{bmatrix}, \end{aligned} \end{equation}

where each row is a constraint.

At a constrained minimum, the gradient of $E$ must be expressible through the gradients of the constraints. Introducing a multiplier vector $\mathbf{z}$, this stationarity condition reads

\begin{equation} \begin{aligned} \frac{1}{2} \nabla E &= \nabla [\mathbf{H}\mathbf{x}] \cdot \mathbf{z}, \\ - \mathbf{A}^T \left( \mathbf{y} - \mathbf{A} \mathbf{x} \right) &= \mathbf{H}^T \mathbf{z}. \end{aligned} \end{equation}

If we can find a non-negative $\mathbf{z}$, then we can divide it into two parts:

\begin{equation} \begin{aligned} \mathbf{z} = \begin{bmatrix} \mathbf{z}_E \succ 0\\ \mathbf{z}_S = 0 \end{bmatrix} && \mathrm{and} && \left\{ \begin{matrix} \mathbf{H_E x} = \mathbf{h}_E, \\ \mathbf{H_S x} \succ \mathbf{h}_S. \end{matrix} \right. \end{aligned} \end{equation}

We call $(4)$ the “KT conditions”, which mean that, at the solution, if a constraint is satisfied on the boundary, the descent direction of $E$ along that dimension points out of the feasible domain; if a constraint is satisfied loosely (where $\mathbf{H_S x} \succ \mathbf{h}_S$), the solution can reach the minimum along that dimension.
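
As a toy illustration (our own example, not part of the original statement): take $\mathbf{A} = \mathbf{H} = \mathbf{I}$, $\mathbf{h} = \mathbf{0}$ and $\mathbf{y} = (1, -1)^T$. The constrained minimizer is $\hat{\mathbf{x}} = (1, 0)^T$, and $\frac{1}{2} \nabla E = \hat{\mathbf{x}} - \mathbf{y} = (0, 1)^T = \mathbf{H}^T \mathbf{z}$ with $\mathbf{z} = (0, 1)^T \succeq \mathbf{0}$: the loose constraint $x_1 = 1 > 0$ has $z_1 = 0$, while the tight constraint $x_2 = 0$ has $z_2 = 1 > 0$, exactly as $(4)$ requires.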

The proof of the KT conditions is complicated, although their geometric meaning can be demonstrated clearly. Here we do not discuss the underlying theory, but simply use the KT conditions as a tool: if we find a solution that satisfies them, then we know it is the exact solution of the non-negative constrained problem.

Non-negative least squares algorithm

Theory of the solution

Consider the simplified form

\begin{equation} \begin{aligned} \hat{\mathbf{m}} = &~\arg \min\limits_{\mathbf{m}} E(\mathbf{m}), \\ E(\mathbf{m}) = &~\left\lVert \mathbf{d} - \mathbf{G} \mathbf{m} \right\rVert^2_2, \\[5pt] \mathrm{s.t.}~& \mathbf{m} \succeq \mathbf{0}, \end{aligned} \end{equation}

where we let $\mathbf{H}=\mathbf{I}$, and $\mathbf{h} = \mathbf{0}$. Then consider that

\begin{equation} \begin{aligned} \mathbf{H}_E \mathbf{m} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \cdots & 0 \end{bmatrix}_{P_E \times N} \mathbf{m} = \begin{bmatrix} m_{E1} \\ m_{E2} \\ m_{E3} \\ \vdots \\ m_{EP_E} \end{bmatrix} = \mathbf{m}_E = \mathbf{0}. \end{aligned} \end{equation}

We also have $\mathbf{H}_S \mathbf{m} = \mathbf{m}_S \succ \mathbf{0}$. Here $\mathbf{H}$ serves as a selector that extracts entries of $\mathbf{m}$ to form $\mathbf{m}_E$ and $\mathbf{m}_S$, respectively.
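
For instance (a small example of ours), with $N = 3$ and the equality set $\{1, 3\}$, the selector $\mathbf{H}_E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ picks out $\mathbf{m}_E = \begin{bmatrix} m_1 \\ m_3 \end{bmatrix}$.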

Then we could rewrite $(3)$ as

\begin{equation} \begin{aligned} \frac{1}{2} \nabla E &= \mathbf{z}, \\ - \mathbf{G}^T \left( \mathbf{d} - \mathbf{G} \mathbf{m} \right) &= \mathbf{z}. \end{aligned} \end{equation}

Hence we have

\begin{equation} \begin{aligned} \nabla E = \begin{bmatrix} \nabla E_E \succ 0\\ \nabla E_S = 0 \end{bmatrix} && \mathrm{and} && \left\{ \begin{matrix} \mathbf{m}_E = \mathbf{0}, \\ \mathbf{m}_S \succ \mathbf{0}. \end{matrix} \right. \end{aligned} \end{equation}

From $(8)$ we learn that, at the solution, the gradient is positive along the boundary components $\mathbf{m}_E$, while over the interior components (not on the boundary) the gradient is $\mathbf{0}$. This indicates that for a solution $\mathbf{m}$: if $m_i = 0$, then since $\frac{\partial E}{\partial m_i} > 0$, decreasing $E$ further would require violating the constraint; if $m_i > 0$, that component has already reached the minimum and cannot be optimized further. Hence such an $\mathbf{m}$ is the exact solution.
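
To make $(8)$ concrete, here is a quick numerical check on the same toy instance used in the KT section above (our own example; `scipy.optimize.nnls` is a standard NNLS implementation):

```python
import numpy as np
from scipy.optimize import nnls

# Toy instance: G = I, d = (1, -1); the NNLS solution is m = (1, 0).
G = np.eye(2)
d = np.array([1.0, -1.0])
m, _ = nnls(G, d)
grad = -2.0 * G.T @ (d - G @ m)  # gradient of E at the solution

print(m)     # [1. 0.]  -> m_S = (m_1), m_E = (m_2)
print(grad)  # [0. 2.]  -> grad E_S = 0, grad E_E > 0, matching (8)
```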

Algorithm

In practice, to obtain the correct solution, we may perform the following algorithm (a code sketch follows the list):

  1. Initialization: $\mathbf{m}=\mathbf{0}$. Now every $m_i \in \mathbf{m}$ satisfies its constraint with equality, i.e. $m_i \in \mathbf{m}_E$.
  2. If all $m_i \in \mathbf{m}_E$ satisfy $\frac{\partial E}{\partial m_i} \geqslant 0$, the current $\mathbf{m}$ is the final solution.
  3. If there is any $m_i \in \mathbf{m}_E$ with $\frac{\partial E}{\partial m_i} < 0$, choose the $i$ with the most negative gradient $\frac{\partial E}{\partial m_i}$ and move $m_i$ into $\mathbf{m}_S$.
  4. Extract the index set $S$, then solve the unconstrained least squares problem over those parameters: $\mathbf{G}_S \mathbf{m}_S = \mathbf{d} - \mathbf{G}_E \mathbf{m}_E = \mathbf{d}$ (since $\mathbf{m}_E = \mathbf{0}$). Mark the solution as $\mathbf{m}^{\dagger}_S$.
  5. If $\mathbf{m}^{\dagger}_S$ is still in the feasible domain (including the boundary), let $\mathbf{m}_S = \mathbf{m}^{\dagger}_S$, update the boundary set $\mathbf{m}_E$, and return to step 2.
  6. If $\mathbf{m}^{\dagger}_S$ is not in the feasible domain, find the maximal $\alpha$ that keeps $\mathbf{m}_S = \mathbf{m}_S + \alpha (\mathbf{m}^{\dagger}_S - \mathbf{m}_S)$ in the feasible domain. Concretely, $\alpha = \min\limits_{i:\, m^{\dagger}_{S_i} \leqslant 0} \left[ \frac{m_{S_i}}{m_{S_i} - m^{\dagger}_{S_i}} \right]$, where the minimum runs over the infeasible components. By this choice, the update is the longest step along the segment toward $\mathbf{m}^{\dagger}_S$ that stays feasible, and since $E$ decreases monotonically along this segment ($E$ is convex and minimized over the subspace at $\mathbf{m}^{\dagger}_S$), $E$ decreases after the update. After this step, at least one element $m_i$ falls back to the boundary from the feasible domain, so we need to update $\mathbf{m}_E$. Return to step 2.
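
The following is a minimal sketch of these six steps in Python (our own illustration: the name `nnls_active_set` and the tolerance handling are our choices, and the numerical safeguards of production code are omitted; `scipy.optimize.nnls` provides a robust version of the same Lawson–Hanson algorithm):

```python
import numpy as np

def nnls_active_set(G, d, tol=1e-10, max_iter=None):
    """Sketch of the steps above: min ||d - G m||_2^2  s.t.  m >= 0."""
    n = G.shape[1]
    max_iter = max_iter or 3 * n
    m = np.zeros(n)               # step 1: every index starts in the equality set E
    S = np.zeros(n, dtype=bool)   # mask of the strict (free) set S
    for _ in range(max_iter):
        grad = -G.T @ (d - G @ m)                      # (1/2) * gradient of E
        if np.all(grad[~S] >= -tol):                   # step 2: KT conditions hold
            break
        i = int(np.argmin(np.where(S, np.inf, grad)))  # step 3: most negative gradient
        S[i] = True
        while True:
            m_dag = np.zeros(n)                        # step 4: LS over the S columns
            m_dag[S] = np.linalg.lstsq(G[:, S], d, rcond=None)[0]
            if m_dag[S].min() > tol:                   # step 5: still feasible, accept
                m = m_dag
                break
            bad = S & (m_dag <= tol)                   # step 6: back-track to boundary
            alpha = np.min(m[bad] / (m[bad] - m_dag[bad]))
            m = m + alpha * (m_dag - m)
            S &= m > tol                               # zeroed elements rejoin E
            m[~S] = 0.0
    return m
```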

Non-negative constrained least length problem

Consider a more complicated problem

\begin{equation} \begin{aligned} \hat{\mathbf{x}} = &~\arg \min\limits_{\mathbf{x}} L(\mathbf{x}), \\ L(\mathbf{x}) = &~\left\lVert \mathbf{x} \right\rVert^2_2, \\[5pt] \mathrm{s.t.}~& \mathbf{Hx} \succeq \mathbf{h}. \end{aligned} \end{equation}

Approach

To solve this problem, we substitute $\mathbf{d} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}$, $\mathbf{G} = \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix}$ into $(5)$. Then we have

\begin{equation} \begin{aligned} \hat{\mathbf{m}} = &~\arg \min\limits_{\mathbf{m}} E(\mathbf{m}), \\ \mathbf{e} = &~\begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix} - \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix} \mathbf{m} = \begin{bmatrix} - \mathbf{H}^T \mathbf{m} \\ 1 - \mathbf{h}^T \mathbf{m} \end{bmatrix}, \\[5pt] E(\mathbf{m}) = &~\left\lVert \mathbf{e} \right\rVert^2_2, \\[5pt] \mathrm{s.t.}~& \mathbf{m} \succeq \mathbf{0}. \end{aligned} \end{equation}

Note that $\mathbf{m}$ here is different from $\mathbf{x}$. Since $\mathbf{H}$ is $P \times N$, where $N$ is the dimension of $\mathbf{x}$, the matrix $\mathbf{G} = \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix}$ has $N+1$ rows and $P$ columns, so $\mathbf{m}$ is a $P \times 1$ vector, while $\mathbf{e},~\mathbf{d}$ are $(N+1) \times 1$ vectors. If we solve $(10)$ as we solved $(5)$, we can recover the solution as $x_i = -\frac{e_i}{e_{N+1}}$.
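
Here is a sketch of this substitution in code (our own example; it reuses `scipy.optimize.nnls`, and the helper name `least_length` is ours):

```python
import numpy as np
from scipy.optimize import nnls

def least_length(H, h):
    """Solve min ||x||_2^2 s.t. H x >= h through the NNLS problem (10)."""
    P, N = H.shape
    G = np.vstack([H.T, h.reshape(1, P)])  # G = [H^T; h^T], shape (N+1) x P
    d = np.zeros(N + 1)
    d[-1] = 1.0                            # d = [0; 1]
    m, _ = nnls(G, d)                      # non-negative solution m >= 0
    e = d - G @ m                          # residual e of (10)
    if e[-1] <= 0:                         # e_{N+1} > 0 holds iff feasible
        raise ValueError("the constraints H x >= h appear to be infeasible")
    return -e[:-1] / e[-1]                 # x_i = -e_i / e_{N+1}

# Example: min ||x||^2 s.t. x_1 + x_2 >= 1  ->  x = (0.5, 0.5)
print(least_length(np.array([[1.0, 1.0]]), np.array([1.0])))
```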

Proof

To prove why we can use $(10)$ to solve $(9)$, we need to verify that the final solution $\mathbf{x}$ satisfies the KT conditions. But first we need to consider the KT conditions for the final solution $\mathbf{m}$. In the following, all quantities refer to their converged values at the final solution.

\begin{equation} \begin{aligned} \nabla E = \begin{bmatrix} \nabla E_E \succ 0\\ \nabla E_S = 0 \end{bmatrix} && \mathrm{and} && \left\{ \begin{matrix} \mathbf{m}_E = \mathbf{0}, \\ \mathbf{m}_S \succ \mathbf{0}. \end{matrix} \right. \end{aligned} \end{equation}

Then we know that

\begin{align} \mathbf{m}^T \nabla E = \begin{bmatrix} \mathbf{m}^T_E & \mathbf{m}^T_S \end{bmatrix} \begin{bmatrix} \nabla E_E \\ \nabla E_S \end{bmatrix} = \mathbf{m}^T_E \nabla E_E + \mathbf{m}^T_S \nabla E_S = 0, \end{align}

where the first term vanishes because $\mathbf{m}_E = \mathbf{0}$ and the second because $\nabla E_S = \mathbf{0}$.

We already know that $\frac{1}{2} \nabla E = -\mathbf{G}^T (\mathbf{d} - \mathbf{G}\mathbf{m}) = - \mathbf{G}^T \mathbf{e}$, hence

\begin{equation} \begin{aligned} E &= \mathbf{e}^T \mathbf{e} = \begin{bmatrix} - \mathbf{m}^T \mathbf{H} & 1 - \mathbf{m}^T \mathbf{h} \end{bmatrix} \mathbf{e} = \begin{bmatrix} \mathbf{0} & 1 \end{bmatrix} \mathbf{e} - \mathbf{m}^T \begin{bmatrix} \mathbf{H} & \mathbf{h} \end{bmatrix} \mathbf{e} \\ &= e_{N+1} - \mathbf{m}^T \mathbf{G}^T \mathbf{e} = e_{N+1} + \frac{1}{2} \mathbf{m}^T \nabla E = e_{N+1} > 0. \end{aligned} \end{equation}

Note that if $E = 0$, we would have $\mathbf{e} = \mathbf{0}$, i.e. $\mathbf{H}^T \mathbf{m} = \mathbf{0}$ and $\mathbf{h}^T \mathbf{m} = 1$. But then, for any feasible $\mathbf{x}$ with $\mathbf{Hx} \succeq \mathbf{h}$, left-multiplying the constraints by $\mathbf{m}^T \succeq \mathbf{0}$ would give $0 = \mathbf{m}^T \mathbf{H} \mathbf{x} \geqslant \mathbf{m}^T \mathbf{h} = 1$, a contradiction. Hence, as long as the constraints are feasible, $E = e_{N+1} > 0$.

Recall that $x_i = -\frac{e_i}{e_{N+1}}$, i.e. $e_i = -x_i e_{N+1}$, hence we have

\begin{align} \mathbf{e} = \begin{bmatrix} -\mathbf{x} \\ 1 \end{bmatrix} e_{N+1}. \end{align}

Because we know that $\nabla E_E \succ 0$ and $\nabla E_S = 0$, the whole gradient satisfies $\nabla E \succeq \mathbf{0}$. Hence

\begin{equation} \begin{aligned} \frac{1}{2} \nabla E &= -\mathbf{G}^T \mathbf{e} = - \begin{bmatrix} \mathbf{H} & \mathbf{h} \end{bmatrix} \mathbf{e} = - \begin{bmatrix} \mathbf{H} & \mathbf{h} \end{bmatrix} \begin{bmatrix} -\mathbf{x} \\ 1 \end{bmatrix} e_{N+1} = e_{N+1} \left( \mathbf{H}\mathbf{x} - \mathbf{h} \right) \succeq \mathbf{0}. \end{aligned} \end{equation}

Because $e_{N+1} > 0$, we have $\mathbf{H}\mathbf{x} - \mathbf{h} \succeq \mathbf{0}$, which indicates that our solution indeed lies in the feasible domain.

Removing the last element $e_{N+1}$ from $\mathbf{e}$, we have $\mathbf{e}_{N} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = -\mathbf{H}^T \mathbf{m}$. Then let us consider the gradient of the length function $L(\mathbf{x})$,

\begin{equation} \begin{aligned} \frac{1}{2} \nabla L = \frac{1}{2} \nabla \lVert \mathbf{x} \rVert^2_2 = \mathbf{x} = - \frac{1}{e_{N+1}} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = \mathbf{H}^T \frac{\mathbf{m}}{e_{N+1}}. \end{aligned} \end{equation}

Comparing with $(3)$, we see that the gradient can be described by a non-negative combination of the rows of $\mathbf{H}$, which means we can define $\mathbf{z} = \frac{\mathbf{m}}{e_{N+1}}$. To be emphasized, $e_{N+1} > 0$, hence $\mathbf{z}$ has the same signs as $\mathbf{m}$:

\begin{equation} \begin{aligned} \left\{ \begin{matrix} \mathbf{m}_E = \mathbf{0}, \\ \mathbf{m}_S \succ \mathbf{0}. \end{matrix} \right. && \mathrm{and} && \left\{ \begin{matrix} \mathbf{z}_E = \mathbf{0}, \\ \mathbf{z}_S \succ \mathbf{0}. \end{matrix} \right. \end{aligned} \end{equation}

Then, recall that

\begin{equation} \begin{aligned} \nabla E = - 2 \mathbf{G}^T \mathbf{e} = 2 e_{N+1} \left( \mathbf{H}\mathbf{x} - \mathbf{h} \right). \end{aligned} \end{equation}

Since $e_{N+1} > 0$, $\mathbf{H}\mathbf{x} - \mathbf{h}$ has the same signs as $\nabla E$. Recalling $(11)$, we can rewrite the conditions as

\begin{equation} \begin{aligned} \left\{ \begin{matrix} \mathbf{m}_E = \mathbf{0}, \\ \mathbf{m}_S \succ \mathbf{0}. \end{matrix} \right. && \mathrm{and} && \left\{ \begin{matrix} \mathbf{H}_E\mathbf{x} - \mathbf{h}_E \succ \mathbf{0}, \\ \mathbf{H}_S\mathbf{x} - \mathbf{h}_S = \mathbf{0}. \end{matrix} \right. \end{aligned} \end{equation}

Therefore, we know that

\begin{equation} \begin{aligned} \left\{ \begin{matrix} \mathbf{z}_E = \mathbf{0}, \\ \mathbf{z}_S \succ \mathbf{0}. \end{matrix} \right. && \mathrm{and} && \left\{ \begin{matrix} \mathbf{H}_E\mathbf{x} - \mathbf{h}_E \succ \mathbf{0}, \\ \mathbf{H}_S\mathbf{x} - \mathbf{h}_S = \mathbf{0}. \end{matrix} \right. \end{aligned} \end{equation}

Compared with $(4)$, this solution has a pair of reversed subscripts, because the $E$ and $S$ here describe $\mathbf{m}$: when $\mathbf{m}$ reaches its boundary, $\mathbf{x}$ lies strictly inside its domain, and vice versa. Remember to exchange the subscripts if a coherent form of the solution is desired.

These are the KT conditions, which means the solution defined in $(16)$ exactly satisfies them. In conclusion, $\mathbf{x} = \mathbf{H}^T \frac{\mathbf{m}}{1 - \mathbf{h}^T \mathbf{m}}$ is exactly the solution of the non-negative constrained least length problem.
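
As a final sanity check (our own test, with a randomly generated feasible instance), the closed form can be compared against a general-purpose constrained solver; `least_length` is the sketch from the Approach section:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 3))
x0 = rng.standard_normal(3)
h = H @ x0 - rng.uniform(0.1, 1.0, size=5)  # x0 is feasible by construction

x_nnls = least_length(H, h)                 # NNLS-based closed-form solution
ref = minimize(lambda x: x @ x, np.zeros(3), jac=lambda x: 2.0 * x,
               constraints={"type": "ineq", "fun": lambda x: H @ x - h})

print(np.allclose(x_nnls, ref.x, atol=1e-5))  # expect True
print(np.all(H @ x_nnls >= h - 1e-9))         # the solution is feasible
```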