Special Notes on Feb. 28, 2019
Date: Feb 28, 2019
Last Updated: Feb 28, 2019
Introduction
In this article, we discuss the non-negative constrained least length problem. Owing to limited time, we only discuss a simplified form of this problem. Its solution is derived by applying the non-negative least squares (NNLS) algorithm.
KT conditions
See the Wikipedia entry on the Karush–Kuhn–Tucker conditions for a full treatment.
To solve this problem, we first need to introduce the Karush–Kuhn–Tucker (KT) theorem (or KT conditions). We use $\mathbf{a} \succ \mathbf{b}$ to denote $\forall\,i,~a_i > b_i$ (and $\mathbf{a} \succeq \mathbf{b}$ analogously for $a_i \geqslant b_i$). Consider a problem of the form

$$\begin{aligned} \min_{\mathbf{x}}~& f(\mathbf{x}), \\ \text{s.t.}~& \mathbf{Hx} \succeq \mathbf{h}, \end{aligned} \tag{1}$$
where $\mathbf{Hx} \succeq \mathbf{h}$ defines a group of constraints,

$$\mathbf{Hx} = \begin{bmatrix} \mathbf{H}_1 \mathbf{x} \\ \mathbf{H}_2 \mathbf{x} \\ \vdots \\ \mathbf{H}_P \mathbf{x} \end{bmatrix} \succeq \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_P \end{bmatrix} = \mathbf{h}, \tag{2}$$

where each row $\mathbf{H}_i \mathbf{x} \geqslant h_i$ is a constraint.
Consider that at a point $\mathbf{x}$ the gradient can be written as a combination of the constraint rows:

$$\nabla f(\mathbf{x}) = \mathbf{H}^T \mathbf{z} = \sum_{i=1}^{P} z_i \mathbf{H}_i^T. \tag{3}$$
If we could find a $\mathbf{z}$ that is non-negative, then we could divide it into two parts:

$$\begin{aligned} \nabla f(\mathbf{x}) &= \mathbf{H}_E^T \mathbf{z}_E + \mathbf{H}_S^T \mathbf{z}_S, \\ \mathbf{z}_E \succeq \mathbf{0},\quad & \mathbf{H}_E \mathbf{x} = \mathbf{h}_E, \\ \mathbf{z}_S = \mathbf{0},\quad & \mathbf{H}_S \mathbf{x} \succ \mathbf{h}_S, \end{aligned} \tag{4}$$

where $E$ indexes the constraints satisfied with equality and $S$ indexes the constraints satisfied strictly.
We call $(4)$ the "KT conditions". They mean that at the solution, if a constraint is satisfied on the boundary, the descent direction along that dimension points out of the feasible domain; if a constraint is satisfied loosely (where $\mathbf{H}_S \mathbf{x} \succ \mathbf{h}_S$), the solution can reach the unconstrained minimum along that dimension.
The proof of the KT conditions is complicated, although their geometric meaning can be demonstrated clearly. Here we do not discuss the underlying theory, but simply use the KT conditions as a tool: if we find a solution that satisfies them, we know it is the exact solution of the non-negative constrained problem.
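As a quick illustration, the KT conditions can be checked numerically on a toy problem. The sketch below is our own minimal example (the objective $f(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{c} \rVert_2^2$ with constraints $\mathbf{x} \succeq \mathbf{0}$, i.e. $\mathbf{H} = \mathbf{I}$, $\mathbf{h} = \mathbf{0}$, is an illustrative assumption, not part of the derivation above):

```python
import numpy as np

# Toy problem (illustrative assumption): minimize f(x) = ||x - c||^2
# subject to x >= 0 (H = I, h = 0). The closed-form solution is x* = max(c, 0).
c = np.array([1.5, -2.0, 0.7, -0.3])
x_star = np.maximum(c, 0.0)

grad = 2.0 * (x_star - c)       # gradient of f at the solution
active = x_star == 0.0          # constraints satisfied on the boundary (set E)

# With H = I, the KT conditions (4) reduce to: grad = z, z >= 0 on E, z = 0 on S.
z = grad
assert np.all(z[active] >= 0.0)       # boundary: descent would leave the feasible domain
assert np.allclose(z[~active], 0.0)   # interior: the unconstrained minimum is reached
print("KT conditions hold at x* =", x_star)
```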
Non-negative least squares algorithm
Theory of the solution
Consider the simplified form

$$\begin{aligned} \min_{\mathbf{m}}~& E(\mathbf{m}) = \lVert \mathbf{d} - \mathbf{Gm} \rVert_2^2, \\ \text{s.t.}~& \mathbf{m} \succeq \mathbf{0}, \end{aligned} \tag{5}$$

where we let $\mathbf{H}=\mathbf{I}$ and $\mathbf{h} = \mathbf{0}$. Then consider that

$$\mathbf{H}_E \mathbf{m} = \mathbf{m}_E = \mathbf{0}. \tag{6}$$
We also have $\mathbf{H}_S \mathbf{m} = \mathbf{m}_S \succ \mathbf{0}$. Here $\mathbf{H}$ serves as a selector that extracts indexes from $\mathbf{m}$ to obtain $\mathbf{m}_E$ and $\mathbf{m}_S$ respectively.
Then we could rewrite $(3)$ as

$$\nabla E(\mathbf{m}) = \mathbf{I}^T \mathbf{z} = \mathbf{z},\quad \text{i.e.}\quad \nabla E_E = \mathbf{z}_E,~~ \nabla E_S = \mathbf{z}_S. \tag{7}$$
Hence we have

$$\begin{aligned} \nabla E_E \succeq \mathbf{0},\quad & \mathbf{m}_E = \mathbf{0}, \\ \nabla E_S = \mathbf{0},\quad & \mathbf{m}_S \succ \mathbf{0}. \end{aligned} \tag{8}$$
From $(8)$ we learn that at the solution, the gradient ascends along the boundary set $\mathbf{m}_E$, while inside the feasible region (off the boundary) the gradient is $\mathbf{0}$. This indicates that for a solution $\mathbf{m}$: if $m_i=0$, then since $\frac{\partial E}{\partial m_i} \geqslant 0$, decreasing $E$ along this coordinate would violate the constraint; if $m_i > 0$, this coordinate has already reached the minimum and cannot be optimized further. Hence such an $\mathbf{m}$ is the exact solution.
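Condition $(8)$ can also be verified numerically at a solution returned by an off-the-shelf solver. A minimal check, assuming random placeholder data for $\mathbf{G}$ and $\mathbf{d}$ and using scipy.optimize.nnls:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 6))   # placeholder data, not from the article
d = rng.standard_normal(20)

m, _ = nnls(G, d)                  # solve min ||d - G m||^2  s.t.  m >= 0
grad = -2.0 * G.T @ (d - G @ m)    # gradient of E(m) = ||d - G m||^2

on_boundary = m == 0.0             # the set m_E
tol = 1e-8
assert np.all(grad[on_boundary] >= -tol)               # dE/dm_i >= 0 on m_E
assert np.allclose(grad[~on_boundary], 0.0, atol=tol)  # dE/dm_i = 0 on m_S
print("m =", np.round(m, 4))
```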
Algorithm
In practice, to get the correct solution, we may perform the following algorithm (a NumPy sketch is given after the list):
1. Initialization: $\mathbf{m}=\mathbf{0}$. Now every $m_i \in \mathbf{m}$ is at equality, i.e. $m_i \in \mathbf{m}_E$.
2. If every $m_i \in \mathbf{m}_E$ satisfies $\frac{\partial E}{\partial m_i} \geqslant 0$, the current $\mathbf{m}$ is the final solution.
3. If there is any $m_i \in \mathbf{m}_E$ with $\frac{\partial E}{\partial m_i} < 0$, choose the $i$ with the minimal gradient $\frac{\partial E}{\partial m_i}$ and move $m_i$ into $\mathbf{m}_S$.
4. Extract the index set $S$, then solve the least squares problem for this part of the parameters: $\mathbf{G}_S \mathbf{m}_S = \mathbf{d} - \mathbf{G}_E \mathbf{m}_E = \mathbf{d}$ (the last equality holds because $\mathbf{m}_E = \mathbf{0}$). Mark the solution as $\mathbf{m}^{\dagger}_S$.
5. If $\mathbf{m}^{\dagger}_S$ is still in the feasible domain (including the boundary), let $\mathbf{m}_S = \mathbf{m}^{\dagger}_S$, update the boundary set $\mathbf{m}_E$, and return to step 2.
6. If $\mathbf{m}^{\dagger}_S$ is not in the feasible domain, find the maximal $\alpha$ that keeps $\mathbf{m}_S + \alpha (\mathbf{m}^{\dagger}_S - \mathbf{m}_S)$ in the feasible domain; in fact $\alpha = \min\limits_i \left[ \frac{m_{S_i}}{m_{S_i} - m^{\dagger}_{S_i}} \right]$, where the minimum is taken over the indexes with $m^{\dagger}_{S_i} \leqslant 0$. Since $\alpha \in [0, 1]$ and $\mathbf{m}_S$ is feasible, the updated point stays feasible by the choice of $\alpha$; and since it moves toward the minimizer of the least squares subproblem, $E$ decreases. After this step at least one element $m_i$ falls back from the feasible domain to the boundary, so we update $\mathbf{m}_E$ and return to step 2.
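The steps above translate almost directly into code. Below is a minimal NumPy sketch of this active-set scheme (the function name and the tolerance handling are our own choices; for production use, scipy.optimize.nnls implements the same Lawson-Hanson idea):

```python
import numpy as np

def nnls_active_set(G, d, tol=1e-10, max_iter=500):
    """Sketch of the algorithm above: solve min ||d - G m||^2 s.t. m >= 0."""
    P = G.shape[1]
    m = np.zeros(P)              # step 1: start with every m_i on the boundary
    S = np.zeros(P, dtype=bool)  # mask of the free set m_S (strictly positive)
    for _ in range(max_iter):
        grad = -G.T @ (d - G @ m)        # (1/2) dE/dm
        candidates = ~S & (grad < -tol)  # boundary indexes violating step 2
        if not candidates.any():
            return m                     # step 2: KT conditions hold
        i = np.argmin(np.where(candidates, grad, np.inf))
        S[i] = True                      # step 3: free the most negative gradient
        while True:
            m_dag = np.zeros(P)          # step 4: least squares on the free set
            m_dag[S], *_ = np.linalg.lstsq(G[:, S], d, rcond=None)
            if np.all(m_dag[S] > tol):   # step 5: still feasible, accept it
                m = m_dag
                break
            # step 6: back-track toward m_dag, stopping at the boundary
            neg = S & (m_dag <= tol)
            alpha = np.min(m[neg] / np.maximum(m[neg] - m_dag[neg], 1e-300))
            m = m + alpha * (m_dag - m)
            S &= m > tol                 # some m_i returns to the boundary
    return m
```

A quick comparison against scipy.optimize.nnls on random data is an easy way to validate such a sketch.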
Non-negative constrained least length problem
Consider the more complicated problem

$$\begin{aligned} \min_{\mathbf{x}}~& L(\mathbf{x}) = \frac{1}{2} \lVert \mathbf{x} \rVert_2^2, \\ \text{s.t.}~& \mathbf{Hx} \succeq \mathbf{h}, \end{aligned} \tag{9}$$

where the factor $\frac{1}{2}$ is only for convenience and does not change the minimizer.
Approach
To solve this problem, we substitute $\mathbf{d} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}$ and $\mathbf{G} = \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix}$ into $(5)$. Then we have

$$\begin{aligned} \min_{\mathbf{m}}~& E(\mathbf{m}) = \left\lVert \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix} - \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix} \mathbf{m} \right\rVert_2^2, \\ \text{s.t.}~& \mathbf{m} \succeq \mathbf{0}. \end{aligned} \tag{10}$$
Note that here $\mathbf{m}$ is different from $\mathbf{x}$. Since $\mathbf{H}$ is $P \times N$, where $N$ is the dimension of $\mathbf{x}$, the matrix $\mathbf{G} = \begin{bmatrix} \mathbf{H}^T \\ \mathbf{h}^T \end{bmatrix}$ has $N+1$ rows and $P$ columns, and $\mathbf{m}$ is a $P \times 1$ vector, while the residual $\mathbf{e} = \mathbf{d} - \mathbf{Gm}$ and $\mathbf{d}$ are $(N+1) \times 1$ vectors. If we solve $(10)$ as we did for $(5)$, we obtain the solution $x_i = -\frac{e_i}{e_{N+1}}$.
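This transformation is straightforward to carry out with an off-the-shelf NNLS solver. A sketch using scipy.optimize.nnls (the helper name least_length and the toy constraints are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import nnls

def least_length(H, h):
    """Solve min ||x||^2 s.t. H x >= h via the NNLS problem (10)."""
    P, N = H.shape
    G = np.vstack([H.T, h.reshape(1, -1)])  # (N+1) x P, as in (10)
    d = np.zeros(N + 1)
    d[-1] = 1.0
    m, _ = nnls(G, d)                       # min ||d - G m||^2  s.t.  m >= 0
    e = d - G @ m                           # residual vector
    if abs(e[-1]) < 1e-12:
        raise ValueError("constraints H x >= h appear to be infeasible")
    return -e[:N] / e[-1]                   # x_i = -e_i / e_{N+1}

# Toy usage (placeholder values): shortest x with x1 + x2 >= 1 and x1 >= 0.2.
H = np.array([[1.0, 1.0], [1.0, 0.0]])
h = np.array([1.0, 0.2])
x = least_length(H, h)
print(x, H @ x - h)   # expect x = [0.5, 0.5], with H x - h >= 0
```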
Proof
To prove why we can use $(10)$ to solve $(9)$, we assume the final solution is $\mathbf{x}$, which needs to satisfy the KT conditions. But first we need to consider the KT conditions for the final solution $\mathbf{m}$. In the following part, all parameters refer to their converged values at the final solution.
Then we know that

$$\begin{aligned} \nabla E_E \succ \mathbf{0},\quad & \mathbf{m}_E = \mathbf{0}, \\ \nabla E_S = \mathbf{0},\quad & \mathbf{m}_S \succ \mathbf{0}. \end{aligned} \tag{11}$$
We already know that $\frac{1}{2} \nabla E = -\mathbf{G}^T (\mathbf{d} - \mathbf{G}\mathbf{m}) = - \mathbf{G}^T \mathbf{e}$, hence we know

$$\frac{1}{2} \nabla E = -\begin{bmatrix} \mathbf{H} & \mathbf{h} \end{bmatrix} \mathbf{e} = -\left( \mathbf{H} \mathbf{e}_N + e_{N+1} \mathbf{h} \right), \tag{12}$$

where $\mathbf{e}_N$ denotes the first $N$ elements of $\mathbf{e}$.
Note that $E = \mathbf{e}^T \mathbf{e} = \mathbf{e}^T \mathbf{d} - \mathbf{e}^T \mathbf{G}\mathbf{m} = e_{N+1}$, because $\mathbf{e}^T \mathbf{d} = e_{N+1}$ and $\mathbf{e}^T \mathbf{G}\mathbf{m} = (\mathbf{G}^T \mathbf{e})^T \mathbf{m} = 0$ (on $S$ we have $(\mathbf{G}^T \mathbf{e})_S = -\frac{1}{2}\nabla E_S = \mathbf{0}$, and on $E$ we have $\mathbf{m}_E = \mathbf{0}$). Moreover, if $E=0$ we would have $\mathbf{e}=\mathbf{0}$, i.e. $\mathbf{H}^T \mathbf{m} = \mathbf{0}$ and $\mathbf{h}^T \mathbf{m} = 1$; but then any feasible $\mathbf{x}$ would give $0 = \mathbf{m}^T \mathbf{H}\mathbf{x} \geqslant \mathbf{m}^T \mathbf{h} = 1$, a contradiction. Hence, as long as the constraints are consistent, $E = e_{N+1} > 0$.
Recall that $x_i = -\frac{e_i}{e_{N+1}}$, i.e. $\mathbf{e}_N = -e_{N+1} \mathbf{x}$, hence we have

$$\frac{1}{2} \nabla E = -\left( -e_{N+1} \mathbf{H}\mathbf{x} + e_{N+1} \mathbf{h} \right) = e_{N+1} \left( \mathbf{H}\mathbf{x} - \mathbf{h} \right). \tag{13}$$
Because we know that $\nabla E_E \succ \mathbf{0}$ and $\nabla E_S = \mathbf{0}$, the whole gradient satisfies $\nabla E \succeq \mathbf{0}$. Hence

$$e_{N+1} \left( \mathbf{H}\mathbf{x} - \mathbf{h} \right) = \frac{1}{2} \nabla E \succeq \mathbf{0}. \tag{14}$$
Because $e_{N+1} > 0$, it follows that $\mathbf{H}\mathbf{x} - \mathbf{h} \succeq \mathbf{0}$, which indicates that our solution indeed lies in the feasible domain.
Removing the last element $e_{N+1}$ from $\mathbf{e}$, we have $\mathbf{e}_{N} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} = -\mathbf{H}^T \mathbf{m}$. Then let us consider the gradient of the length function $L(\mathbf{x})$,

$$\nabla L(\mathbf{x}) = \mathbf{x} = -\frac{\mathbf{e}_N}{e_{N+1}} = \mathbf{H}^T \frac{\mathbf{m}}{e_{N+1}}. \tag{15}$$
Comparing to $(3)$, the gradient can be described by a non-negative combination of the rows of $\mathbf{H}$, which means we can define $\mathbf{z} = \frac{\mathbf{m}}{e_{N+1}}$. To be emphasized, $e_{N+1} > 0$, hence $\mathbf{z}$ has the same signs as $\mathbf{m}$:

$$\nabla L(\mathbf{x}) = \mathbf{H}^T \mathbf{z},\quad \mathbf{z} = \frac{\mathbf{m}}{e_{N+1}} \succeq \mathbf{0}.$$
Then, recall that

$$\frac{1}{2} \nabla E = e_{N+1} \left( \mathbf{H}\mathbf{x} - \mathbf{h} \right).$$
Since $e_{N+1} > 0$, $\mathbf{H}\mathbf{x} - \mathbf{h}$ has the same signs as $\nabla E$. Recalling $(11)$, we can rewrite those conditions as

$$\begin{aligned} \mathbf{H}_E \mathbf{x} - \mathbf{h}_E \succ \mathbf{0},\quad & \mathbf{z}_E = \mathbf{0}, \\ \mathbf{H}_S \mathbf{x} - \mathbf{h}_S = \mathbf{0},\quad & \mathbf{z}_S \succ \mathbf{0}. \end{aligned}$$
Therefore, we know that

$$\begin{aligned} \nabla L(\mathbf{x}) &= \mathbf{H}^T \mathbf{z}, \\ \mathbf{z}_S \succ \mathbf{0},\quad & \mathbf{H}_S \mathbf{x} = \mathbf{h}_S, \\ \mathbf{z}_E = \mathbf{0},\quad & \mathbf{H}_E \mathbf{x} \succ \mathbf{h}_E. \end{aligned} \tag{16}$$
Compared to $(4)$, this solution has a pair of reversed subscripts, because the $E$ and $S$ here were defined to describe $\mathbf{m}$. In fact, when $\mathbf{m}$ reaches the boundary, $\mathbf{x}$ is strictly inside the domain, and vice versa. Remember to exchange the subscripts if a form coherent with $(4)$ is desired.
These are exactly the KT conditions, which means the solution characterized by $(16)$ satisfies them. In conclusion, since $e_{N+1} = 1 - \mathbf{h}^T \mathbf{m}$, $\mathbf{x} = \mathbf{H}^T \frac{\mathbf{m}}{1 - \mathbf{h}^T \mathbf{m}}$ is exactly the solution of the non-negative constrained least length problem.
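As a closing sanity check, the key facts used in this proof ($E = e_{N+1} > 0$, feasibility of $\mathbf{x}$, and the closed form $\mathbf{x} = \mathbf{H}^T \frac{\mathbf{m}}{1 - \mathbf{h}^T \mathbf{m}}$) can be verified numerically. The random $\mathbf{H}$ and $\mathbf{h}$ below are placeholders, constructed so that the constraints are consistent:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 4))
x0 = rng.standard_normal(4)
h = H @ x0 - rng.random(8)        # ensures a feasible point (x0) exists

G = np.vstack([H.T, h[None, :]])  # the G of problem (10)
d = np.zeros(5)
d[-1] = 1.0
m, rnorm = nnls(G, d)
e = d - G @ m

assert e[-1] > 0                               # e_{N+1} > 0
assert np.isclose(rnorm**2, e[-1], atol=1e-8)  # E = ||e||^2 = e_{N+1}
x = H.T @ m / (1.0 - h @ m)                    # since e_{N+1} = 1 - h^T m
assert np.all(H @ x - h >= -1e-8)              # x lies in the feasible domain
print("x =", np.round(x, 4))
```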