\section{Daubechies wavelets} \label{sec:dau}

We have now seen three different bases to represent signals: in the sample domain, in the Fourier domain and in the Haar wavelet domain. They all have different properties. We have argued that the Haar wavelets have nice properties for images: they represent edges well and errors stay local. However, a little smoothness is sometimes desirable (again in photography, think of a blue sky: white-blue at the bottom, gradually darker towards the top). This is exactly what the Daubechies wavelets of order four add. Instead of explicitly defining or showing the basis elements, we will directly describe the wavelet transform $W$.\footnote{Note that we did not explicitly define the transforms of section~\ref{sec:intro} either, as that section was motivational only.} In fact, we will describe it as an algorithm, as our intent is to implement it.

\subsection{The Daubechies wavelet transform}
We will formulate the algorithm in terms of matrix multiplications \cite{numc}. Before we do so, we need the following constants:
\begin{align*}
  c_0 &= \frac{1 + \sqrt{3}}{4 \sqrt{2}}, &\quad c_1 &= \frac{3 + \sqrt{3}}{4 \sqrt{2}}, \\
  c_2 &= \frac{3 - \sqrt{3}}{4 \sqrt{2}}, &\quad c_3 &= \frac{1 - \sqrt{3}}{4 \sqrt{2}}.
\end{align*}
Now let $n$ be even and define the $n \times (n+2)$-matrix $W_n$ as follows (where a blank means $0$):
\[ W_n = \begin{pmatrix}
  c_0 & c_1 & c_2 & c_3 & & & & & & & & & \\
  c_3 & -c_2 & c_1 & -c_0 & & & & & & & & & \\
  & & c_0 & c_1 & c_2 & c_3 & & & & & & & \\
  & & c_3 & -c_2 & c_1 & -c_0 & & & & & & & \\
  & & & & & & \ddots & & & & & & \\
  & & & & & & & c_0 & c_1 & c_2 & c_3 & & \\
  & & & & & & & c_3 & -c_2 & c_1 & -c_0 & & \\
  & & & & & & & & & c_0 & c_1 & c_2 & c_3 \\
  & & & & & & & & & c_3 & -c_2 & c_1 & -c_0
\end{pmatrix} \]
We also need the \emph{even-odd sort matrix} $S_n$, defined by
\[ (S_n \vec{x})_i = \begin{cases} x_{2i} &\mbox{ if } i < \frac{n}{2}, \\ x_{2i - n + 1} &\mbox{ if } i \geq \frac{n}{2}, \end{cases} \]
which permutes the elements of $\vec{x}$ by putting the elements with an even index in front.

In many cases we want to apply the $n \times (n+2)$-matrix $W_n$ to a vector of length $n$. To do so we set $x_n = x_0$ and $x_{n+1} = x_1$, i.e.\ we consider $\vec{x}$ to be \emph{periodic}. More precisely, we define a linear map $P_n$ as follows:
\[ P_n \vec{x} = (x_0, \ldots, x_{n-1}, x_0, x_1). \]
Applying $W_n$ to the periodic $\vec{x}$ is then exactly $W_n P_n \vec{x}$.

The wavelet transform now consists of multiplying the above matrices in a recursive fashion. Given a vector $\vec{x}$ of length $n$, calculate $\vec{x}^{(1)} = S_n W_n P_n \vec{x}$; since $S_n$ puts the smooth coefficients in front, we can recurse on the first half of $\vec{x}^{(1)}$ using $S_\frac{n}{2}$, $W_\frac{n}{2}$ and $P_\frac{n}{2}$. Repeat this procedure, ending with the multiplication by $S_4 W_4 P_4$. More formally, the wavelet transform is given by
\[ W \vec{x} := \diag(S_4 W_4 P_4, I_4, \ldots, I_4) \diag(S_8 W_8 P_8, I_8, \ldots, I_8) \cdots \diag(S_\frac{n}{2} W_\frac{n}{2} P_\frac{n}{2}, I_\frac{n}{2}) \, S_n W_n P_n \vec{x}. \]
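To make the algorithm concrete, the following is a minimal sketch in Python with NumPy; the names \texttt{daub4\_step} and \texttt{daub4} are ours, and we assume the length of the input is a power of two and at least $4$.

\begin{verbatim}
import numpy as np

# The constants c_0, ..., c_3 defined above.
SQRT3, SQRT2 = np.sqrt(3.0), np.sqrt(2.0)
C = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * SQRT2)

def daub4_step(x):
    """One stage S_n W_n P_n: extend periodically, filter, even-odd sort."""
    n = len(x)
    xp = np.concatenate([x, x[:2]])          # P_n: append x_0 and x_1
    smooth = (C[0] * xp[0:n:2] + C[1] * xp[1:n+1:2]
              + C[2] * xp[2:n+2:2] + C[3] * xp[3:n+3:2])
    detail = (C[3] * xp[0:n:2] - C[2] * xp[1:n+1:2]
              + C[1] * xp[2:n+2:2] - C[0] * xp[3:n+3:2])
    return np.concatenate([smooth, detail])  # S_n: smooth part in front

def daub4(x):
    """The full wavelet transform W x."""
    x = np.array(x, dtype=float)
    n = len(x)
    while n >= 4:
        x[:n] = daub4_step(x[:n])            # recurse on the smooth half
        n //= 2
    return x
\end{verbatim}

Note that this sketch performs the even-odd sort explicitly; the next subsection describes how to avoid this.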
\subsection{In place}
When implementing this transform, we do not have to perform the even-odd sort. Instead, we can do all calculations in place and use a stride to recurse on the elements with an even index. The result is then a permutation of $W \vec{x}$: the smooth and detail coefficients of the various levels end up interleaved instead of sorted. \todo{Add images to show the difference}

\subsection{Costs}
We will briefly analyze the cost of the transform by counting the number of \emph{flops}, that is, multiplications and additions. Computing one element of $W_n \vec{x}$ costs $4$ multiplications and $3$ additions, so computing $W_n \vec{x}$ costs $7n$ flops. Applying $S_n$ and $P_n$ does not require any flops, as these are mere data manipulations. Consequently, computing $W \vec{x}$ costs
\[ 7 \times n + 7 \times \frac{n}{2} + \cdots + 7 \times 8 + 7 \times 4 \text{ flops}. \]
Using the geometric series $\sum_{i=0}^\infty 2^{-i} = 2$ we can bound the number of flops by $14n$. Compared to the FFT this is a big improvement in terms of scalability: this wavelet transform has linear complexity $\BigO{n}$, whereas the FFT has complexity $\BigO{n \log n}$. However, this is also a burden, as it leaves little room for the overhead induced by parallelizing the algorithm. We will see a precise analysis of the communication costs in section~\ref{sec:par}. \todo{Do a simple upperbound of communication here?}

\subsection{The inverse}
The wavelet transform is invertible. We will prove this by showing that $S_n$ and $W_n P_n$ are invertible. In fact, they are orthogonal, which means that the inverse is given by the transpose.
\begin{lemma}
The matrices $S_n$ and $W_n P_n$ are orthogonal.
\end{lemma}
\begin{proof}
For $S_n$ this is clear, as it is a permutation matrix. For $W_n P_n$ we calculate the inner products of all pairs of rows. Each row contains the coefficients $(c_0, c_1, c_2, c_3)$ or $(c_3, -c_2, c_1, -c_0)$, shifted cyclically by an even offset, so rows whose coefficients do not overlap trivially have inner product $0$. Substituting the values of $c_0, \ldots, c_3$ gives
\[ c_0^2 + c_1^2 + c_2^2 + c_3^2 = 1 \quad \text{and} \quad c_0 c_2 + c_1 c_3 = 0. \]
Hence every row has norm $1$, two rows of the same pair have inner product $c_0 c_3 - c_1 c_2 + c_2 c_1 - c_3 c_0 = 0$, and rows of adjacent pairs, which overlap in two positions, have inner product $c_0 c_2 + c_1 c_3 = 0$ or an inner product that cancels directly, such as $c_2 c_3 - c_3 c_2 = 0$. So the rows of $W_n P_n$ are orthonormal.
\end{proof}
\begin{theorem}
The matrix $W$ is invertible with $W^{-1} = W^T$.
\end{theorem}
\begin{proof}
First note that $\diag(S_m W_m P_m, I_m, \ldots, I_m)$ is orthogonal by the above lemma. Now using the fact that the product of two orthogonal matrices is again orthogonal, we see that $W$ is orthogonal.
\end{proof}
Note that we will not parallelize the inverse transform.

\subsection{Higher dimensional wavelet transform}
Our final goal is to apply the wavelet transform to images. Of course we could simply put all the pixels of an image in one row and apply $W$, but then we would not use the spatial information of the image at all! In order to use the spatial information we apply $W$ in both directions. To be precise: we apply $W$ to every row and then apply $W$ to every column of the result. We could also do this the other way around, but the order does not matter:
\begin{lemma}
Let $F$ be an $n \times n$ matrix. Applying $W$ to every row of $F$ and then to every column of the result gives the same matrix as applying $W$ first to every column and then to every row.
\end{lemma}
\begin{proof}
Applying $W$ to every column of $F$ amounts to the matrix product $W F$, and applying $W$ to every row amounts to $F W^T$. By associativity of matrix multiplication, $W (F W^T) = (W F) W^T$.
\end{proof}
This lemma expresses some sort of commutativity and generalises to higher dimensions by applying it recursively. As we will only apply $W$ to images, we do not spell out the general statement.
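Continuing the Python sketch from before (again, the names and the set-up are ours), the two-dimensional transform and a numerical sanity check of the orthogonality of $W$ could look as follows.

\begin{verbatim}
def daub4_2d(F):
    """Apply W to every row of F, then to every column of the result.
    By the lemma above the order does not matter."""
    rows_done = np.apply_along_axis(daub4, 1, np.asarray(F, dtype=float))
    return np.apply_along_axis(daub4, 0, rows_done)

# Sanity check: build W column by column as W e_i and verify W^T W = I.
n = 16
W = np.column_stack([daub4(e) for e in np.eye(n)])
assert np.allclose(W.T @ W, np.eye(n))
\end{verbatim}

The check also illustrates the theorem above: since $W^T W = I$, the inverse transform is simply multiplication by $W^T$.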