diff --git a/wavelet_report/conc.tex b/wavelet_report/conc.tex index 67b1ffd..f7d0a40 100644 --- a/wavelet_report/conc.tex +++ b/wavelet_report/conc.tex @@ -6,4 +6,9 @@ We have discussed the Daubechies wavelet transform of order four to quiet some e As for the application of image compression, we have seen that the wavelets capture both the global shades of the image as well as the sharp edges. By this we can throw away a lot of subtle information, while retaining recognizable images. -\todo{Add concluding figure and caption} +\vspace{3cm} +\begin{figure} + \centering + \includegraphics[width=0.3\textwidth]{parijs.png} + \caption{My girlfriend an I in Paris. Still romantic with only 200 coefficients.} +\end{figure} diff --git a/wavelet_report/dau.tex b/wavelet_report/dau.tex index 5e9a916..acc89a7 100644 --- a/wavelet_report/dau.tex +++ b/wavelet_report/dau.tex @@ -1,7 +1,7 @@ \section{Daubechies wavelets} \label{sec:dau} -We now have seen three different bases to represent signals: in the sample domain, in the Fourier domain and in the Haar wavelets domain. They all have different properties. We have reasoned that the Haar wavelets have nice properties regarding images; it is able to represent edges well and errors are local. However a little bit of smoothness is sometimes asked for (again in photography, think of a blue sky: it's white/blue on the bottom, darker on the top). This is exactly what the Daubechies wavelets of order four add. +We now have seen three different bases to represent signals: in the sample domain, in the Fourier domain and in the Haar wavelets domain. They all have different properties. We have reasoned that the Haar wavelets have nice properties regarding images; it is able to represent edges well and errors are local. However a little bit of smoothness is sometimes asked for, this is exactly what the Daubechies wavelets of order four add. Instead of explicitly defining or showing the basis elements, we will directly describe the wavelet transform $W$.\footnote{Note that we didn't describe the actual transforms described in section~\ref{sec:intro}, as this section was motivational only.} We will describe the transform in terms of matrices, just as in \cite{numc} and \cite{biss}. @@ -43,8 +43,7 @@ In many cases we want to apply the $n \times (n+2)$-matrix $W_n$ to a vector of Now applying $W_n$ to the periodic $\vec{x}$ is exactly $W_n P_n \vec{x}$. The wavelet transform now consists of multiplying the above matrices in a recursive fashion. Given a vector $\vec{x}$ of length $n$, calculate $\vec{x}^{(1)} = S_n W_n P_n \vec{x}$, and recurse on the first halve of $\vec{x}^{(1)}$ using $S_\frac{n}{2}$, $W_\frac{n}{2}$ and $P_\frac{n}{2}$. Repeat this procedure and end with the muliplication of $S_4 W_4 P_4$. More formally the wavelet transform is given by: -\[ W \vec{x} := \diag(S_4 W_4 P_4, I_4, \ldots, I_4) - \diag(S_8 W_8 P_8, I_8, \ldots, I_8) \cdots +\[ W \vec{x} := \diag(S_4 W_4 P_4, I_4, \ldots, I_4) \cdots \diag(S_\frac{n}{2} W_\frac{n}{2} P_\frac{n}{2}, I_\frac{n}{2}) S_n W_n P_n \vec{x}. \] @@ -61,12 +60,12 @@ Just as the Fourier transform, the wavelet transform is invertible. This makes i For $W_n P_n$ we should calculate the inner products of all pairs of rows (or columns). 
If we take the same row, their inner product is: \[ c_0^2 + c_1^2 + c_2^2 + c_3^2 = \frac{1}{32}(1 + 3 + 2\sqrt(3)) + 9 + 3 + 6\sqrt(3) + 9 + 3 - 6\sqrt(3) + 1 + 3 -2\sqrt(3)) = 1 \] For two different rows we get four non-trivial combinations: - \begin{align} + \begin{align*} c_0c_3 - c_1c_2 + c_1c_2 - c_0c_3 &= 0, \\ c_0c_2 + c_1c_3 &= \frac{1}{32}(3+2\sqrt(3)-3+3-2\sqrt(3)-3) = 0, \\ c_2c_3 - c_3c_2 &= 0, \\ c_1c_0 - c_0c_1 &= 0. - \end{align} + \end{align*} For $W_4 P_4$ there is another combination, which can be shown to be zero with a similar calculation. So indeed $W_n P_n (W_n P_n)^T = I_n$. \end{proof} @@ -79,16 +78,32 @@ Just as the Fourier transform, the wavelet transform is invertible. This makes i \subsection{In place} -\todo{Add images to show the difference} -When implementing this transform, we don't have to perform the even-odd sort. Instead, we can simply do all calculations in place and use a stride to skip the odd elements in further steps. The only difference with the actual transform $W \vec{x}$ is that the output is permuted. However, in our application of image compression, we are not interested in the index of a coefficient, so we don't need to rectify this. In the rest of this paper the Daubechies wavelet transform will refer to this (in-place) variant. - -Assume we have a function \texttt{apply\_wn\_pn(x, n, s)} which computes $W_n P_n (x_0, x_s, \ldots, x_{s(n-1)})$ in place\footnote{Implementing this is not so hard, but it wouldn't make this section nicer.}. The whole algorithm then can nicely be expressed as +When implementing this transform, we don't have to perform the even-odd sort. Instead, we can simply do all calculations in place and use a stride to skip the odd elements in further steps. This is depicted in figure~\ref{fig:wavelet_stride}. The only difference with the actual transform $W \vec{x}$ is that the output is permuted. However, in our application of image compression, we are not interested in the index of a coefficient, so we don't need to rectify this. In the rest of this paper the Daubechies wavelet transform will refer to this (in-place) variant. + +\tikzstyle{plain_line}=[] +\begin{figure} + \begin{tabular}{c|c} + \begin{subfigure}[b]{0.5\textwidth} + \centering + \includegraphics[width=\textwidth]{wavelet_perm} + \caption{The wavelet transform with the even-odd sort} + \end{subfigure}& + \begin{subfigure}[b]{0.5\textwidth} + \centering + \includegraphics[width=\textwidth]{wavelet_stride} + \caption{The wavelet transform using a stride to skip odd} + \end{subfigure} + \end{tabular} + \caption{There are two ways to implement the wavelet transform.} + \label{fig:wavelet_stride} +\end{figure} + +Assume we already have a function \texttt{apply\_wn\_pn(x, n, s)} which computes the expression $W_n P_n (x_0, x_s, \ldots, x_{s(n-1)})$ (note the striding) in place\footnote{Implementing this is not so hard, but it would not explain the algorithm better.}. The whole algorithm then can nicely be expressed as \begin{lstlisting} -wavelet(x, n) = - for i = 1 to n/4 - apply\_wn\_pn(x, n/i, i) - i = i*2 +for i = 1 to n/4 + apply_wn_pn(x, n/i, i) + i = i*2 \end{lstlisting} For future reference we also define the following computation: \texttt{apply\_wn(x, y0, y1, n, s)} which computes $W_n (x_0, \ldots, y_0, y_1)$. Note that \texttt{apply\_wn\_pn(x, n, s)} can now be expressed as \texttt{apply\_wn(x, x0, xs, n, s)}. 
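To make the in-place, strided variant concrete, the following is a minimal C sketch of \texttt{apply\_wn} and \texttt{apply\_wn\_pn} (the coefficients $c_0, \ldots, c_3$ are the standard Daubechies-4 values consistent with the calculation in the proof above; the function bodies are an illustration, not a transcript of the actual implementation). Note how the two wrapped-around values $y_0, y_1$ must be passed explicitly: by the time the last pair is processed, the first two entries have already been overwritten.
\begin{lstlisting}
/* Sketch only: in-place, strided Daubechies-4 step.          */
/* apply_wn transforms x[0], x[s], ..., x[(n-1)s] in place;   */
/* y0, y1 are the two wrapped-around (periodic) values.       */
#include <math.h>

static double c0, c1, c2, c3;

static void init_coefficients(void)
{
    const double s3 = sqrt(3.0), f = 4.0 * sqrt(2.0);
    c0 = (1.0 + s3) / f;  c1 = (3.0 + s3) / f;
    c2 = (3.0 - s3) / f;  c3 = (1.0 - s3) / f;
}

static void apply_wn(double *x, double y0, double y1, int n, int s)
{
    for (int i = 0; i < n - 2; i += 2) {
        double a = x[i*s], b = x[(i+1)*s], c = x[(i+2)*s], d = x[(i+3)*s];
        x[i*s]     = c0*a + c1*b + c2*c + c3*d;   /* smooth coefficient */
        x[(i+1)*s] = c3*a - c2*b + c1*c - c0*d;   /* detail coefficient */
    }
    double a = x[(n-2)*s], b = x[(n-1)*s];        /* last pair wraps around */
    x[(n-2)*s] = c0*a + c1*b + c2*y0 + c3*y1;
    x[(n-1)*s] = c3*a - c2*b + c1*y0 - c0*y1;
}

static void apply_wn_pn(double *x, int n, int s)
{
    apply_wn(x, x[0], x[s], n, s);
}

void wavelet(double *x, int n)   /* n a power of two, n >= 4 */
{
    init_coefficients();
    for (int i = 1; i <= n / 4; i *= 2)
        apply_wn_pn(x, n / i, i);
}
\end{lstlisting}
Doubling the stride plays the role of recursing on the even-indexed (smooth) half, so no even-odd sort is performed; the output is the permuted variant discussed above.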
@@ -99,8 +114,7 @@ We will briefly analyze the cost of the transform by counting the number of \emp \[ 7 \times n + 7 \times \frac{n}{2} + \cdots + 7 \times 8 + 7 \times 4 \text{ flops }. \] Using the geometric series $\sum_{i=0}^\infty 2^{-i} = 2$ we can bound the number of flops by $14n$. -Compared to the FFT this is a big improvement in terms of scalability, as this wavelet transform has a linear complexity $\BigO{n}$, but the FFT has a complexity of $\BigO{n \log n}$. This is however also a burden, as it leaves not much room for overhead induced by parallelizing the algorithm. We will see an precies analysis of communication costs in section~\ref{sec:par}. -\todo{Do a simple upperbound of communication here?} +Compared to the FFT this is a big improvement in terms of scalability, as this wavelet transform has a linear complexity $\BigO{n}$, but the FFT has a complexity of $\BigO{n \log n}$. This is however also a burden, as it leaves not much room for overhead induced by parallelizing the algorithm: the sequential version takes $14n$ flops and the parallel version at best $14\frac{n}{p}$ flops, where $p$ is the number of processors. So we see that the communication overhead may not exceed $14(p-1)\frac{n}{p}$ flops. \subsection{Higher dimensional wavelet transform} @@ -114,18 +128,7 @@ Our final goal is to apply the wavelet transform to images. Of course we could s Given a $n \times n$-matrix $F$ and a $m \times m$-matrix $G$ and an $m \times n$-matrix $X$, then $G^{V}(F^{H} X) = F^{H}(G^{V} X)$. \end{lemma} \begin{proof} - Let $Z = F^{H} X$ and $Y = G^{V} (F^{H} X)$ then their coefficients are given by - \begin{align} - z_{k,j} &= ? \\ - y_{i,j} &= ?. - \end{align} - On the other hand, let $Z' = G^{V} X$ and $Y' = F^{H} (G^{V} X)$, then their coefficients are: - \begin{align} - z'_{i,l} &= ? \\ - y'_{i,j} &= ?. - \end{align} - By interchanging the sums and using commutativity of multiplication of reals, we see: - \[ y'_{i,j} = \ldots = y_{i,j} \] + This can be done by writing out all matrix multiplication and then interchanging the sums and using commutativity of multiplication of reals. We left out the calculation for brevity. \end{proof} This lemma expresses some sort of commutativity and generalises to higher dimensions by apply this commutativity inductively. As we don't need the general statement (i.e. we will only apply $W$ to images) we won't spell out the proof. If we say that we apply $W$ to an image, it is meant that we actually apply $W^{H} W^{V}$. On non-square images we also use this notation, despite the fact that the first $W$ has a different size than the second. 
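To illustrate how $W^{H} W^{V}$ can be realised with the one-dimensional routine, here is a sketch that reuses \texttt{apply\_wn\_pn} from the sketch above and assumes row-major storage (the actual implementation may organise this differently). The stride parameter is all that is needed to walk over columns instead of rows, and by the lemma above the order of the two passes does not matter.
\begin{lstlisting}
/* Sketch only: two-dimensional transform as a horizontal pass    */
/* over all rows followed by a vertical pass over all columns of  */
/* a row-major w x h image.                                        */
static void wavelet_strided(double *x, int n, int stride)
{
    for (int i = 1; i <= n / 4; i *= 2)
        apply_wn_pn(x, n / i, i * stride);
}

void wavelet_2d(double *img, int w, int h)
{
    for (int r = 0; r < h; r++)      /* horizontal: transform each row    */
        wavelet_strided(img + r * w, w, 1);
    for (int c = 0; c < w; c++)      /* vertical: transform each column   */
        wavelet_strided(img + c, h, w);
}
\end{lstlisting}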
diff --git a/wavelet_report/images.tex b/wavelet_report/images.tex index 18b14df..0978c68 100644 --- a/wavelet_report/images.tex +++ b/wavelet_report/images.tex @@ -8,17 +8,49 @@ \begin{document} +%%%%% INTRO IMAGE +% \tikzstyle{plain_line}=[] +% \begin{figure} +% \centering +% \begin{subfigure}[b]{0.5\textwidth} +% \begin{tikzpicture} +% \begin{groupplot}[group style={group size=1 by 4}, clip=false, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] +% \nextgroupplot +% \addplot[plain_line] coordinates {(0,0) (1,0) (1,1) (2,1) (2,0) (128,0)}; \legend{$e_1$} +% \nextgroupplot \addplot[plain_line] coordinates {(0,0) (2,0) (2,1) (3,1) (3,0) (128,0)}; \legend{$e_2$} +% \nextgroupplot \addplot[plain_line] coordinates {(0,0) (3,0) (3,1) (4,1) (4,0) (128,0)}; \legend{$e_3$} +% \nextgroupplot \addplot[plain_line] {0.8*sin(1*360*x/128) + 0.2*sin(3*360*x/128) + 0.08*sin(5*360*x/128)}; +% \end{groupplot} +% \end{tikzpicture} +% \caption{Representing a signal on the standard basis.} +% \end{subfigure}~ +% \begin{subfigure}[b]{0.5\textwidth} +% \begin{tikzpicture} +% \begin{groupplot}[group style={group size=1 by 4}, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] +% \nextgroupplot \addplot[plain_line] {sin(1*360*x/128)}; \legend{$f_1$} +% \nextgroupplot \addplot[plain_line] {sin(3*360*x/128)}; \legend{$f_3$} +% \nextgroupplot \addplot[plain_line] {sin(5*360*x/128)}; \legend{$f_5$} +% \nextgroupplot \addplot[plain_line] {0.8*sin(1*360*x/128) + 0.2*sin(3*360*x/128) + 0.08*sin(5*360*x/128)}; +% \end{groupplot} +% \end{tikzpicture} +% \caption{Representing a signal on the Fourier basis.} +% \end{subfigure} +% \caption{We can represent the same signal on different basis. 
Note that the Fourier representation is smaller in this case.} +% \label{fig:basicplot} +% \end{figure} +% $$ 0.088 + 0.174 \times 0.257 $$ +% $$ 0.798 \times 0.201 + 0.081 $$ +% $$ = \ldots + $$ + + +%%%%% ERROR IMAGE \tikzstyle{plain_line}=[] \begin{figure} \centering \begin{subfigure}[b]{0.5\textwidth} \begin{tikzpicture} - \begin{groupplot}[group style={group size=1 by 4}, clip=false, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] - \nextgroupplot - \addplot[plain_line] coordinates {(0,0) (1,0) (1,1) (2,1) (2,0) (128,0)}; \legend{$e_1$} - \nextgroupplot \addplot[plain_line] coordinates {(0,0) (2,0) (2,1) (3,1) (3,0) (128,0)}; \legend{$e_2$} - \nextgroupplot \addplot[plain_line] coordinates {(0,0) (3,0) (3,1) (4,1) (4,0) (128,0)}; \legend{$e_3$} + \begin{groupplot}[group style={group size=1 by 1}, clip=false, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] \nextgroupplot \addplot[plain_line] {0.8*sin(1*360*x/128) + 0.2*sin(3*360*x/128) + 0.08*sin(5*360*x/128)}; \end{groupplot} \end{tikzpicture} @@ -26,11 +58,8 @@ \end{subfigure}~ \begin{subfigure}[b]{0.5\textwidth} \begin{tikzpicture} - \begin{groupplot}[group style={group size=1 by 4}, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] - \nextgroupplot \addplot[plain_line] {sin(1*360*x/128)}; \legend{$f_1$} - \nextgroupplot \addplot[plain_line] {sin(3*360*x/128)}; \legend{$f_3$} - \nextgroupplot \addplot[plain_line] {sin(5*360*x/128)}; \legend{$f_5$} - \nextgroupplot \addplot[plain_line] {0.8*sin(1*360*x/128) + 0.2*sin(3*360*x/128) + 0.08*sin(5*360*x/128)}; + \begin{groupplot}[group style={group size=1 by 1}, yticklabels={,,}, height=3cm, width=\textwidth, xmin=0, xmax=128, ymin=-1, ymax=1, domain=0:128] + \nextgroupplot \addplot[plain_line] {0.8*sin(1*360*x/128) + 0.2*sin(3*360*x/128) + 0.08*sin(5*360*x/128) + 0.5*sin(10*360*x/128)}; \end{groupplot} \end{tikzpicture} \caption{Representing a signal on the Fourier basis.} @@ -39,8 +68,4 @@ \label{fig:basicplot} \end{figure} -$$ 0.088 + 0.174 \times 0.257 $$ -$$ 0.798 \times 0.201 + 0.081 $$ -$$ = \ldots + $$ - \end{document} diff --git a/wavelet_report/images/1000oostenrijk.png b/wavelet_report/images/1000oostenrijk.png new file mode 100644 index 0000000..2f08945 Binary files /dev/null and b/wavelet_report/images/1000oostenrijk.png differ diff --git a/wavelet_report/images/100oostenrijk.png b/wavelet_report/images/100oostenrijk.png new file mode 100644 index 0000000..ed03642 Binary files /dev/null and b/wavelet_report/images/100oostenrijk.png differ diff --git a/wavelet_report/images/10oostenrijk.png b/wavelet_report/images/10oostenrijk.png new file mode 100644 index 0000000..3c78ee3 Binary files /dev/null and b/wavelet_report/images/10oostenrijk.png differ diff --git a/wavelet_report/images/2dpar.svg b/wavelet_report/images/2dpar.svg new file mode 100644 index 0000000..807dd5b --- /dev/null +++ b/wavelet_report/images/2dpar.svg @@ -0,0 +1,180 @@ + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + 1 + 2 + 3 + 0 + + diff --git a/wavelet_report/images/eindhoven.png b/wavelet_report/images/eindhoven.png new file mode 100644 index 0000000..d01bea5 Binary files /dev/null and b/wavelet_report/images/eindhoven.png differ diff --git a/wavelet_report/images/eindhoven_lin.png b/wavelet_report/images/eindhoven_lin.png new file mode 100644 index 0000000..71133e7 Binary files /dev/null and b/wavelet_report/images/eindhoven_lin.png 
differ diff --git a/wavelet_report/images/eindhoven_log.png b/wavelet_report/images/eindhoven_log.png new file mode 100644 index 0000000..ddded40 Binary files /dev/null and b/wavelet_report/images/eindhoven_log.png differ diff --git a/wavelet_report/images/eindhoven_sqrt.png b/wavelet_report/images/eindhoven_sqrt.png new file mode 100644 index 0000000..c011654 Binary files /dev/null and b/wavelet_report/images/eindhoven_sqrt.png differ diff --git a/wavelet_report/images/fourier_error.svg b/wavelet_report/images/fourier_error.svg new file mode 100644 index 0000000..cf15728 --- /dev/null +++ b/wavelet_report/images/fourier_error.svg @@ -0,0 +1,1093 @@ + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/wavelet_report/images/oostenrijk.png b/wavelet_report/images/oostenrijk.png new file mode 100644 index 0000000..b99522a Binary files /dev/null and b/wavelet_report/images/oostenrijk.png differ diff --git a/wavelet_report/images/parijs.png b/wavelet_report/images/parijs.png new file mode 100644 index 0000000..2f8f26f Binary files /dev/null and b/wavelet_report/images/parijs.png differ diff --git a/wavelet_report/images/wavelet_perm.svg b/wavelet_report/images/wavelet_perm.svg new file mode 100644 index 0000000..d1818cf --- /dev/null +++ b/wavelet_report/images/wavelet_perm.svg @@ -0,0 +1,1408 @@ + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/wavelet_report/images/wavelet_stride.svg b/wavelet_report/images/wavelet_stride.svg new file mode 100644 index 0000000..70ab875 --- /dev/null +++ b/wavelet_report/images/wavelet_stride.svg @@ -0,0 +1,849 @@ + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/wavelet_report/img.tex b/wavelet_report/img.tex index f2095de..f905d57 100644 --- a/wavelet_report/img.tex +++ b/wavelet_report/img.tex @@ -9,33 +9,84 @@ In this subsection we will consider an image $X$ to be a real valued $n \times m \[ C = \{ (y_{i,j}, i, j) \| 0 \leq i \leq n, 0 \leq j \leq m, |y_{i,j}| \leq \tau \}. \] Note that we now have to store the location of the coefficients bigger than the threshold $\tau$. Assume that we did not throw away $c$ of the $nm$ coefficients. Then the compressed size is $(8+4+4) \times c = 16 \times c$. 
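To make the comparison explicit (assuming, as the $(8+4+4)$ count above already does, 8-byte reals and 4-byte indices), the uncompressed real-valued matrix occupies $8 \times nm$ bytes, so the compressed representation is smaller precisely when
\[ 16 \times c \leq 8 \times nm. \]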
So this compression becomes useful when $c \leq \frac{1}{2}nm$. This might seem like a big loss in quality. But as we can see in figure~\ref{fig:compression}, the wavelets are capable of capturing the important shapes and edges of the image. -\todo{ + +\begin{figure} + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{oostenrijk.png} + \caption{Original} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{10oostenrijk.png} + \caption{$\rho=\frac{1}{10}$} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{100oostenrijk.png} + \caption{$\rho=\frac{1}{100}$} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{1000oostenrijk.png} + \caption{$\rho=\frac{1}{1000}$} + \end{subfigure} + \caption{Test image with a fraction $\rho$ of the coefficients left.} \label{fig:compression} - Make images with no compression, $c = \frac{1}{2}$, $c = \frac{1}{4}$ and $c = \frac{1}{8}$. -} +\end{figure} As the wavelet transform is invertible we can decompress $C$ to obtain a approximation of $X$. \subsection{Practical difficulties} -We made a big assumption in the previous subsection which does not hold at all in the real world. Namely that an image $X$ is a real valued $n \times m$-matrix. Most images are quantized to take values in $[0, 255]$, i.e. images consists of pixels of a single byte. This means that the size of an image $X$ is simply $nm$ bytes. Our algorithm only works for real valued data, so we can convert these bytes to the reals and perform our algorithm to obtain $Y$. In figure~\ref{fig:wavelet_distribution} we see how the values of $X$ and $Y$ are distributed. The values of $X$ are nicely distributed, whereas $Y$ has a totally different distribution. Also note that a lot of the coefficients are concentrated around $0$, this means that we can throw away a lot. However this blow-up from 1 byte to 8 bytes is still to big, so we would like to quantize the remaining values too. For a suitable quantization $f: \R -> [0, 255]$ the compressed image is now: -\[ C = \{ (f(y_{i,j}), i, j) \| 0 \leq i \leq n, 0 \leq j \leq m, |y_{i,j}| \leq \tau \}, \] +We made a big assumption in the previous subsection which does not hold at all in the real world. Namely that an image $X$ is a \emph{real valued} $n \times m$-matrix. Most images are quantized to take values in $[0, 255]$, i.e. images consists of pixels of a single byte. This means that the size of an image $X$ is simply $nm$ bytes. Our algorithm only works for real valued data, so we can convert these bytes to the reals and perform our algorithm to obtain $Y$. The values of $X$ are nicely distributed in $[0, 255]$, whereas $Y$ has a totally different distribution, most of it is concentrated around 0. This blow-up from 1 byte to 8 bytes is still to big, so we would like to quantize the values which we do not throw away. For a suitable quantization $f: \R -> [0, 255]$ the compressed image is now: +\[ C = \{ (f(y_{i,j}), i, j) \,|\, 0 \leq i \leq n, 0 \leq j \leq m, |y_{i,j}| \leq \tau \}, \] with a size of $9c$ instead of $mn$. In figure~\ref{fig:comrpession_algo} the different steps of the algorithm are depicted. -\todo{ - \label{fig:compression_algo} - Make images of the distribution of $X \to Y \to Y' \to f(Y')$. -} +The only thing left is to specify the quantization function $f$. We want it to have three properties. 
First of all we would like it to be invertible in order to decompress the image. Of course this is impossible, but we can ask for a function $g$ such that $|y - g(f(y))|$ is somewhat minimized. Secondly, we want $f$ to make the distribution of $Y$ more uniform. And at last we do not want $f$ to account the values we will throw away. In figure~\ref{fig:quantization} three such functions are considered. For the exact details we refer to the implementation. Note that a linear quantization performs very bad. -The only thing left is to specify the quantization function $f$. We want it to have three properties. First of all we would like it to be invertible in order to decompress the image. Of course this is impossible, but we can ask for a function $g$ such that $|y - g(f(y))|$ is somewhat minimized. Secondly, we want $f$ to make the distribution of $Y$ more uniform. And at last we do not want $f$ to account the values we will throw away. In figure~\ref{fig:quantization} three such functions are considered. We will explain the linear quantization and refer to the source code for the other two variants. +\begin{figure} + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{eindhoven.png} + \caption{Original} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{eindhoven_lin.png} + \caption{Linear} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{eindhoven_log.png} + \caption{Logarithmic} + \end{subfigure}~ + \begin{subfigure}[b]{0.25\textwidth} + \centering + \includegraphics[width=\textwidth]{eindhoven_sqrt.png} + \caption{Square root} + \end{subfigure} + \caption{Three different methods to quantize the coefficients.} + \label{fig:quantization} +\end{figure} -\todo{ - Explain linear quantization -} -\todo{ - \label{fig:quantization} - Make images with linear, squareroot and log quantization -} +\subsection{Parallel implementation} +We have already seen how to perform a two-dimensional wavelet transform in section~\ref{sec:dau}. By parallelizing the one-dimensional case we get one for two dimensions for free. However, we can do a bit better. There was a slight problem in the parallelization, namely that there was a dedicated processor which had to end the algorithm. With more data at hand we can now do this last step too in parallel. Let $X$ be an image of width $w$ and height $h$ and let $p$ be the number of processors. We divide the image into blocks of size $\frac{w}{p}\times\frac{h}{p}$. We will assign processors to these blocks as shown in figure~\ref{fig:par2d}. When doing the wavelet horizontally we will let processor 0 finish the first $\frac{w}{p}$ rows, processor 1 the second $\frac{w}{p}$ rows, and so on. Similarly for the columns. This assignment ensures that all processors will end the algorithm for some rows and columns, hence no processors will be idling. + +\begin{figure} + \centering + \includegraphics{2dpar} + \caption{Assignment of blocks to processors 0, 1, 2 and 3} + \label{fig:par2d} +\end{figure} + +The sequential cost of the algorithm is $28wh$, as we have to perform the wavelet in both directions. Most part of the analysis of the previous section still holds. Let us consider doing the wavelet horizontally for each row, instead of sending $C_m$ words in each step, we now have to send $\frac{h}{p}C_m$ words and so on. 
Almost every term in the cost analysis is multiplied by $\frac{h}{p}$, except for the factors of $l$, as we do not have to synchronize more often (we only have to send bigger chunks of data). Doing this in both directions yields the cost of the parallel two-dimensional wavelet transform: +\begin{align*} + T &= 14\frac{h}{p}D_m^H + 14\frac{h}{p}p + \frac{1}{m}\logt(\frac{w}{p})(\frac{h}{p}C_m^Hg+l) + 2(p-1)\frac{h}{p}g + 2l \quad\text{ (horizontal part)}\\ + &+ 14\frac{w}{p}D_m^V + 14\frac{w}{p}p + \frac{1}{m}\logt(\frac{h}{p})(\frac{w}{p}C_m^Vg+l) + 2(p-1)\frac{w}{p}g + 2l \quad\text{ (vertical part)} +\end{align*} +In particular, if $n=w=h$ and $m=1$, we can simplify this to: +\[ T = 28\frac{n^2}{p} + 28n + 2\logt(\frac{n}{p})(2\frac{n}{p}g+l) + 4(p-1)\frac{n}{p}g + 4l \text{ flops}, \] +compared to the $28n^2$ flops of the sequential algorithm. -In the next section we will see how well this compression performs in terms of running time. diff --git a/wavelet_report/intro.tex b/wavelet_report/intro.tex index 180ba68..c2162b5 100644 --- a/wavelet_report/intro.tex +++ b/wavelet_report/intro.tex @@ -17,12 +17,12 @@ In figure~\ref{fig:fourier_concepts} an input signal of length $128$ is expresse \begin{tabular}{c|c} \begin{subfigure}[b]{0.5\textwidth} \centering - \includegraphics[scale=0.9]{fourier_concept1} + \includegraphics[scale=0.8]{fourier_concept1} \caption{Representing a signal on the standard basis.} \end{subfigure}& \begin{subfigure}[b]{0.5\textwidth} \centering - \includegraphics[scale=0.9]{fourier_concept2} + \includegraphics[scale=0.8]{fourier_concept2} \caption{Representing a signal on the Fourier basis.} \end{subfigure} \end{tabular} @@ -32,26 +32,40 @@ In figure~\ref{fig:fourier_concepts} an input signal of length $128$ is expresse The figure also shows us that we might do compression based on these Fourier coefficients. Instead of storing all samples, we just store only a few coefficients from which we are able to approximate the original input. However there is a shortcoming to this. Consider the following scenario. A sensor far away detects a signal, transforms it and sends the Fourier coefficients to earth. During the transmission one of the coefficients is corrupted. This results in a wave across the whole signal. The error is \emph{non-local}. If, however, we decided to send the original samples, a corrupted sample would only affect a small part of the signal, i.e. the error is \emph{local}. This is illustrated in figure~\ref{fig:fourier_error}. -\todo{ - fig:fourier\_error - add $0.5 * e_10$ and $0.5 * f_10$ to both signals -} +\tikzstyle{plain_line}=[] +\begin{figure} + \begin{tabular}{c|c} + \begin{subfigure}[b]{0.5\textwidth} + \centering + \includegraphics[scale=0.8]{fourier_error1} + \caption{The signal with $e_{10}$ added.} + \end{subfigure}& + \begin{subfigure}[b]{0.5\textwidth} + \centering + \includegraphics[scale=0.8]{fourier_error2} + \caption{The signal with $f_{10}$ added.} + \end{subfigure} + \end{tabular} + \caption{If one coefficient is corrupted the results depends on the representation. In the fourier basis shuch an error is visible across the signal, whereas it is local in the standard basis.} + \label{fig:fourier_error} +\end{figure} \subsection{The simplest wavelet transform} -At the heart of the Fourier transform is the choice of the basis elements $f_i$. With a bit of creativity we can cook up different basis elements with different properties. To illustrate this we will have a quick look at the so-called \emph{Haar wavelets}. 
In our case where $n=128$ we can define the following $128$ elements: - -\begin{align} - h_0 &= \sum_{i=1}^{128} e_i, \\ - h_1 &= \sum_{i=1}^{64} e_i - \sum_{i=65}^{128} e_i, \\ - h_2 &= \sum_{i=1}^{32} e_i - \sum_{i=33}^{64} e_i, \\ - h_2 &= \sum_{i=65}^{96} e_i - \sum_{i=97}^{128} e_i, \\ - \ldots, & \\ - h_{2^n + j} = \sum_{i=2^{6-n}j+1}^{2^{6-n}(j+1)} e_i - \sum_{i=2^{6-n}(j+1)+1}^{2^{6-n}(j+2)} e_i \quad (j < 2^n). -\end{align} +At the heart of the Fourier transform is the choice of the basis elements $f_i$. With a bit of creativity we can cook up different basis elements with different properties. To illustrate this we will have a quick look at the so-called \emph{Haar wavelets}. These wavelets also form a basis. Some of these basis elements are depicted in figure~\ref{fig:haarwvlt}. The other basis elements are made by narrowing the ones shown. In contrast to the Fourier basis elements, these wavelets are not smooth at all. If again we add $h_{10}$ to our signal, only a small portion is altered. -We will refer to these elements as \emph{Haar wavelets}. To give a better feeling of these wavelets, some of them are plotted in figure~\ref{fig:haar_waveleta} on the standard basis. There is also an effective way to express a signal represented on the standard basis on this new basis. Again our example can be written on this new basis, and again we see that the first coefficient already approximates the signal and that the other coefficients refine it. - -To go back to our problem of noise, if we add $0.5*h_10$ to this signal, only a small part of the signal is disturbed as shown in figure~\ref{fig:haar_error}. +\begin{figure} + \centering + \begin{tikzpicture} + \begin{groupplot}[group style={group size=2 by 2}, clip=false, yticklabels={,,}, height=3cm, width=0.5\textwidth, xmin=0, xmax=128, ymin=-1.1, ymax=1.1, domain=0:128] + \nextgroupplot \addplot[plain_line] coordinates {(0,1) (128,1)}; \legend{$h_0$} + \nextgroupplot \addplot[plain_line] coordinates {(0,1) (64,1) (64,-1) (128,-1)}; \legend{$h_1$} + \nextgroupplot \addplot[plain_line] coordinates {(0,1) (32,1) (32,-1) (64,-1) (64,0) (128,0)}; \legend{$h_2$} + \nextgroupplot \addplot[plain_line] coordinates {(0,0) (64,0) (64,1) (96,1) (96,-1) (128,-1) (128,0)}; \legend{$h_3$} + \end{groupplot} + \end{tikzpicture} + \label{fig:haarwvlt} + \caption{Some of the haar wavelets.} +\end{figure} Another important difference is the way these basis elements can represent signals. With the Fourier basis elements we can easily approximate smooth signals, but with the Haar basis elements this is much harder. However representing a piecewise constant signal is almost trivial with the Haar wavelets. In photography the latter is preferred, as edges are very common (think of black branches of a tree against a clear sky). So depending on the application this \emph{non-smoothness} is either good or bad. diff --git a/wavelet_report/par.tex b/wavelet_report/par.tex index 1ab718e..d2865bd 100644 --- a/wavelet_report/par.tex +++ b/wavelet_report/par.tex @@ -20,14 +20,16 @@ for i=1 to b/2 i = i*2 \end{lstlisting} -We stop after $i=\frac{b}{2}=\frac{n}{2p}$ because for $i=b$ we would need three elements from three different processors. To continue, each processor has to send two elements to some dedicated processor (say processor zero). This processor then ends the algorithm by applying the wavelet transform of size $p$ (here $p$ needs to be a power of two). Note that we only have to do this when $p \geq 4$. 
The last part of the algorithm is given by: +We stop after $i=\frac{b}{2}=\frac{n}{2p}$ because for $i=b$ we would need three elements from three different processors. To continue, each processor has to send two elements to some dedicated processor (say processor zero). This processor then ends the algorithm by applying the wavelet transform of size $p$. The other processors are idling at this moment. Note that we only have to do this when $p \geq 4$. The last part of the algorithm is given by: \begin{lstlisting} +// fan in put x_{sb} in processor 0 if s = 0 wavelet(y, p) +// fan out x_{sb} <- get y_{2s} from processor 0 \end{lstlisting} @@ -71,6 +73,6 @@ if r > 0 \end{lstlisting} Again we end in the same style as above by letting a dedicated processor do a wavelet transform of length $p$. Note that for $m=1$ we get our first algorithm back and for $m = \logt(b)-1$ we get more or less our second algorithm back. The costs for this algorithm are given by: -\[ 14D_m + 14p + \frac{1}{m}\logt(\frac{n}{p})(2C_mg+l) + 2(p-1)g + 2l. \] +\[ 14D_m + 14p + \frac{1}{m}\logt(\frac{n}{p})(C_mg+l) + 2(p-1)g + 2l. \] In order to pick the right $m$ we should know the parameters of the machine. Comparing these running times is done in section~\ref{sec:res}, where these theoretical bounds are plotted together with real measurements. diff --git a/wavelet_report/res.tex b/wavelet_report/res.tex index d8c1534..fc7ea40 100644 --- a/wavelet_report/res.tex +++ b/wavelet_report/res.tex @@ -25,9 +25,9 @@ l & 1300 & 2161 & 46455 & 162761\\ When we measure time, we only measure the time of the actual algorithm. So we ignore start-up time or allocation time and initial data distribution. Time is measured with the \texttt{bsp\_time()} primitive, which is a wall clock. For a better measurement we iterated the algorithm at least 100 times and divided the total time by the number of iterations. \subsection{Results} -In this subsection we will plot the actual running time of the algorithm. We will take $n$ as a variable to see how the parallel algorithm scales. As we only allow power of two for $n$ we will often plot in a $\log-\log$-fashion. In all cases we took $n=2^6$ as a minimum and $n=2^27$ as a maximum. Unless stated otherwise we will use blue for the parallel running time, red for the sequential running time. The thin lines shows the theoretical time for which we used the variables in table~\ref{tab:variables}. +In this subsection we will plot the actual running time of the algorithm. We will take $n$ as a variable to see how the parallel algorithm scales. As we only allow power of two for $n$ we will often plot in a $\log-\log$-fashion. In all cases we took $n=2^6$ as a minimum and $n=2^{27}$ as a maximum. Unless stated otherwise we will use blue for the parallel running time, red for the sequential running time. The thin lines shows the theoretical time for which we used the variables in table~\ref{tab:variables}. -In figure~\ref{fig:basic} the running time is plotted for the case where $m=1$. There are multiple things to note. First of all we see that the actual running time closely matches the shape of the theoretical prediction. This assures us that the BSP cost model is sufficient to predict the impact of parallelization. However the theoretical prediction is too optimistic. +In figure~\ref{fig:basic} the running time is for two test cases with varying $n$. There are multiple things to note. First of all we see that the actual running time closely matches the shape of the theoretical prediction. 
This assures us that the BSP cost model is sufficient to predict the impact of parallelization. However the theoretical prediction is too optimistic. \tikzstyle{measured}=[mark=+] \tikzstyle{predicted}=[very thin] @@ -46,21 +46,53 @@ In figure~\ref{fig:basic} the running time is plotted for the case where $m=1$. \end{loglogaxis} \end{tikzpicture} - \caption{Running time on a MBP with $p=2$} + \caption{MBP with $p=2$ and $m=2$} \end{subfigure}~ \begin{subfigure}[b]{0.5\textwidth} \begin{tikzpicture} \begin{loglogaxis}[xlabel=n, width=\textwidth] - \addplot[predicted, sequential] table[x=n, y=SeqP] {results/cart_p4_m1_basic}; - \addplot[predicted, parallel] table[x=n, y=ParP] {results/cart_p4_m1_basic}; - \addplot[measured, sequential] table[x=n, y=Seq] {results/cart_p4_m1_basic}; \addlegendentry{sequential} - \addplot[measured, parallel] table[x=n, y=Par] {results/cart_p4_m1_basic}; \addlegendentry{parallel} + \addplot[predicted, sequential] table[x=n, y=SeqP] {results/cart_p4_m5_basic}; + \addplot[predicted, parallel] table[x=n, y=ParP] {results/cart_p4_m5_basic}; + \addplot[measured, sequential] table[x=n, y=Seq] {results/cart_p4_m5_basic}; \addlegendentry{sequential} + \addplot[measured, parallel] table[x=n, y=Par] {results/cart_p4_m5_basic}; \addlegendentry{parallel} \end{loglogaxis} \end{tikzpicture} - \caption{Running time on Cartesius with $p=4$} + \caption{Cartesius with $p=4$ and $m=5$} \end{subfigure} \caption{Running time versus number of elements $n$. The thin line shows the theoretical prediction.} \label{fig:basicplot} \end{figure} + +In figure~\ref{fig:speedup} we plotted the speedup of the parallel program. Here we divided the sequential run time by the parallel run time and let $p$ vary. We see that it scales very well up to $p = 16$. + +In all the results on cartesius we used $m=5$ without any argument or proof of optimality. Figure~\ref{fig:optm} shows the running time for various $m$ and we see that indeed $m=5$ gives a local minimum, both in the measurements and in the theoretical cost model. + +\begin{figure} + \centering + \begin{tikzpicture} + \begin{axis}[xlabel={$p$}, ylabel={Speedup}, width=0.6\textwidth, height=5cm, ymax = 12] + + \addplot[measured, very thick] table[x=p, y=Speedup] {results/cart_m5_speedup}; + \addplot[predicted] table[x=p, y=p] {results/cart_m5_speedup}; + + \end{axis} + \end{tikzpicture} + \caption{The speedup of the parallel algorithm on Cartesius. $n = 1048576$ and $m = 5$. The thin line is the theoretical maximum speedup.} + \label{fig:speedup} +\end{figure} + +\begin{figure} + \centering + \begin{tikzpicture} + \begin{axis}[xlabel={$m$}, ylabel={Time (s)}, width=0.6\textwidth, height=5cm] + + \addplot[measured, very thick] table[x=m, y=Time] {results/cart_p4_optm}; + \addplot[predicted] table[x=m, y=Pred] {results/cart_p4_optm}; + + \end{axis} + \end{tikzpicture} + \caption{The running time on Cartesius of the parallel algorithm for $n = 32768$, $p = 4$ and varying $m$. 
The thing line is the prediction.} + \label{fig:optm} +\end{figure} diff --git a/wavelet_report/results/cart_m5_speedup b/wavelet_report/results/cart_m5_speedup new file mode 100644 index 0000000..664c439 --- /dev/null +++ b/wavelet_report/results/cart_m5_speedup @@ -0,0 +1,8 @@ +p Time Speedup +1 0.546979 1 +2 0.274025 1.99609159748198 +4 0.142572 3.83651067530791 +8 0.08397 6.5139811837561 +16 0.050088 10.9203601661077 +32 0.19652 2.78332485243232 +64 0.41637 1.3136849436799 \ No newline at end of file diff --git a/wavelet_report/results/cart_p4_m5_basic b/wavelet_report/results/cart_p4_m5_basic new file mode 100644 index 0000000..b88f33f --- /dev/null +++ b/wavelet_report/results/cart_p4_m5_basic @@ -0,0 +1,24 @@ +n Seq Par SeqP ParP +64 0.00000026 0.00001622 0.00000013232905 0.000021498006203 +128 0.00000055 0.00001662 0.000000264658101 0.000022081671836 +256 0.00000108 0.00001771 0.000000529316201 0.000023249003101 +512 0.00000213 0.00002302 0.000001058632403 0.000025154570964 +1024 0.00000424 0.00002376 0.000002117264806 0.000027192467878 +2048 0.00000846 0.00002498 0.000004234529612 0.000029495022892 +4096 0.00001707 0.00002811 0.000008469059223 0.000032326894107 +8192 0.00003559 0.00003404 0.000016938118446 0.000036217397726 +16384 0.00007096 0.0000473 0.000033876236893 0.00004222516615 +32768 0.00015266 0.00006673 0.000067752473785 0.000052467464185 +65536 0.00033609 0.00010479 0.000135504947571 0.000071178821444 +131072 0.00067463 0.00019835 0.000271009895141 0.00010682829715 +262144 0.00134791 0.00038262 0.000542019790282 0.000176354009747 +524288 0.00273649 0.00072986 0.001084039580564 0.000313632196131 +1048576 0.00546979 0.00140784 0.002168079161128 0.000586415330084 +2097152 0.01101894 0.00277283 0.004336158322257 0.001130208359179 +4194304 0.02791576 0.00553617 0.008672316644513 0.002216021178556 +8388608 0.06440857 0.01453216 0.017344633289027 0.004385873578497 +16777216 0.1291075 0.03503878 0.034689266578054 0.008723805139566 +33554432 0.25794085 0.07025828 0.069378533156107 0.017397895022892 +67108864 0.5158101 0.14011133 0.138757066312214 0.034744301550731 +134217728 1.03011767 0.27893144 0.277514132624428 0.069435341367597 + diff --git a/wavelet_report/results/cart_p4_optm b/wavelet_report/results/cart_p4_optm new file mode 100644 index 0000000..8baad1a --- /dev/null +++ b/wavelet_report/results/cart_p4_optm @@ -0,0 +1,13 @@ +m Time Pred +1 0.00011114 0.00011417131886 +2 0.00008151 0.000073402451632 +3 0.00007189 0.000060344114606 +4 0.00006558 0.000054616009452 +5 0.00006637 0.000052467464185 +6 0.00006595 0.000053193324472 +7 0.00006873 0.00005743051248 +8 0.00007791 0.000067149239403 +9 0.00011053 0.000086395214887 +10 0.00014981 0.000122934396692 +11 0.00030648 0.000191463299365 +12 0.00049247 0.000319751144587 \ No newline at end of file
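As a sanity check on the data in \texttt{results/cart\_m5\_speedup} (an illustration only, not part of the measurement code), the \texttt{Speedup} column is simply the $p=1$ time divided by the time on $p$ processors; the small program below recomputes it, together with the parallel efficiency.
\begin{lstlisting}
/* Illustration only: recompute the Speedup column of           */
/* results/cart_m5_speedup as time(p=1)/time(p), together with  */
/* the parallel efficiency speedup/p.                            */
#include <stdio.h>

int main(void)
{
    const int    procs[] = {1, 2, 4, 8, 16, 32, 64};
    const double secs[]  = {0.546979, 0.274025, 0.142572, 0.08397,
                            0.050088, 0.19652, 0.41637};
    const int n = sizeof(procs) / sizeof(procs[0]);

    for (int i = 0; i < n; i++) {
        double speedup = secs[0] / secs[i];
        printf("p=%2d  time=%.6f  speedup=%5.2f  efficiency=%.2f\n",
               procs[i], secs[i], speedup, speedup / procs[i]);
    }
    return 0;
}
\end{lstlisting}
The efficiency stays above $0.65$ up to $p = 16$ and then collapses, matching the observation above that the algorithm scales well up to $p = 16$.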