Joshua Moerman
11 years ago
8 changed files with 195 additions and 41 deletions
@ -0,0 +1,13 @@ |
|||||
|
|
||||
|
.PHONY: report |
||||
|
|
||||
|
# We don't want to pollute the root dir, so we use a build dir
|
||||
|
# http://tex.stackexchange.com/questions/12686/how-do-i-run-bibtex-after-using-the-output-directory-flag-with-pdflatex-when-f
|
||||
|
report: |
||||
|
mkdir -p build |
||||
|
cp references.bib build/ |
||||
|
pdflatex -output-directory=build report.tex |
||||
|
cd build; bibtex report |
||||
|
pdflatex -output-directory=build report.tex |
||||
|
pdflatex -output-directory=build report.tex |
||||
|
cp build/report.pdf ./ |
@ -0,0 +1,64 @@ |
|||||
|
|
||||
|
\section{Daubechies wavelets} |
||||
|
\label{sec:dau} |
||||
|
We have now seen three different bases to represent signals: the sample domain, the Fourier domain and the Haar wavelet domain. They all have different properties. We have reasoned that the Haar wavelets have nice properties regarding images: they represent edges well and errors are local. However, a little bit of smoothness is sometimes asked for (again in photography, think of a blue sky: it is white/blue at the bottom and darker at the top). This is exactly what the Daubechies wavelets of order four add.
||||
|
|
||||
|
Instead of explicitly defining or showing the basis elements, we will directly describe the wavelet transform $W$.\footnote{Note that we did not specify the transforms mentioned in section~\ref{sec:intro}, as that section was motivational only.} In fact we will describe it as an algorithm, as our intent is to implement it.
||||
|
|
||||
|
|
||||
|
\subsection{The Daubechies wavelet transform} |
||||
|
We will formulate the algorithm in terms of matrix multiplications \cite{numc}. Before we do so, we need the following constants: |
||||
|
\begin{align*} |
||||
|
c_0 &= \frac{1 + \sqrt{3}}{4 \sqrt{2}}, &\quad |
||||
|
c_1 &= \frac{3 + \sqrt{3}}{4 \sqrt{2}}, \\ |
||||
|
c_2 &= \frac{3 - \sqrt{3}}{4 \sqrt{2}}, &\quad |
||||
|
c_3 &= \frac{1 - \sqrt{3}}{4 \sqrt{2}}. |
||||
|
\end{align*} |
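
As a quick sanity check on these constants (a side remark, not needed for the algorithm itself), one can verify by direct computation that
\begin{align*}
c_0 + c_1 + c_2 + c_3 &= \frac{8}{4 \sqrt{2}} = \sqrt{2}, \\
c_0^2 + c_1^2 + c_2^2 + c_3^2 &= \frac{(4 + 2\sqrt{3}) + (12 + 6\sqrt{3}) + (12 - 6\sqrt{3}) + (4 - 2\sqrt{3})}{32} = 1, \\
c_3 - c_2 + c_1 - c_0 &= 0, \\
c_0 c_2 + c_1 c_3 &= \frac{2\sqrt{3} - 2\sqrt{3}}{32} = 0.
\end{align*}
The second identity says that the coefficient vector $(c_0, c_1, c_2, c_3)$ has unit norm, the third that the combination $(c_3, -c_2, c_1, -c_0)$ maps a constant signal to zero, and the fourth that $(c_0, c_1, c_2, c_3)$ is orthogonal to a copy of itself shifted by two positions.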
||||
|
|
||||
|
Now let $n$ be even and define the $n \times (n+2)$-matrix $W_n$ as follows (where a blank means $0$).
||||
|
\[ W_n = |
||||
|
\begin{pmatrix} |
||||
|
c_0 & c_1 & c_2 & c_3 & & & & & & & & & \\ |
||||
|
c_3 & -c_2 & c_1 & -c_0 & & & & & & & & & \\ |
||||
|
& & c_0 & c_1 & c_2 & c_3 & & & & & & & \\ |
||||
|
& & c_3 & -c_2 & c_1 & -c_0 & & & & & & & \\ |
||||
|
|
||||
|
& & & & & & \ddots & & & & & & \\ |
||||
|
|
||||
|
& & & & & & & c_0 & c_1 & c_2 & c_3 & & \\ |
||||
|
& & & & & & & c_3 & -c_2 & c_1 & -c_0 & & \\ |
||||
|
& & & & & & & & & c_0 & c_1 & c_2 & c_3 \\ |
||||
|
& & & & & & & & & c_3 & -c_2 & c_1 & -c_0 |
||||
|
\end{pmatrix} \] |
||||
|
|
||||
|
We also need the \emph{even-odd sort matrix} $S_n$, defined by |
||||
|
\[ (S_n \vec{x})_i = \begin{cases} |
||||
|
x_{2i} &\mbox{ if } i < \frac{n}{2} \\ |
||||
|
x_{2i - n + 1} &\mbox{ if } i \geq \frac{n}{2}, |
||||
|
\end{cases}\] |
||||
|
which permutes the elements of $\vec{x}$ by putting the elements with an even index in front.
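
For example, filling in $n = 8$ in the definition above (a small illustration) gives
\[ S_8 \vec{x} = (x_0, x_2, x_4, x_6, x_1, x_3, x_5, x_7). \]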
||||
|
|
||||
|
In many cases we want to apply the $n \times (n+2)$-matrix $W_n$ to a vector of length $n$. In order to do so we can set $x_n = x_0$ and $x_{n+1} = x_1$, i.e. we consider $\vec{x}$ to be \emph{periodic}. More precisely we can define a linear map $P_n$ as follows.
||||
|
\[ P_n \vec{x} = (x_0, \ldots, x_{n-1}, x_0, x_1) \] |
||||
|
Now applying $W_n$ to the periodic $\vec{x}$ is exactly $W_n P_n \vec{x}$. |
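
To make this concrete, consider the smallest case $n = 4$ (a worked example added for illustration only). Writing $\vec{y} = W_4 P_4 \vec{x}$ and reading off the rows of $W_4$ we get
\begin{align*}
y_0 &= c_0 x_0 + c_1 x_1 + c_2 x_2 + c_3 x_3, \\
y_1 &= c_3 x_0 - c_2 x_1 + c_1 x_2 - c_0 x_3, \\
y_2 &= c_0 x_2 + c_1 x_3 + c_2 x_0 + c_3 x_1, \\
y_3 &= c_3 x_2 - c_2 x_3 + c_1 x_0 - c_0 x_1.
\end{align*}
The last two entries use the periodic extension: they wrap around to $x_0$ and $x_1$. Applying the even-odd sort afterwards gives $S_4 \vec{y} = (y_0, y_2, y_1, y_3)$.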
||||
|
|
||||
|
The wavelet transform now consists of multiplying the above matrices in a recursive fashion. Given a vector $\vec{x}$ of length $n$, calculate $\vec{x}^{(1)} = S_n W_n P_n \vec{x}$, and recurse on the first half of $\vec{x}^{(1)}$ using $S_\frac{n}{2}$, $W_\frac{n}{2}$ and $P_\frac{n}{2}$. Repeat this procedure and end with the multiplication by $S_4 W_4 P_4$. More formally the wavelet transform is given by:
||||
|
\[ W \vec{x} := \diag(S_4 W_4 P_4, I_4, \ldots, I_4) |
||||
|
\diag(S_8 W_8 P_8, I_8, \ldots, I_8) \cdots |
||||
|
\diag(S_\frac{n}{2} W_\frac{n}{2} P_\frac{n}{2}, I_\frac{n}{2}) |
||||
|
S_n W_n P_n \vec{x}. \] |
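
For instance, for $n = 16$ (an illustrative instance of the formula above) the product consists of three factors:
\[ W \vec{x} = \diag(S_4 W_4 P_4, I_4, I_4, I_4) \, \diag(S_8 W_8 P_8, I_8) \, S_{16} W_{16} P_{16} \vec{x}. \]
The rightmost factor acts on all $16$ entries, the middle one only on the first $8$, and the leftmost one only on the first $4$. The identity blocks leave the remaining entries untouched.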
||||
|
|
||||
|
|
||||
|
\subsection{In place} |
||||
|
When implementing this transform, we don't have to perform the even-odd sort. Instead, we can simply do all calculations in place and use a stride to do the recursion on the even part. This will permute the original result. |
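
As an illustration of the stride (this is our reading of the idea above, sketched for $n = 8$): without the sort, the first step leaves the entries that $S_8$ would have moved to the front at the even positions $0, 2, 4, 6$. The second step then applies $W_4 P_4$ to exactly these positions, i.e. with stride $2$, and leaves its own front half at positions $0$ and $4$. In general the step that works on $n / 2^k$ elements uses stride $2^k$ and touches the positions
\[ 0, \; 2^k, \; 2 \cdot 2^k, \; \ldots, \; n - 2^k. \]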
||||
|
|
||||
|
|
||||
|
\subsection{Costs} |
||||
|
We will briefly analyze the cost of the transform by counting the number of \emph{flops}, that is, multiplications and additions. Computing one element of $W_n \vec{x}$ costs $4$ multiplications and $3$ additions, so computing $W_n \vec{x}$ costs $7n$ flops. Applying $S_n$ and $P_n$ does not require any flops, as they are mere data manipulations. Consequently computing $W \vec{x}$ costs
||||
|
\[ 7 \times n + 7 \times \frac{n}{2} + \cdots + 7 \times 8 + 7 \times 4 \text{ flops }. \] |
||||
|
Using the geometric series $\sum_{i=0}^\infty 2^{-i} = 2$ we can bound the number of flops by $14n$. |
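
The sum can also be evaluated exactly (an added remark, assuming $n$ is a power of two with $n \geq 4$). Since $4 + 8 + \cdots + n = 2n - 4$, we find
\[ 7 \times \left( n + \frac{n}{2} + \cdots + 8 + 4 \right) = 7 \times (2n - 4) = 14n - 28 < 14n. \]
For the signal length $n = 128$ used in section~\ref{sec:intro} this amounts to $7 \times 252 = 1764$ flops.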
||||
|
|
||||
|
Compared to the FFT this is a big improvement in terms of scalability, as this wavelet transform has linear complexity $\BigO{n}$, whereas the FFT has complexity $\BigO{n \log n}$. This is however also a burden, as it leaves little room for the overhead induced by parallelizing the algorithm. We will see a precise analysis of the communication costs in section~\ref{sec:par}.
||||
|
|
||||
|
|
||||
|
\subsection{The inverse} |
@ -0,0 +1,41 @@ |
|||||
|
|
||||
|
\section{Introduction} |
||||
|
\label{sec:intro} |
||||
|
In this section we will motivate the need for wavelets. We will start with the well known Fourier transform and discuss things we can change. As an example we will be using a 1-dimensional signal of length $128$. This section will be a bit informal and will not focus on algorithms. |
||||
|
|
||||
|
\subsection{Recalling the Fourier transform} |
||||
|
Recall the Fourier transform; given an input signal $x = \sum_{i=1}^{128} x_i e_i$ (written on the standard basis) we can compute Fourier coefficients $x'_i$ such that $x = \sum_{i=1}^{128} x'_i f_i$. As we're not interested in the mathematics behind this transform, we will not specify $f_i$. Conceptually the Fourier transform is a basis transformation: |
||||
|
|
||||
|
\[ \text{sample domain} \to \text{Fourier domain}. \]
||||
|
|
||||
|
Furthermore this transformation has an inverse. Applications of this transform consist of going to the Fourier domain, applying some (easy to compute) function there and going back to the sample domain again.
||||
|
|
||||
|
In figure~\ref{fig:fourier_concepts} we've written an input signal of length $128$ on the standard basis, and on the Fourier basis (simplified, for illustrative purposes). We see that this signal is better expressed in the Fourier domain, as we only need three coefficients instead of all $128$.
||||
|
|
||||
|
\todo{ |
||||
|
fig:fourier\_concepts |
||||
|
spelling out a sum of basis elements in both domains |
||||
|
} |
||||
|
|
||||
|
We see that we might even do compression based on the Fourier coefficients. Instead of sending all samples, we send only a few coefficients from which we are able to approximate the original input. However there is a shortcoming to this. Consider the following scenario. A sensor on Mars detects a signal, transforms it and sends the coefficients to Earth. During the transmission one of the coefficients is corrupted. This results in a wave across the whole signal. The error is \emph{non-local}. If, however, we decided to send the original samples, a corrupted sample would only affect a small part of the signal, i.e. the error is \emph{local}. This is illustrated in figure~\ref{fig:fourier_error}.
||||
|
|
||||
|
\todo{ |
||||
|
fig:fourier\_error |
||||
|
add $0.5 * e_{10}$ and $0.5 * f_{10}$ to both signals
||||
|
} |
||||
|
|
||||
|
|
||||
|
\subsection{The simplest wavelet transform} |
||||
|
At the heart of the Fourier transform is the choice of the basis elements $f_i$. With a bit of creativity we can cook up different basis elements with different properties. To illustrate this we will have a quick look at the so-called ``Haar wavelets''. In our case where $n=128$ we can define the following $128$ elements: |
||||
|
|
||||
|
$$ h_0 = \sum_{i=1}^{128} e_i,
h_1 = \sum_{i=1}^{64} e_i - \sum_{i=65}^{128} e_i,
h_2 = \sum_{i=1}^{32} e_i - \sum_{i=33}^{64} e_i,
h_3 = \sum_{i=65}^{96} e_i - \sum_{i=97}^{128} e_i, \ldots,
h_{2^k + j} = \sum_{i=2^{7-k}j+1}^{2^{7-k}j+2^{6-k}} e_i - \sum_{i=2^{7-k}j+2^{6-k}+1}^{2^{7-k}(j+1)} e_i \quad (0 \leq k \leq 6,\ 0 \leq j < 2^k) $$
||||
|
|
||||
|
We will refer to these elements as \emph{Haar wavelets}. To give a better feeling for these wavelets, they are plotted in figure~\ref{fig:haar_waveleta} on the standard basis. There is also an effective way to express an element written in the standard basis on this new basis: the Haar wavelet transform. Again our example can be written on this new basis, and again we see that the first coefficient already approximates the signal and that the other coefficients refine it.
||||
|
|
||||
|
To go back to our problem of noise, if we add $0.5*h_9$ (there is a shift of indices) to this signal, only a small part of the signal is disturbed as shown in figure~\ref{fig:haar_error}. |
||||
|
|
||||
|
Another important difference is the way these basis elements can represent signals. With the Fourier basis elements we can easily approximate smooth signals, but with the Haar basis elements this is much harder. However representing a piecewise constant signal is easier with the Haar wavelets. In photography the latter is preferred, as edges are very common (think of branches of a tree against a clear sky or hard edges of a building). So depending on the application this \emph{non-smoothness} is either good or bad. |
@ -0,0 +1,13 @@ |
|||||
|
|
||||
|
\section{Parallelization of DAU4} |
||||
|
\label{sec:par} |
||||
|
|
||||
|
In this section we will look at how we can parallelize the Daubechies wavelet transform. We will first discuss a naive and simple solution in which we communicate at every step. Secondly, we look at a solution which only communicates once.
||||
|
|
||||
|
By analysing the BSP costs we see that, depending on the machine, either solution can outperform the other. Finally we will derive a hybrid solution, which can dynamically choose the best solution depending on the machine.
||||
|
|
||||
|
We already assumed the input size $n$ to be a power of two. We now additionally assume that the number of processors $p$ is a power of two and (much) less than $n$. In all the given solutions we use a block distribution; each block thus contains $b = \frac{n}{p}$ elements (and $b$ is also a power of two).
||||
|
|
||||
|
\subsection{Many communication steps}
||||
|
The data $\vec{x} = x_0, \ldots, x_{n-1}$ is distributed among the processors with a block distribution, so processor $\proc{s}$ has the elements $\vec{x'} = x_{sb}, \ldots, x_{sb+b-1}$. At the first step of the algorithm we want to compute $W_b \vec{x'}$, but we need two more elements in order to do so.
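
Assuming the periodic convention of $P_n$ from section~\ref{sec:dau} carries over to the distributed vector (this is our reading; it is not spelled out above), the two missing elements are the first two elements of the next block:
\[ x_{(sb + b) \bmod n}, \quad x_{(sb + b + 1) \bmod n}. \]
Under the block distribution both reside on processor $\proc{t}$ with $t = (s + 1) \bmod p$.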
||||
|
|
@ -0,0 +1,22 @@ |
|||||
|
|
||||
|
% clickable tocs |
||||
|
\usepackage{hyperref} |
||||
|
|
||||
|
% floating figures |
||||
|
\usepackage{float} |
||||
|
|
||||
|
% Matrices have an upper bound on their number of columns
||||
|
\setcounter{MaxMatrixCols}{20} |
||||
|
|
||||
|
% Remove trailing `contents` after toc |
||||
|
\renewcommand{\contentsname}{} |
||||
|
|
||||
|
\DeclareMathOperator{\diag}{diag} |
||||
|
% \newcommand{\vec}[1]{\mathbf{#1}} |
||||
|
\newcommand{\BigO}[1]{\mathcal{O}(#1)} |
||||
|
\newcommand{\proc}[1]{#1} |
||||
|
|
||||
|
\newcommand{\todo}[1]{ |
||||
|
\addcontentsline{tdo}{todo}{\protect{#1}} |
||||
|
$\ast$ \marginpar{\tiny $\ast$ #1} |
||||
|
} |
@ -0,0 +1,14 @@ |
|||||
|
|
||||
|
@book{numc,
||||
|
title={Numerical Recipes in C: The Art of Scientific Computing}, |
||||
|
author={Press, William H and Teukolsky, Saul A and Vetterling, William T and Flannery, Brian P}, |
||||
|
year={1992}, |
||||
|
publisher={Cambridge Univ. Press} |
||||
|
} |
||||
|
|
||||
|
@book{biss, |
||||
|
title={Parallel scientific computation}, |
||||
|
author={Bisseling, Rob H}, |
||||
|
year={2004}, |
||||
|
publisher={Oxford University Press}
||||
|
} |
@ -1,52 +1,30 @@ |
|||||
|
\documentclass[a4paper, 11pt]{amsart} |
||||
|
|
||||
\begin{abstract} |
\input{style} |
||||
In this paper we will derive a parallel algorithm to perform a Daubechies wavelet transform of order four (DAU4). To conceptualize this transform we will first look into the Fourier transform to motivate first of all why we want such a transform and secondly to point out one of the shortcomings of the Fourier transform. After this introduction we will derive mathematical properties of the Daubechies wavelet transform, this mathematical description will also give us a naive sequential algorithm. By looking at which data is needed by which processor, we can give a parallel algorithm. As an application we will look into image compression using this wavelet transform. |
\input{preamble} |
||||
\end{abstract} |
|
||||
|
|
||||
|
|
||||
\section{Introduction} |
|
||||
\label{sec:intro} |
|
||||
In this section we will motivate the need for wavelets. We will start with the well known Fourier transform and discuss things we can change. As an example we will be using a 1-dimensional signal of length $128$. This section will be a bit informal and will not focus on algorithms. |
|
||||
|
|
||||
\subsection{Recalling the Fourier transform} |
|
||||
Recall the Fourier transform; given an input signal $x = \Sum_{i=1}^{128} x_i e_i$ (written on the standard basis) we can compute Fourier coefficients $x'_i$ such that $x = \Sum_{i=1}^{128} x'_i f_i$. As we're not interested in the mathematics behind this transform, we will not specify $f_i$. Conceptually the Fourier transform is a basis transformation: |
|
||||
|
|
||||
$$ SampleDomain \to FourierDomain. $$ |
|
||||
|
|
||||
Furthermore this transformation has an inverse. Applications of this transform consist of going to the Fourier domain, applying some (easy to compute) function there and go back to sample domain again. |
|
||||
|
|
||||
In figure~\ref{fig:fourier_concepts} we've written an input signal of length $128$ on the standard basis, and on the Fourier basis (simplified, for illustrational purposes). We see that this signal is better expressed in the Fourier domain, as we only need three coefficients instead of all $128$. |
\title{Parallel wavelet transform} |
||||
|
\author{Joshua Moerman} |
||||
|
|
||||
% fig:fourier_concepts |
\includeonly{dau} |
||||
% spelling out a sum of basis elements in both domains |
\begin{document} |
||||
|
|
||||
We see that we might even do compression based on the Fourier coefficients. Instead of sending all samples, we just only a few coefficients from which we are able to approximate the original input. However there is a shortcoming to this. Consider the following scenario. A sensor on Mars detects a signal, transforms it and sends the coefficients to earth. During the transmission one of the coefficients is corrupted. This results in a wave across the whole signal. The error is \emph{non-local}. If, however, we decided to send the original samples, a corrupted sample would only affect a small part of the signal, i.e. the error is \emph{local}. This is illustrated in figure~\ref{fig:fourier_error}. |
|
||||
|
|
||||
% fig:fourier_error |
\begin{abstract} |
||||
% add 0.5 * e_10 and 0.5 * f_10 to both signals |
In this paper we will derive a parallel algorithm to perform a Daubechies wavelet transform of order four (DAU4). To conceptualize this transform we will first look into the Fourier transform to motivate first of all why we want such a transform and secondly to point out one of the shortcomings of the Fourier transform. After this introduction we will derive mathematical properties of the Daubechies wavelet transform, this mathematical description will also give us a naive sequential algorithm. By looking at which data is needed by which processor, we can give a parallel algorithm. As an application we will look into image compression using this wavelet transform. |
||||
|
\end{abstract} |
||||
|
\maketitle |
||||
\subsection{The simplest wavelet transform} |
\tableofcontents |
||||
At the heart of the Fourier transform is the choice of the basis elements $f_i$. With a bit of creativity we can cook up different basis elements with different properties. To illustrate this we will have a quick look at the so-called ``Haar wavelets''. In our case where $n=128$ we can define the following $128$ elements: |
|
||||
|
|
||||
$$ h_0 = \Sum_{i=1}^{128} e_i, |
|
||||
h_1 = \Sum_{i=1}^{64} e_i - \Sum_{i=65}^{128} e_i, |
|
||||
h_2 = \Sum_{i=1}^{32} e_i - \Sum_{i=33}^{64} e_i, |
|
||||
h_2 = \Sum_{i=65}^{96} e_i - \Sum_{i=97}^{128} e_i, \ldots, |
|
||||
h_{2^n + j} = \Sum_{i=2^{6-n}j+1}^{2^{6-n}(j+1)} e_i - \Sum_{i=2^{6-n}(j+1)+1}^{2^{6-n}(j+2)} e_i (j < 2^n) $$ |
|
||||
|
|
||||
We will refer to these elements as \emph{Haar wavelets}. To give a better feeling of these wavelets, they are plotted in figure~\ref{fig:haar_waveleta} on the standard basis. There is also an effective way to write an element written in the standard basis on this new basis, this is the Haar wavelet transform. Again our example can be written on this new basis, and again we see that the first coefficient already approximates the signal and that the other coefficients refine it. |
|
||||
|
|
||||
To go back to our problem of noise, if we add $0.5*h_9$ (there is a shift of indices) to this signal, only a small part of the signal is disturbed as shown in figure~\ref{fig:haar_error}. |
|
||||
|
|
||||
Another important difference is the way these basis elements can represent signals. With the Fourier basis elements we can easily approximate smooth signals, but with the Haar basis elements this is much harder. However representing a piecewise constant signal is easier with the Haar wavelets. In photography the latter is preferred, as edges are very common (think of branches of a tree against a clear sky or hard edges of a building). So depending on the application this \emph{non-smoothness} is either good or bad. |
|
||||
|
|
||||
|
\include{intro} |
||||
|
\include{dau} |
||||
|
\include{par} |
||||
|
|
||||
\section{Daubechies wavelets} |
|
||||
\label{sec:dau} |
|
||||
We now have seen three different bases to represent signals: in the sample domain, in the Fourier domain and in the Haar wavelets domain. The all have different properties. We have reasoned that the Haar wavelets have nice properties regarding images; it is able to represent edges well and errors are local. However a little bit of smoothness is sometimes asked for (again in photography, think of a blue sky: it's white/blue on the bottom, darker on the top). This is exactly what the Daubechies wavelets of order four add. |
|
||||
|
|
||||
Instead of explicitly defining or showing the basis elements, we will directly describe the wavelet transform.\footnote{Note that we didn't describe the transforms described in section~\ref{sec:intro}, as this section was motivational only.} In fact we will describe it as an algorithm, as our intent is to implement it. |
\nocite{*} |
||||
|
\bibliographystyle{alpha} |
||||
|
\bibliography{references}{} |
||||
|
|
||||
\subsection{The Daubechies wavelet transform} |
|
||||
|
|
||||
|
\end{document} |
@ -0,0 +1,9 @@ |
|||||
|
|
||||
|
% smaller margins
||||
|
\usepackage{geometry} |
||||
|
\geometry{a4paper} |
||||
|
\geometry{twoside=false} |
||||
|
|
||||
|
% no indent, but vertical spacing |
||||
|
\usepackage[parfill]{parskip} |
||||
|
\setlength{\marginparwidth}{2cm} |