[Report] Started to generate some results

2014-01-17 17:03:52 +01:00 · 2014-01-17 17:03:52 +01:00 · 20b9b15f7a
commit 20b9b15f7a
parent bf6133d0e6
7 changed files with 153 additions and 1 deletions
--- a/filter.sh
+++ b/filter.sh
@ -0,0 +1,3 @@
 #!/bin/bash
 cat "$@" | grep "seq\|par" | sed 's/[a-zA-Z]//g' | sed 's/[[:blank:]]//g' | sed 'N;s/\n/	/'
--- a/wavelet_report/preamble.tex
+++ b/wavelet_report/preamble.tex
@ -5,6 +5,14 @@
 % floating figures
 \usepackage{float}
 \usepackage{tikz}
 \usepackage{pgfplots}
 \pgfplotsset{compat=newest}
 \usepackage{graphicx}
 \usepackage{caption}
 \usepackage{subcaption}
 % Matrices have an upper bound on their number of columns
 \setcounter{MaxMatrixCols}{20}
@ -24,3 +32,5 @@
 \theoremstyle{plain}
 \newtheorem{theorem}{Theorem}[section]
 \newtheorem{lemma}[theorem]{Lemma}
 \newcommand*{\thead}[1]{\multicolumn{1}{c}{\bfseries #1}}
--- a/wavelet_report/report.tex
+++ b/wavelet_report/report.tex
@ -6,7 +6,7 @@
 \title{Parallel wavelet transform}
 \author{Joshua Moerman}
-\includeonly{dau}
+%\includeonly{dau}
 \begin{document}
@ -20,6 +20,7 @@ In this paper we will derive a parallel algorithm to perform a Daubechies wavele
 \include{intro}
 \include{dau}
 \include{par}
 \include{res}
 \nocite{*}
--- a/wavelet_report/res.tex
+++ b/wavelet_report/res.tex
@ -0,0 +1,67 @@
 \section{Results}
 \label{sec:res}
 \subsection{Methodology}
 The first step in measuring the gain of parallelization is to make sure the implementation is correct and to have a sequential baseline. The first implementation was a naive sequential one, which did a lot of data copying and shuffling. Then a more performant sequential implementation was made, which produced the same output as the naive one. By applying the inverse transform we verified that the implementation was correct. Only then was the parallel version made. It gives exactly the same outcome as the sequential version and is hence considered correct.
 We analysed the theoretical BSP cost, but this alone says little about the running time of the real program. By also estimating the BSP variables $r$, $g$ and $l$ we can see how well the theoretical analysis matches the practical running time. To estimate these variables we used the general benchmarking tool \texttt{bench} from the BSP Edupack\footnote{See \url{http://www.staff.science.uu.nl/~bisse101/Software/software.html}}.
 The computations were run on two machines. The first is a MacBook Pro (MBP 13'' early 2001) with two physical cores. Due to hyperthreading it has four \emph{virtual} cores, but for a pure computation (where the pipeline should always be filled) we cannot expect a speed-up of more than two. The second is the supercomputer Cartesius, which has many more cores. We should note that the MBP has shared memory, which the BSP model does not exploit at all. The estimated BSP variables are listed in table~\ref{tab:variables}.
 \begin{table}
 \begin{tabular}{c|r|r|r|r}
 & \thead{MBP} & \thead{MBP} & \thead{Cartesius} & \thead{Cartesius} \\
 \hline
 p	& 2 	& 4 	& 4 	& 16 	\\
 r	& 5993	& 2849	& 6771	& 6771	\\
 g	& 284 	& 248	& 219	& 340	\\
 l	& 1300	& 2161	& 46455	& 162761\\
 \end{tabular}
 \caption{The estimated BSP variables for the two machines, each measured for two different numbers of processors $p$.}
 \label{tab:variables}
 \end{table}
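 As a rough sketch of how these variables translate into time, consider a superstep with $w$ flops of computation and an $h$-relation of communication (the symbols $w$ and $h$ are introduced here for illustration only). In the BSP model such a superstep costs $w + hg + l$ flops. Assuming that $r$ is reported in Mflop/s and that $g$ and $l$ are reported in flops, as \texttt{bench} usually does, the corresponding running time is approximately
 \[ T \approx \frac{w + hg + l}{r \cdot 10^6} \ \mathrm{s}. \]
 Under this assumption a single synchronization costs about $46455 / (6771 \cdot 10^6) \approx 6.9 \cdot 10^{-6}$ seconds on Cartesius with $p=4$, against about $1300 / (5993 \cdot 10^6) \approx 2.2 \cdot 10^{-7}$ seconds on the MBP with $p=2$.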
 When we measure time, we only measure the time of the actual algorithm, so we ignore start-up time, allocation time and the initial data distribution. Time is measured with the \texttt{bsp\_time()} primitive, which is a wall clock. For a more reliable measurement we iterated the algorithm at least 100 times and divided the total time by the number of iterations.
 \subsection{Results}
 In this subsection we plot the actual running time of the algorithm. We take $n$ as the variable to see how the parallel algorithm scales. As we only allow powers of two for $n$, we will often plot on a $\log$-$\log$ scale. In all cases we took $n=2^6$ as the minimum and $n=2^{27}$ as the maximum. Unless stated otherwise we use blue for the parallel running time and red for the sequential running time. The thin lines show the theoretical time, for which we used the variables in table~\ref{tab:variables}.
 In figure~\ref{fig:basicplot} the running time is plotted for the case where $m=1$. There are multiple things to note. First of all, the actual running time closely matches the shape of the theoretical prediction. This suggests that the BSP cost model is adequate for predicting the impact of parallelization. On both machines there is a point beyond which the parallel algorithm is faster and stays faster. However, on the MBP both the sequential and the parallel algorithm show a bump at around $n = 10^6$.
 \tikzset{measured/.style={mark=+}}
 \tikzset{predicted/.style={very thin, dashed}}
 \tikzset{sequential/.style={color=red}}
 \tikzset{parallel/.style={color=blue}}
 \begin{figure}
 	\centering
 	\begin{subfigure}[b]{0.5\textwidth}
 		\begin{tikzpicture}
 		\begin{loglogaxis}[xlabel={$n$}, ylabel={Time (s)}, width=\textwidth]
 		\addplot[predicted, sequential] table[x=n, y=SeqP] {results/mbp_p2_m1_basic};
 		\addplot[predicted, parallel]   table[x=n, y=ParP] {results/mbp_p2_m1_basic};
 		\addplot[measured, sequential]  table[x=n, y=Seq]  {results/mbp_p2_m1_basic}; \addlegendentry{sequential}
 		\addplot[measured, parallel]    table[x=n, y=Par]  {results/mbp_p2_m1_basic}; \addlegendentry{parallel}
 		\end{loglogaxis}
 		\end{tikzpicture}
 		\caption{Running time on an MBP with $p=2$}
 	\end{subfigure}~
 	\begin{subfigure}[b]{0.5\textwidth}
 		\begin{tikzpicture}
 		\begin{loglogaxis}[xlabel={$n$}, ylabel={Time (s)}, width=\textwidth]
 		\addplot[predicted, sequential] table[x=n, y=SeqP] {results/cart_p4_m1_basic};
 		\addplot[predicted, parallel]   table[x=n, y=ParP] {results/cart_p4_m1_basic};
 		\addplot[measured, sequential]  table[x=n, y=Seq]  {results/cart_p4_m1_basic}; \addlegendentry{sequential}
 		\addplot[measured, parallel]    table[x=n, y=Par]  {results/cart_p4_m1_basic}; \addlegendentry{parallel}
 		\end{loglogaxis}
 		\end{tikzpicture}
 		\caption{Running time on Cartesius with $p=4$}
 	\end{subfigure}
 	\caption{Running time vs. number of elements $n$. The thin dashed lines show the theoretical predictions.}
 	\label{fig:basicplot}
 \end{figure}
--- a/wavelet_report/results/cart_p4_m1_basic
+++ b/wavelet_report/results/cart_p4_m1_basic
@ -0,0 +1,24 @@
 n Seq Par SeqP ParP
 64	0.00000027	0.00002848	0.00000013232905	0.000024750184611
 128	0.00000054	0.00002891	0.000000264658101	0.000028310736966
 256	0.00000105	0.00003613	0.000000529316201	0.000031904371585
 512	0.0000025	0.0000375	0.000001058632403	0.000035564170728
 1024	0.00000417	0.00004423	0.000002117264806	0.000039356298922
 2048	0.00000829	0.0000438	0.000004234529612	0.000043413085216
 4096	0.000017	0.00005336	0.000008469059223	0.000047999187712
 8192	0.0000352	0.00005818	0.000016938118446	0.000053643922611
 16384	0.00007036	0.00007659	0.000033876236893	0.000061405922316
 32768	0.00014554	0.00009578	0.000067752473785	0.000073402451632
 65536	0.00033355	0.00014556	0.000135504947571	0.000093868040171
 131072	0.00066892	0.00022974	0.000271009895141	0.000131271747157
 262144	0.00133726	0.00041677	0.000542019790282	0.000202551691035
 524288	0.00271766	0.00075303	0.001084039580564	0.000341584108699
 1048576	0.00540122	0.00143222	0.002168079161128	0.000616121473933
 2097152	0.01085754	0.00280032	0.004336158322257	0.001161668734308
 4194304	0.02789499	0.00554204	0.008672316644513	0.002249235784965
 8388608	0.06382695	0.01450042	0.017344633289027	0.004420842416187
 16777216	0.1277917	0.0350954	0.034689266578054	0.008760528208536
 33554432	0.2550964	0.06986389	0.069378533156107	0.017436372323143
 67108864	0.50946262	0.1389675	0.138757066312214	0.034784533082263
 134217728	1.01779021	0.27657479	0.277514132624428	0.069477327130409
--- a/wavelet_report/results/mbp_p2_m1_basic
+++ b/wavelet_report/results/mbp_p2_m1_basic
@ -0,0 +1,23 @@
 n Seq Par SeqP ParP
 64	0.0000002	0.00000231	0.000000149507759	0.000001719339229
 128	0.000000485	0.000003695	0.000000299015518	0.000002044718839
 256	0.00000088	0.000003345	0.000000598031036	0.000002444852328
 512	0.00000167	0.00000448	0.000001196062072	0.000002994493576
 1024	0.000003305	0.00000637	0.000002392124145	0.000003843150342
 2048	0.000006615	0.000010065	0.00000478424829	0.000005289838145
 4096	0.000013705	0.000015575	0.000009568496579	0.000007932588019
 8192	0.00002897	0.000020225	0.000019136993159	0.000012967462039
 16384	0.00007088	0.00004938	0.000038273986317	0.000022786584348
 32768	0.000146195	0.00008591	0.000076547972635	0.000042174203237
 65536	0.000313405	0.000130035	0.000153095945269	0.000080698815284
 131072	0.000534205	0.0004309	0.000306191890539	0.000157497413649
 262144	0.001042505	0.000626695	0.000612383781078	0.000310843984649
 524288	0.002813735	0.002353855	0.001224767562156	0.000617286500918
 1048576	0.00747013	0.006598305	0.002449535124312	0.001229920907726
 2097152	0.01472155	0.01315899	0.004899070248623	0.002454939095612
 4194304	0.02943272	0.02627802	0.009798140497247	0.004904724845653
 8388608	0.058599735	0.055608795	0.019596280994494	0.009804045720007
 16777216	0.123395235	0.106270255	0.039192561988987	0.019602436842984
 33554432	0.238386745	0.21387985	0.078385123977974	0.039198968463207
 67108864	0.474814405	0.428788495	0.156770247955949	0.078391781077924
 134217728	0.953750485	0.856867835	0.313540495911897	0.156777155681629
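The two result files share the column layout `n Seq Par SeqP ParP`, so the measured speed-up is simply `Seq / Par` per row. The following is a minimal sketch of that computation, not part of the commit: the helper name `speedup` is hypothetical, and the two data rows are the $n = 2^{27}$ measurements copied from the files above.

```shell
#!/bin/bash
# Sketch: measured speed-up Seq/Par for rows in the
# "n Seq Par SeqP ParP" layout used by the results files.
# The helper name "speedup" is hypothetical.
speedup() {
	# Skip the header line, then print n and Seq/Par with two decimals.
	awk 'NR > 1 { printf "%d\t%.2f\n", $1, $2 / $3 }'
}

# Largest problem size (n = 2^27), copied from results/cart_p4_m1_basic.
cart=$(printf 'n Seq Par SeqP ParP\n134217728\t1.01779021\t0.27657479\t0.277514132624428\t0.069477327130409\n' | speedup)
# Largest problem size (n = 2^27), copied from results/mbp_p2_m1_basic.
mbp=$(printf 'n Seq Par SeqP ParP\n134217728\t0.953750485\t0.856867835\t0.313540495911897\t0.156777155681629\n' | speedup)

echo "Cartesius p=4: $cart"
echo "MBP p=2:       $mbp"
```

For these rows the measured speed-up is about 3.68 on Cartesius with $p=4$ and about 1.11 on the MBP with $p=2$. To process a whole file, the same helper can read it directly, for example `speedup < results/cart_p4_m1_basic`.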
--- a/wavelet_speed.sh
+++ b/wavelet_speed.sh
@ -0,0 +1,24 @@
 #!/bin/bash
 #SBATCH -t 0:30:00
 #SBATCH -n 4
 p=2
 start=6
 end=27
 iters=200
 # On Cartesius (user "bissstud") launch through srun, otherwise run directly
 if [[ "$(whoami)" == "bissstud" ]]; then
 	cd "$HOME/Students13/JoshuaMoerman/assignments" || exit 1
 	echo "Running on Cartesius $*"
 	RUNCOMMAND="srun"
 else
 	echo "Running locally $*"
 	RUNCOMMAND=""
 fi
 # Time the transform for n = 2^start, ..., 2^end
 for i in $(seq "$start" "$end"); do
 	echo -e "\n\033[1;34mtime\t$(date)\033[0;39m"
 	n=$((2**i))
 	$RUNCOMMAND ./build-Release/wavelet/wavelet_parallel_mockup --m 1 --n "$n" --p "$p" --show-input --iterations "$iters"
 done
 echo -e "\n\033[1;31mtime\t$(date)\033[0;39m"