\documentclass{ib-assignment}

\title{Optimizing Boggle boards}
\subtitle{An evaluation of parallelizable techniques}
\author{Ankur Dave}
\email{ankurdave@gmail.com}
\candidatenum{0844-028}
\wordcount{3611}
\date{January 22, 2009}
\updateheaders

\usepackage{setspace}

\usepackage{placeins}

\usepackage{color}
\definecolor{codeBrown}{rgb}{0.65,0.16,0.16}
\usepackage{listings}
\lstset{basicstyle=\ttfamily\small, numbers=left, %
		numberstyle=\footnotesize, stepnumber=1, numbersep=5pt, %
		backgroundcolor=\color{white}, showspaces=false, %
		showstringspaces=false, showtabs=false, frame=single, %
		tabsize=4, captionpos=b, breaklines=true, breakatwhitespace=true, %
		keywordstyle=\color{codeBrown}, commentstyle=\rm, %
		columns=fullflexible, frame=l, breakindent=40pt}
\lstdefinelanguage{pseudocode}{morekeywords={class, for, each, in, calculate, from, in, return, void, if, else, true, is, an, adjacent, false, file, contains, starting, with, a, whose, the, matching, of, this, constant, while, step}, morestring=[b]", morecomment=[l]{//}}
\lstset{escapeinside={/*@}{@*/}}

\newcommand{\url}[1]{$<$\texttt{#1}$>$}
\newcommand{\func}[1]{\texttt{#1}}
\newcommand{\class}[1]{\texttt{#1}}
\newcommand{\method}[2]{\texttt{#1.#2}}
\newcommand{\note}[1]{\textit{#1}}
\newcommand{\coderef}[2]{\ref{#1}, page \pageref{#1:#2}, line \ref{#1:#2}}

\begin{document}

\pagenumbering{roman}

\maketitle

\section*{Abstract}

This paper's objective is to find efficient, parallelizable techniques of solving global optimization problems. To do this, it uses the specific problem of optimizing the score of a Boggle board.

Global optimization problems deal with maximizing or minimizing a given function. This has many practical applications, including maximizing profit or performance, or minimizing raw materials or cost.

Parallelization is splitting up an algorithm across many different processors in a way that allows many pieces of work to run simultaneously. As parallel hardware increases in popularity and decreases in cost, algorithms should be parallelizable to maximize efficiency.

Boggle is a board game in which lettered cubes are shaken onto a 4-by-4 grid. The objective is to find words spelled by paths through the grid. The function to maximize is the sum of the scores of all possible words in the board.

In this paper, the performance of two algorithms for global optimization are investigated: hill climbing and genetic algorithms. Genetic algorithms, which model evolution to find the fittest solutions, are found to be more efficient because they are non-greedy. In addition, a modified genetic algorithm called the coarse-grained distributed genetic algorithm (DGA) is investigated. This algorithm can take advantage of multiple computers, running several semi-independent copies of the algorithm in parallel to provide extra genetic diversity and better performance. The success of the coarse-grained DGA shows that global optimization problems can benefit significantly from parallelization.

Investigating these genetic algorithms revealed several modifications that are beneficial to global optimization. These modifications solve the problem of premature convergence (a loss of genetic diversity). Several techniques to solve this problem are investigated, notably incest prevention and migration control, revealing a very significant performance increase.

\newpage

\tableofcontents

\newpage
\FloatBarrier
\pagenumbering{arabic}

% The research question is clearly stated in the introduction and sharply focused, making effective treatment possible within the word limit.

% The context of the research question is clearly demonstrated. The introduction clearly explains the significance of the topic and why it is worthy of investigation.

\section{Introduction}

In recent years, parallel computing has seen an enormous rise in interest. While supercomputers have used multiple processors for years, the rise of multi-core processors in desktop computers is a fairly recent phenomenon \cite{parallel}. In addition, Google-style cluster computing, harnessing thousands of commodity computers to do the work of a few high-end servers at a much lower cost, is rising in popularity \cite{google}. In order to take advantage of these new developments, computer scientists who need to solve problems must find ways to do so that lend themselves to parallelization---being split up across many different processors in a way that allows every piece of work to run simultaneously.

Global optimization problems are one class of problems in computer science that can benefit from the rise of parallel computing. Global optimization problems deal with finding the solution among a set of all possible solutions that maximizes a given function \cite{glopt-wolfram}. The set of all possible solutions is called the search space, and the function to evaluate each possible solution is called the objective function \cite{glopt-wp}. Global optimization problems have many practical applications; any problem that benefits from the maximization or minimization of a value---for example, raw materials used or cost---uses a optimization algorithm. % TODO maybe go into more detail about the applications of global opt

Such applications of global optimization problems are usually limited by the amount of time taken to find the optimal solution to the problem \cite{glopt-wp}, so they must be interrupted after a specified time has elapsed and the best solution so far must be used. Therefore, faster techniques for solving global optimization problems would benefit users of such problems, because they would allow better solutions to be found in the same amount of time. If the algorithm for solving a global optimization problem were able to take advantage of parallelization, its speed would be greatly improved.

The objective of this paper is to find efficient, parallelizable techniques of solving global optimization problems. In order to test such techniques, a specific global optimization problem is used: the optimization of the score of a Boggle board.

Boggle is a board game published by Hasbro in which 16 cubes, each having a letter printed onto every side, are shaken onto a 4-by-4 grid. The objective of the game is to find words that are spelled out by paths through the grid. These paths must take only horizontally, vertically, or diagonally adjacent steps, and they must not use a cube more than once for the same word. Longer words score more points; the detailed scoring rules are given in Table \vref{tbl:score}.
\begin{table}
	\begin{center}
		\begin{tabular}{r | l}
			Word length & Points \\
			\hline
			$\le$ 3 & 0 \\
			4 & 1 \\
			5 & 2 \\
			6 & 3 \\
			7 & 5 \\
			$\ge$ 8 & 11
		\end{tabular}
	\end{center}
	\caption{Boggle scoring rules}
	\label{tbl:score}
\end{table}

Every Boggle board has a score that is calculated by finding all the possible words in the board and summing the scores of each of the words.

In order to frame the Boggle board problem as a global optimization problem, it is necessary to specify the key parameters of global optimization problems: the format of possible solutions, the search space of possible solutions, and the objective function with which to measure solutions. In the case of the Boggle board problem, any Boggle board is a possible solution. The search space is therefore the set of all possible Boggle boards, and the objective function---the function that we wish to maximize---is the score of a Boggle board.

In order to understand why optimizing the score of a Boggle board is not a trivial problem, it is helpful to look at the search space. A Boggle board has 16 cubes. For simplicity of implementation each cube is represented by a random letter from the alphabet rather than one of the six letters on the actual cube in that position. Therefore, each letter can be one of the 26 letters in the English alphabet. This means that there are $26^{16}$ (about 44 sextillion, or about $2^{75}$) possible Boggle boards. If each board takes 0.01 second to score, then it would take over 13 trillion years to score all the boards in order to find the highest-scoring one. Clearly, any algorithm to optimize the score of a Boggle board must take a more intelligent approach to the problem than simply scoring every single board. Section \ref{sec:optimize} discusses the pursuit of such a technique.

An efficient and parallelizable technique for optimizing the score of a Boggle board would yield insights that would allow many other global optimization problems more efficiently to take advantage of the increasingly common and powerful resource of parallel computing.

\section{Boggle board data format}

In order to optimize the score of Boggle boards, the first step is to define the storage format for any Boggle board. These boards can be represented most clearly as objects, so throughout this paper, Java, an object-oriented language, is used. The Boggle board is stored as a \class{Board} object (\ref{Board.java}) that has a two-dimensional array of \class{Letter}s that stores the information for each letter on the board, a value for the score of the board, a value for the side length of the board (4 in the official version of Boggle), a list of words contained in the board, and a reference to a dictionary.

The score of the board is calculated by finding the sum of the scores of all the possible words in the board (\coderef{Board.java}{generate}). To find all possible words in the board, a depth-first search algorithm is used. This algorithm starts at the first letter on the board and recursively explores as far as possible along one path, stopping and backtracking when there are no letters left. Because using the same letter more than once in the same word is not allowed, each letter has a flag indicating whether or not it has been already used in the current word; this flag is set before traversing through a letter and lowered after finishing traversal through the letter. To improve performance, the dictionary used to check for valid words is stored as a trie---a tree in which each node is a letter and words in the dictionary are represented by paths through the tree (\ref{Dictionary.java}). This provides the benefit that dictionary lookups take constant time instead of logarithmic time, as they would with a flat dictionary. Pruning of the search tree (skipping traversal of branches that clearly will not contain any valid words) is also done using the trie to see if the current path begins any valid words (\coderef{Dictionary.java}{beginsWord}).

\section{Techniques for optimizing Boggle boards} \label{sec:optimize}

Optimization algorithms fall into two classes: greedy and non-greedy. This section investigates stochastic hill climbing, a greedy algorithm, and genetic algorithms, a greedy algorithm. After finding that non-greedy algorithms are a better choice for optimization problems, it explores whether distributed algorithms provide an improvement over non-distributed algorithms for optimization.

\subsection{Testing method} \label{sec:testing}

Each algorithm is run through 100 trials. In each trial, the algorithm is allowed to run until one of the following conditions is met:
\begin{itemize}
	\item the Boggle board score reaches 3500, or
	\item 1000 seconds elapse.
\end{itemize}

Non-distributed algorithms (those that run on only one computer) are tested on a Dell Precision M90 with the following specifications:
\begin{description}
	\item[CPU] Intel Core 2 Duo T7200, 2.00 GHz
	\item[RAM] 2.00 GB
	\item[OS] Ubuntu Linux 8.10
\end{description}

% TODO get specs for other computers

% TODO as part of hill climbing intro (or optimizing intro) talk about the fact that this is really stochastic hill climbing and why you chose it
\subsection{Hill climbing} \label{sec:hillclimb}

%TODO cite
A simple starting point for the task of finding high-scoring Boggle boards is hill climbing, an algorithm that starts with a random board and makes random, small changes, called mutation. If a change improves the board's score, it keeps the change; otherwise, it tries a different change. Hill climbing for Boggle is implemented in \ref{HillClimber.java}.

Hill climbing is a greedy algorithm, which means it only makes changes that improve the score of the Boggle board. This can cause it to get stuck in a local optimum. If there is a high-scoring Boggle board with no higher-scoring boards that are a small change from the current board, then the hill climbing algorithm will not be able to improve on the current board.

The implementation of hill climbing to optimize Boggle boards can be found in \ref{HillClimber.java}. This implementation was run according to the standard testing method in \ref{sec:testing}. To investigate the impact of the fact that hill climbing is a greedy algorithm, it is necessary to check which halting condition has happened first---the score has reached 3500, or 1000 seconds have elapsed. If the first halting condition is satisfied first, it means that the algorithm has performed well, and the time taken to reach the score of 3500 provides a means to compare hill climbing with other algorithms. If the second halting condition is satisfied first, however, it means that the algorithm has likely reached a local optimum---a consequence of the fact that hill climbing is a greedy algorithm---and will not provide any more improvement in the Boggle board score.

% TODO write analysis of trials

The low percentage of trials that reached a score of 3500---the success rate of the algorithm---is likely caused by the algorithm reaching local optima and ceasing to improve the score. This mediocre overall performance because of local optima is due to the fact that hill climbing is a greedy algorithm. To overcome it, it is necessary to choose an algorithm that is non-greedy---willing to follow paths that do not, in the short term, improve the Boggle board's score. This will allow the algorithm to escape local optima in search of the global optimum. One such algorithm is the genetic algorithm.

\subsection{Genetic algorithm}

%TODO explain genetic algorithm in much more detail. Cite some textbooks too. Also talk about parallelization

A different algorithm for optimizing the score of a Boggle board is the genetic algorithm. The genetic algorithm is a non-greedy algorithm, so it is not restricted to following paths that always improve the Boggle board score. It is able to take ``risks'' in order to escape local optima.

Genetic algorithms attempt to solve optimization problems by modeling evolution. They simulating a population of candidate solutions, called chromosomes. The population goes through multiple generations. In each generation the score, or fitness, of each chromosome is calculated. The more fit a chromosome, the more likely it is to be selected to reproduce.

After selecting a portion of the population to reproduce, the chromosomes in the population are paired up and combined using the crossover and mutation operators. The crossover operator takes parts of each parent chromosome and joins them to form an offspring. % TODO explain crossover in detail
The mutation operator adds an element of randomness to the simulation to help escape local optima. It is helpful when most of the chromosomes in the population are similar; in this situation, it introduces variety to continue improving the score. It randomly replaces a part of the offspring chromosome with a random piece of genetic material.

\subsubsection{Single-population genetic algorithm}

The basic version of the genetic algorithm works as described above, with no modifications made to allow it to run on multiple computers or take advantage of multiple cores in a computer. Beginning with the simple, non-parallelized version reveals whether parallelizing the algorithm yields significant improvements or not.

The implementation for the single-population genetic algorithm is contained in the \class{Population} class (\ref{Population.java}). This class stores the 
current population of Boggles, and each time the \method{Population}{evolve} method (\coderef{Population.java}{evolve}) is called, it advances the population 
to the next generation by pairing up parents, creating children using crossover and mutation (\coderef{Board.java}{merge}), and replacing the parent generation with the child generation. 

Testing this implementation according to the testing procedures in Section \ref{sec:testing} resulted in an average time to completion of $915 \pm 10.3$ seconds, and a success rate of 10\%.

This very low success rate arose mostly because, after the first few generations, the population was almost completely composed of copies of the same chromosome. When these copies reproduced, they almost always produced more copies of the same chromosome rather than finding better chromosomes.

There are several problems with this algorithm that reduce its performance. One problem is premature convergence, which is a reduction of genetic diversity to the point where the genetic algorithm is unable to continue exploring the search space for fitter solutions. This occurs when the population contains only the descendents or the copies of one chromosome, so there is no variety of genetic material. This problem arises most often when a locally optimum chromosome is created, and it and its descendents dominate the population and crowd out all other chromosomes. The population has thus converged on a certain chromosome before the genetic algorithm is able to explore much of the search space. In the test above, premature convergence was the primary cause of such a low success rate.

In order to reduce the problem of premature convergence, one technique is incest prevention. Since premature convergence usually occurs when a locally optimum chromosome multiplies out of control by reproducing with near-copies of itself, it can be avoided in many cases by prohibiting such ``incestuous'' reproduction. In order to prevent incest, it is necessary to prevent more than one copy of the same board to exist in a population (\coderef{Population.java}{unique}), as well as increasing the mutation rate for reproduction between two boards that are excessively similar (\coderef{Board.java}{incest}). The former will prevent a board from dominating the population with copies of itself, while the latter will introduce genetic diversity through mutation if the population consists of many similar boards.

Another possible solution for premature convergence is weighted random mating. This technique aims to increase genetic diversity by giving a wider fitness range of chromosomes a chance at contributing genetic material to the next generation. Normally, when evolving the next generation by mating the current generation, the boards are paired up according to score---the highest-scoring boards is paired up with the second-highest, the next one is paired up with the next after that, and so on. This has the advantage of ensuring that the boards with the best characteristics get a chance to merge genetic material and produce an even better board, but because boards that have similar scores often came from the same genetic background, it encourages incest and discourages diversity. This may make it more likely for the population to converge prematurely on a local maximum, resulting in a low success rate. Instead, weighted random mating pairs boards up in a more diverse fashion. For every pair, it chooses two random boards from the population, weighted so that the higher a board's score, the greater chance it has of being chosen. This approach should allow lower-scoring boards to pair with higher ones, increasing genetic diversity, while still preferring higher-scoring boards.

Testing the two techniques separately and comparing the results provides a basis for deciding which, if any, technique is more effective at improving performance. Testing the single-population algorithm with incest prevention resulted in an average time of $501 \pm 9.9$ seconds, and a success rate of 70\%. Weighted random mating resulted in an average time of $867 \pm 134$ seconds, and a success rate of 14\%. Compared to the performance of the simple genetic algorithm, both of these techniques provided improvements, but comparing the success rates of two techniques shows that incest prevention is better for solving the problem of premature convergence, because the greater the success rate, the fewer trials failed early due to premature convergence.
% TODO test wr mating

Another problem with the genetic algorithm is the loss of highly fit chromosomes. If, by chance, an especially fit chromosome is produced in one generation, it is usually difficult for the chromosome's children to surpass its fitness in just one generation. However, instead of being given more chances at creating higher-scoring children, the chromosome dies in one generation and only its less-fit children remain. In order to give such a high-scoring chromosome the extra chances at reproduction that it needs, some way of promoting the chromosome to the next generation unmodified.

One way of doing this is elitist selection. This technique involves copying the highest-scoring board from one generation directly into the next generation, as well as allowing it to reproduce.

Testing an implementation of elitist selection, combined with the best premature convergence technique above, incest prevention, resulted in an average time of $368 \pm 9.8$ seconds, and a success rate of 78\%. Elitist selection improved the average time by a factor of 1.4, demonstrating that it is an effective solution for the loss of highly fit chromosomes.	

Using the best of these techniques, incest prevention and elitist selection, we can proceed from optimization of the genetic algorithm on a single processor to optimization on multiple parallel processors.

\subsubsection{Coarse-grained distributed genetic algorithm}

After creating and optimizing a single-population genetic algorithm, the next step is to take advantage of parallelism by modifying the algorithm to work on multiple computers, called distributed computing. The advantage of using multiple processors instead of one is that evolution can be performed in parallel and therefore takes less time. In addition, it is cheaper to use many commodity computers than just one computer with hundreds of processors \cite{google}. One common way of running the genetic algorithm on multiple computers is called a coarse-grained distributed genetic algorithm (DGA). In this technique, each computer simulates an independent population, called a subpopulation, and chromosomes from each subpopulation migrate periodically to other subpopulations. (The opposite method is the fine-grained DGA, in which each computer is assigned small tasks like scoring a particular board by the master computer. There is only one population in total in this technique, as opposed to one per computer in the coarse-grained DGA.)

The coarse-grained DGA has several advantages over other distributed genetic algorithm variants. It is useful for clusters where network latency is high or bandwidth is low, because it requires relatively little communication between computers (only the overall command signals and the migrant chromosomes need to travel over the network). This means every computer can be working all the time, instead of spending a significant portion of time waiting for another computer to communicate with it over the network \cite{coarse}. In addition, using multiple subpopulations instead of one large population allows the subpopulations to evolve along different evolutionary paths \cite{unix}. This increases diversity by making it less likely that the entire genetic algorithm will be dominated by one gene.

The coarse-grained DGA is implemented using two components---a server and a client. The server is implemented in \class{Server} (\ref{Server.java}). It handles migration, keeps statistics about the subpopulations, and performs other administrative tasks for all the subpopulations. The client is implemented in \class{GeneticClient} in the glossary. It evolves an individual subpopulation, communicating with the server to take commands and send and receive migrant chromosomes.

In the unmodified coarse-grained DGA, migration is done every time a subpopulation completes a generation. Each time this happens, the subpopulation submits its most fit chromosome to the server for migration, keeping a copy for itself as well. It also accepts an incoming migrant chromosome from one of the other subpopulations at random.

As with the single-population algorithm, the main problem with this distributed genetic algorithm is premature convergence, not within a subpopulation but rather between subpopulations. This problem causes most or all subpopulations to be dominated by the same genetic materal, losing the advantage of a coarse-grained genetic algorithm---increased diversity because of multiple subpopulations. The most common methods of solving this problem involve modifying the way in which chromosomes migrate between subpopulations.

One solution to this problem is a fitness-based migration hierarchy. Unlike the standard, random method of migration, a fitness-based hierarchy only allows migrant chromosomes to travel to subpopulations with a higher average fitness than their home subpopulations.

This implementation was tested according to the testing procedures in Section \ref{sec:testing}. The implementation used the techniques for single-population genetic algorithms found in the previous section, as well as the fitness-based migration hierarchy. Testing resulted in an average time to completion of $258 \pm 9.9$ seconds, and a success rate of 94\%. Therefore, moving to a distributed algorithm and taking advantage of both cores in the same computer, as well as reducing premature convergence, improved the performance of the algorithm by a factor of 1.4.

\section{Conclusion}

By investigating the performance, both in terms of time and success rate, of three global optimization algorithms---hill climbing, genetic algorithms, and coarse-grained DGAs---this paper determined that non-greedy algorithms are generally more successful than greedy ones, and that reimplementing an algorithm to make it distributed provides a significant performance benefit. The specific algorithms and techniques that were found to be the most effective were coarse-grained DGAs with incest prevention, elitist selection, and a fitness-based migration hierarchy. Using this algorithm, a very high-scoring Boggle board was found, given in Figure \vref{fig:best-board} (one of the longest words on this board is ``predestines,'' worth 11 points in Boggle).

\begin{figure}
	\begin{center}
		\includegraphics[width=2in]{best-boggle}
	\end{center}
	\caption{Best Boggle board found. Score: 4410}
	\label{fig:best-board}
\end{figure}

However, these findings are somewhat specific to the problem of optimizing Boggle boards. The testing used to find the highest-performing algorithms attempted to maximize the objective function of the score of a Boggle board, but this objective function is peculiar because evaluating it is a relatively time-consuming operation. Broader testing, using a variety of types of objective functions, is needed to establish the superiority of these algorithms.

In addition, only three types of global optimization algorithms were tested out of the hundreds that exist. While these three are among the most widely used, investigation of many other algorithms is necessary to conclude that any particular global optimization algorithm is superior.

% An effective conclusion is clearly stated; it is relevant to the research question and consistent with the evidence presented in the essay. It should include unresolved questions where appropriate to the subject concerned.

\FloatBarrier

\setstretch{1.0}
\addcontentsline{toc}{section}{References}
\bibliographystyle{unsrt}
\nocite{*}
\bibliography{bibliography}

\setstretch{0.8}
\appendix
\section{Appendix}

\newcommand{\apdxcodelisting}[1]{\subsection{#1}\label{#1}%
\lstinputlisting[language=Java,basicstyle=\ttfamily\footnotesize]{code/#1}
}
\apdxcodelisting{Board.java}
\apdxcodelisting{BoardTester.java}
\apdxcodelisting{ByBoardScore.java}
\apdxcodelisting{ByStringLength.java}
\apdxcodelisting{Dictionary.java}
\apdxcodelisting{DictionaryTester.java}
\apdxcodelisting{GenerationEmptyException.java}
\apdxcodelisting{GeneticClient.java}
\apdxcodelisting{GeneticClientTester.java}
\apdxcodelisting{GeneticClientThread.java}
\apdxcodelisting{HillClimber.java}
\apdxcodelisting{Population.java}
\apdxcodelisting{Server.java}
\apdxcodelisting{ServerTester.java}
\apdxcodelisting{ServerThread.java}
\apdxcodelisting{Util.java}

\end{document}