tag:blogger.com,1999:blog-4696052836745948302Mon, 15 Sep 2014 14:20:29 +0000programminggrad-schooloptimizationpythonmachine learningpoetryMATLABconferencehaikunsfpublicationstexbashc/c++designphilanthopyquotesunsupervisedLife By NumbersEngineering, Tech, TeX, and Numbers. Lots of Numbers.http://blog.cmccomb.com/noreply@blogger.com (Chris McComb)Blogger23125tag:blogger.com,1999:blog-4696052836745948302.post-8661690642003073515Wed, 13 Aug 2014 17:41:00 +00002014-08-13T10:46:05.752-07:00pythonASCII Status Bar for PythonI've been doing a lot of work in Python lately, and I became tired of printing out line upon line of integers just to measure progress. So, I wrote a class that prints and updates an ASCII status bar. The best part is that you can implement it with only two lines of code! Here it is: <br /><br /><pre class="brush: python" id="code">import sys<br /><br />class StatusBar(object):<br /> def __init__(self, step_total, width):<br /> self.char_width = width - 2 <br /> self.current_step = 1<br /> self.step_total = step_total<br /> # Make string<br /> self.this_string = [" "] * self.char_width<br /> self.this_string = ["["] + self.this_string + ["]"]<br /> self.this_string[1] = ">"<br /> <br /> def inc(self):<br /> if self.current_step <= self.step_total:<br /> self.current_step += 1<br /> self._output()<br /> <br /> def _output(self):<br /> P = float(self.current_step - 1)/float(self.step_total)<br /> n_char = P*float(self.char_width)<br /> n_char = int(n_char)<br /> for i in range(n_char):<br /> self.this_string[i+1] = "="<br /> if n_char != self.char_width:<br /> self.this_string[n_char+1] = ">"<br /> print("\r"+"".join(self.this_string)+" {0:.1f}%".format(100.0*P)),<br /> sys.stdout.flush()<br /> if P == 1.0:<br /> print("\n")</pre><br />The implementation is simple. One line goes outside of the loop to initialize the status-bar, and the other goes inside to update it (calling the <tt>inc</tt> method). Like so: <br /><br /><pre class="brush: python" id="code">from time import sleep<br /><br />iterations = 23<br />bar_width = 40<br />bar = StatusBar(iterations, bar_width)<br /><br />for _ in range(iterations):<br /> sleep(0.1)<br /> bar.inc()<br /></pre><br /><script type="text/javascript"> var element = document.getElementById('code'); SyntaxHighlighter.highlight(undefined, element); </script> The output will look something like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-p1oMGKLeiEw/U-ujGUxEWXI/AAAAAAAAZ28/5iEXlG8O61I/s1600/asdf.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-p1oMGKLeiEw/U-ujGUxEWXI/AAAAAAAAZ28/5iEXlG8O61I/s1600/asdf.gif" /></a></div><br /><br />http://blog.cmccomb.com/2014/08/ascii-status-bar-for-python.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-1774870362297576534Sat, 09 Aug 2014 02:29:00 +00002014-08-08T19:29:36.838-07:00programmingpythonInstalling SciPy on the Samsung ChromebookSo, you want to run SciPy on your Chromebook? These are the steps I followed to do my installation.<br /><br />First, we need to download and install Crouton. Follow the instruction <a href="https://github.com/dnschneid/crouton">here</a>, but all we need is the CLI target (<code>sudo sh -e ~/Downloads/crouton -t cli-extra</code>) rather than the full desktop target (<code>sudo sh -e ~/Downloads/crouton -t xfce</code>)<br /><br />Time to start your new Ubuntu shell! Open a tab with <code>CTRL+ALT+T</code>, and then execute:<br /><code>sudo startcli</code><br /><br />Now we get into the meat of it. Although you have a functional shell, it's missing some important components. First, let's round of the Python capabilities:<br /><code>sudo apt-get install python-pip python-dev</code><br /><br />Now, we need to round out some of the general compiler capabilities:<br /><code>sudo apt-get install gcc g++ libatlas-base-dev gfortran</code><br /><br />Finally, we can install NumPy and SciPy! This should go as usual.<br /><code>pip install numpy</code><br /><code>pip install scipy</code><br /><br />And that's all there is to it! Good luck!http://blog.cmccomb.com/2014/08/installing-scipy-on-samsung-chromebook.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-8348306542307461986Wed, 09 Jul 2014 15:26:00 +00002014-07-16T08:12:12.372-07:00programmingpythonTruss Analysis in PythonFor some recent research, I was doing analysis of truss designs using a <a href="http://www.mathworks.com/matlabcentral/fileexchange/14313-truss-analysis">MATLAB script written by Hossein Rahami</a>. Now that I'm transitioning from MATLAB to Python, I decided to rewrite the analysis function in Python. It's not the most glamorous script that I've ever written, but its functional! <br /><br />The functions <tt>force_eval</tt> and <tt>fos_eval</tt> require Numpy to work. The plotting function plot_truss requries the matplotlib package. In addition, there's a function called init_truss that initializes a random truss for a given set of support points. The output of the init_truss function yields trusses that aren't entirely unreasonable:<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-NWUDBaTFttY/U8aVy2xE67I/AAAAAAAAZho/WfUhFF_8iCU/s1600/fos.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://3.bp.blogspot.com/-NWUDBaTFttY/U8aVy2xE67I/AAAAAAAAZho/WfUhFF_8iCU/s1600/fos.png" height="300" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Truss plotted using the xkcd() option in matplotlib.</td></tr></tbody></table>The red members have a low factor of safety - green members have a higher factor of safety. Dashed members are in tension, and solid members are in compression.<br /><br />You can download all of the functions referred to above <a href="http://www.cmccomb.com/blog/truss/truss.zip">here</a>.<br /><br />http://blog.cmccomb.com/2014/07/truss-analysis-in-python.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-911649083849875004Fri, 30 May 2014 13:33:00 +00002014-05-30T06:33:21.509-07:00optimizationprogrammingpythonGlobal Optimization Benchmark Functions<h3>Benchmarking Functions</h3>Most STEM disciplines have at least a dozen ways to accomplish a given task - global optimization is no exception. For that reason, benchmarking functions are particularly useful, since they provide a means for empirically comparing different methods.<br /><br />The following are a few benchmark functions that I've found particularly useful. I took inspiration from <a href="http://en.wikipedia.org/wiki/Test_functions_for_optimization">this wikipedia article</a>, but modified many of the functions to assure 3 characteristics:<br /><ol><li>Extensibility to an arbitrary number of dimensions, <i>d</i></li><li>A minimizer of <b>0</b> (the zero vector).</li><li>A minimum of 0.</li></ol><div>These aren't necessary characteristics, but they're desirable for ease of analysis. </div><h3>My Function</h3><div>This is a function that I created. It displays multiple local minima, which makes it quite difficult to solve. </div><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-VtdsHbrzmJc/U4iGzfim0uI/AAAAAAAAYss/hTg-1qSg3PI/s1600/Screenshot+2014-05-30+09.25.05.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-VtdsHbrzmJc/U4iGzfim0uI/AAAAAAAAYss/hTg-1qSg3PI/s1600/Screenshot+2014-05-30+09.25.05.png" /></a> </div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-wXmPKOZXC58/U4iFdTYVG3I/AAAAAAAAYsQ/rf3vtH3jJc4/s1600/mine.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-wXmPKOZXC58/U4iFdTYVG3I/AAAAAAAAYsQ/rf3vtH3jJc4/s1600/mine.png" height="315" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div><br /></div><h3>The Ackley Function</h3><div>The Ackley function is a well-known benchmarking function. Not only does it display <i>many</i> local minima, but it's global minimum is significantly lower than much of the function.</div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-CHN9Z-rUOuM/U4iGzQvyjiI/AAAAAAAAYsw/xPLDh3ITzG4/s1600/Screenshot+2014-05-30+09.25.11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://1.bp.blogspot.com/-CHN9Z-rUOuM/U4iGzQvyjiI/AAAAAAAAYsw/xPLDh3ITzG4/s1600/Screenshot+2014-05-30+09.25.11.png" /></a></div><div></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-n8UfSajUTB8/U4iFcv80xDI/AAAAAAAAYsM/w2xcb7FAwss/s1600/ackley.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://2.bp.blogspot.com/-n8UfSajUTB8/U4iFcv80xDI/AAAAAAAAYsM/w2xcb7FAwss/s1600/ackley.png" height="315" width="320" /></a></div><div></div><div><br /></div><h3>The Easom Function</h3><div>No, there wasn't an error when I plotted the function - it really is mostly flat. The global minimum is an oasis in what is otherwise a desert of gradient information.</div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-rWoXTuHGCfo/U4iIEW9C9PI/AAAAAAAAYtQ/AOLMd5Vq_mY/s1600/Screenshot+2014-05-30+09.30.55.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-rWoXTuHGCfo/U4iIEW9C9PI/AAAAAAAAYtQ/AOLMd5Vq_mY/s1600/Screenshot+2014-05-30+09.30.55.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-gGd7vRwg75Q/U4iFcjE23pI/AAAAAAAAYsI/5fqmXs0MnCA/s1600/easom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://2.bp.blogspot.com/-gGd7vRwg75Q/U4iFcjE23pI/AAAAAAAAYsI/5fqmXs0MnCA/s1600/easom.png" height="315" width="320" /></a></div><div></div><div><br /></div><h3>The Rosenbrock Function</h3><div>The Rosenbrock function doesn't look very challenging at first. What makes it difficult is the huge variation in scale, and the very flat valley in which the global minimum exists.</div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-m8Cn5F1f1cY/U4iG0BhfuoI/AAAAAAAAYs0/ild83qWn4oY/s1600/Screenshot+2014-05-30+09.25.22.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://2.bp.blogspot.com/-m8Cn5F1f1cY/U4iG0BhfuoI/AAAAAAAAYs0/ild83qWn4oY/s1600/Screenshot+2014-05-30+09.25.22.png" /></a></div><div></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-tTXzsPzuUV4/U4iFdfUKOHI/AAAAAAAAYsc/1VY0riRGEPg/s1600/rosenbrock.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://2.bp.blogspot.com/-tTXzsPzuUV4/U4iFdfUKOHI/AAAAAAAAYsc/1VY0riRGEPg/s1600/rosenbrock.png" height="318" width="320" /></a></div><div></div><div><br /></div><div><br /></div><div>The code for creating these functions and plots was written as an <a href="http://ipython.org/">IPython Notebook</a>. You can view the notebook <a href="http://nbviewer.ipython.org/url/www.cmccomb.com/blog/benchmarking/bm_func.ipynb">here</a>, and download it <a href="http://www.cmccomb.com/blog/benchmarking/bm_func.ipynb">here</a>.</div>http://blog.cmccomb.com/2014/05/global-optimization-benchmark-functions.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-906648747321320834Fri, 25 Apr 2014 19:15:00 +00002014-07-07T05:18:33.814-07:00conferencepublicationsResearch and Practice Group Methodology: A Case Study in Student SuccessToday I had the honor of co-presenting with Fariborz Tehrani at the 2014 ASEE PSW Section Conference. Tehrani was my undergraduate advisor at CSU Fresno. Our paper examined the methodology that he employs in his research group, and analyzed it using several qualitative and quantitative measures metrics.<br /><br />You can access the paper <a href="http://www.cmccomb.com/pubs/2014asee/2014ASEE_Final_Paper.pdf">here</a>, and the presentation <a href="http://www.cmccomb.com/pubs/2014asee/2014ASEE_Final_Presentation.pdf">here</a>.http://blog.cmccomb.com/2014/04/research-and-practice-group-methodology.htmlnoreply@blogger.com (Chris McComb)1tag:blogger.com,1999:blog-4696052836745948302.post-6312115746747497799Sat, 12 Apr 2014 14:51:00 +00002014-04-17T09:59:47.467-07:00optimizationprogrammingpythonSimply Simulated AnnealingSimulated annealing is a stochastic optimization technique that was inspired by the process of annealing metal to remove residual stresses. The general idea is that the algorithm jumps around within the space, but the jumps become smaller and more deterministic as the algorithm progresses. This transition from random exploration to deterministic search makes simulated annealing an option for global optimization.<br /><br />The following is a simple python script for implementing a simulated annealing algorithm to <b><i>minimize</i></b> a function:<br /><br /><pre class="brush: python" id="code">from numpy import diag, cos, sum, array, arange, meshgrid, zeros<br />from numpy.random import rand, multivariate_normal, seed<br />from numpy.linalg import norm<br /><br />PROB_BAD = 0.9<br />UPDATE = 0.9<br />ITER = 100<br />VARIANCE = array([10.0, 10.0])<br />START_POINT = array([5.0, 5.0])<br /><br />seed(345)<br /><br />def objfun(x):<br /> """This computes the objective function"""<br /> return sum(-cos(x)) + 0.5*norm(x)<br /><br />class SA(object):<br /> """This is an agent"""<br /> def __init__(self):<br /> self.current = START_POINT<br /> self.value = objfun(self.current)<br /> self.prob_bad = PROB_BAD<br /> self.variance = VARIANCE<br /> self.update = UPDATE<br /><br /> def _update(self):<br /> """This decreases the stochasticity of the algorithm"""<br /> self.variance *= self.update<br /> self.prob_bad *= self.update<br /><br /> def _get_point(self):<br /> """This gets a new candidate point from the neighborhood"""<br /> return multivariate_normal(self.current, diag(self.variance))<br /><br /> def jump(self):<br /> """This function jumps to a new solution point"""<br /> possibility = self._get_point()<br /> if objfun(possibility) < self.value:<br /> self.current = possibility<br /> self.value = objfun(self.current)<br /> elif self.prob_bad > rand():<br /> self.current = possibility<br /> self.value = objfun(self.current)<br /> self._update()<br /><br />if __name__ == "__main__":<br /> # Make contour plot of function<br /> alg = SA()<br /> for i in range(ITER):<br /> alg.jump()<br /> print(str(i)+"\t"+str(alg.value)+"\t"+str(alg.current))<br /><br /><br /></pre><script type="text/javascript"> var element = document.getElementById('code'); SyntaxHighlighter.highlight(undefined, element); </script> Lets discuss the components of this algorithm in order:<br /><ol><li><b>Constants:</b></li><ul><li><tt>PROB_BAD</tt>: the probability that the algorithm will move to a new point, even if the new point has a higher objective function value.</li><li><tt>ITER</tt>: The number of iterations to complete.</li><li><tt>VARIANCE</tt>: The initial variance of the jumps that the algorithm makes (the new jump location is chosen as a multivariate normal distribution).</li><li><tt>START_POINT</tt>: The starting point for the algorithm.</li><li><tt>UPDATE</tt>: The update factor. This is the factor by which <tt>VARIANCE</tt> and <tt>PROB_BAD</tt> are decreased at each iteration.</li></ul><li><b>Seed</b>: The seed simply ensures that the random number generation starts at the same point each time the algorithm is run. This ensures consistent results for the sake of demonstration, and can be removed.</li><li><b>Objective Function: </b>A fairly arbitrary function (see contour plot below). It has a number of local minima, which makes it a tricky problem for deterministic solvers.</li><li><b>SA class</b>: This is an instance of the simulated annealing algorithm. The methods are as follows:</li><ul><li><tt>_update</tt> method: This updates <tt>PROB_BAD</tt> and <tt>VARIANCE</tt> using the <tt>UPDATE</tt> constant.</li><li><tt>_get_point</tt> method: This method returns a new potential point for a jump by drawing a point from the normal distribution, defined with a mean of the current point, and a variance of <tt>VARIANCE</tt>.</li><li><tt>jump</tt> method: This method first uses <tt>_get_point</tt> to get a candidate for the jump. If the new point has a lower objective function value than the current point, the algorithm jumps. If the new point has a higher objective function value than the current point, the algorithm will still jump with probability <span style="font-family: monospace;">PROB_BAD</span>.</li></ul><li><b>Execution</b>: Now that the SA class has been set up, the algorithm can be implemented easily. Simply create an instance of the class, and iterate through a number of jumps.</li></ol>Executing the above script will find the global minimum of the function. The specific set of points traversed by the algorithm is shown in the image below. Note that changing or omitting the seed will modify the results.<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-ileK4tXxRhw/U0lU9FrqFgI/AAAAAAAAVFM/SA3key4b8e0/s1600/results.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-ileK4tXxRhw/U0lU9FrqFgI/AAAAAAAAVFM/SA3key4b8e0/s1600/results.gif" height="480" width="640" /></a></div>The scrip that runs the algorithm and produces the gif of the results can be downloaded <a href="http://www.cmccomb.com/blog/sa/sa.zip">here</a>.http://blog.cmccomb.com/2014/04/simply-simulated-annealing.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-4673827759076088308Thu, 10 Apr 2014 12:24:00 +00002014-04-12T18:04:32.531-07:00MATLABoptimizationprogrammingBasically Bi-Cubic BezierBezier surfaces are incredibly versatile, incredibly intuitive and incredibly useful. A bi-cubic Bezier surface is defined by 16 control points. A point anywhere on the surface can then be calculated as a weighted sum of the control points. The general idea is that moving a control point "stretches" the curve locally. Since the focus of this post isn't Bezier surfaces themselves, I'll skip the specifics. If you're interested, you can browse <a href="http://en.wikipedia.org/wiki/B%C3%A9zier_surface">the wikipedia page</a> or check out <a href="http://geometrie.foretnik.net/files/NURBS-en.swf">this Bezier curve demo</a>.<br /><br />Because Bezier surfaces are so common, they're often used to approximate arbitrary surfaces. For instance, say we want to fit a bi-cubic Bezier surface to a portion of a sphere. One way to accomplish this would be by playing with the points to get the appropriate gestalt. However, a better method is to frame the problem in an optimization context (where the goal is to minimize the mean squared error of the fit), and then pass it to an appropriate solver. <br /><br />For instance, the result of fitting a bi-cubic Bezier surface to a portion of a sphere, using MATLAB's <tt>fmincon</tt> function is shown below. The Bezier surface is colored according to squared error. Black points are the control points for the surface.<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-lZ92Drn0cow/U0aJHUckdEI/AAAAAAAAVDc/qonRQn103pI/s1600/absolute.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-lZ92Drn0cow/U0aJHUckdEI/AAAAAAAAVDc/qonRQn103pI/s1600/absolute.gif" height="478" width="640" /></a></div>Even though the initial guesses for the control points were pretty far off, the end result was a <i>very</i> close fit (average mean squared error = 3.8635·10<sup>-04</sup>). <br /><br />The code for the performing the fit was written in MATLAB, and you can download it <a href="http://www.cmccomb.com/blog/bezier-fit/bezier-fit.zip" title="Horribly uncommented, ugly code. My apologies.">here</a>.http://blog.cmccomb.com/2014/04/bezier-fit.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-1857336605940272712Tue, 08 Apr 2014 00:13:00 +00002014-04-07T17:14:13.383-07:00grad-schoolmachine learningMachine Learning Review SheetThis semester I'm taking 10-701: Introduction to Machine Learning. The course is taught by <a href="http://www.cs.cmu.edu/~aarti/">Aarti Singh</a> and <a href="http://www.cs.cmu.edu/~bapoczos/">Barnabás Póczos</a>.<br /><br />For the exams, we are allowed a single equation sheet. You can download my equation sheet <a href="http://www.cmccomb.com/blog/introml/equation_sheet.pdf">here</a>. It covers basic information on a range of topics, from MLE/MAP to Graphical Models to Boosting.<br /><br />Good luck, and use at your own risk!http://blog.cmccomb.com/2014/04/machine-learning-review-sheet.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-2252589414863272381Sat, 05 Apr 2014 15:35:00 +00002014-04-05T08:35:25.863-07:00grad-schooloptimizationNLP Optimization Review SheetLast semester I took Jeremy Michalek's course on Engineering Optimization (<a href="http://www.cmu.edu/me/graduate/courses.html">course number 24-785</a>). The course covered NLP optimization theory and practice, and <a href="http://www.amazon.com/Principles-Optimal-Design-Modeling-Computation/dp/0521627273">Principles of Optimal Design</a> was the recommended course textbook.<br /><br />For the exams, we were allowed a single equation sheet. You can download my equation sheet <a href="http://www.cmccomb.com/blog/engopt/equation_sheet.pdf">here</a>. It covers most of the equations for unconstrained and constrained optimization (both reduced and Lagrangian form). It also has some miscellaneous material related to numerical tricks for solving optimization problems.<br /><br />Good luck, and use at your own risk!http://blog.cmccomb.com/2014/04/nlp-optimization-review-sheet.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-3052281824785626612Mon, 24 Mar 2014 18:54:00 +00002014-05-06T05:46:29.340-07:00conferencedesigngrad-schooloptimizationpublicationsIDETC and ASEE PSW Conference PapersThis has been a great month for my research! I've been notified that three conference papers were accepted pending minor revisions. These are:<br /><ol><li><strong>McComb, C., </strong>J. Cagan, and K. Kotovsky. “Quantitative Comparison of High- and Low-Performing Teams in a Design Task Subject to Drastic Changes.” <i>ASME IDETC - Design Theory and Methodology Conference</i>. Buffalo, NY, USA. August 17-20, 2014.</li><li>Santaeufemia, P.S., N.G. Johnson, <b>C. McComb</b>, and K. Shimada, “Improving Irrigation in Remove Areas: Multi-objective Optimization of a Treadle Pump.”<i> ASME IDETC - Design Automation Conference.</i> Buffalo, NY, USA. August 17-20, 2014.</li><li><b>McComb </b><b>C.</b> and F.M. Tehrani, "Research and Practice Group Methodology: A Case Study in Student Success.", <i>American Society for Engineering Education Zone IV Conference</i>. Long Beach, CA, USA. April 24-26, 2014.</li></ol><div>I'll even have an <a href="http://en.wikipedia.org/wiki/Erd%C5%91s_number">Erdős number</a> of 5 once my paper with Kotovsky is official!</div>http://blog.cmccomb.com/2014/03/idetc-and-asee-psw-conference-papers.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-6131673546748051067Wed, 19 Feb 2014 00:50:00 +00002014-02-18T16:50:02.965-08:00philanthopyUNICEF Tap ProjectHow long do you think you can go without touching your phone? 5 minutes? 15 minutes? Maybe an hour? Here's your chance to test your willpower <i>and</i> make a positive difference in the world!<br /><br />The <a class="g-profile" href="https://plus.google.com/100379471031450585233" target="_blank">+UNICEF</a> Tap Project is deceptively simple: just open <a href="http://uniceftapproject.org/">this webpage</a> in your mobile browser, and don't touch or move your phone. For every minute that you're successful, UNICEF's sponsor will provide one day of clean water to a child in need.<br /><br />Good luck!http://blog.cmccomb.com/2014/02/unicef-tap-project.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-6434687956753959230Wed, 12 Feb 2014 01:53:00 +00002014-04-12T09:45:35.644-07:00optimizationpythonGenerally Genetic Algorithms<h2>What is a GA? </h2>A <a href="http://en.wikipedia.org/wiki/Genetic_algorithm">Genetic Algorithm</a> (or GA) is a heuristic optimization algorithm that attempts to copy the process of natural selection. At any given moment, the algorithm maintains a number of potential solutions. These solutions are allowed to thrive, breed, and die out - all with some level of randomness. <br /><h2>When should I use a GA </h2>GAs are rarely the best way to tackle an optimization problem. However, they are often the <i>second</i> best. Deterministic methods are quick, and will indicate whether or not a local optimum has been found (i.e. the gradient of the function will be 0). Unfortunately, different deterministic algorithms are usually suited for specific problems, and may be nearly useless for others. For instance, the <a href="http://en.wikipedia.org/wiki/Simplex_algorithm">Simplex Algorithm</a> is useless for non-linear optimization. However, a GA can be used to solve nearly any problem with very little modification. The primary disadvantage is that a GA offers no guarantee on optimality (because gradients are not calculated). <br /><h2>Terminology and Workflow </h2>Before we go any further, we should define some terms to make sure we're all on the same page. Yes, I know this is boring, but bear with me: <br /><ol><li><b>Individual:</b> A single possible solution. If the GA is being used to optimize a univariate function, the individuals are different values of the independent variable. For multivariate problems, the individuals each represent a vector of the independent variables. Typically, the variables are expressed with a <a href="http://www.obitko.com/tutorials/genetic-algorithms/encoding.php">binary string</a>. </li><li><b>Fitness Function</b> The function that is used to define the "goodness" of an individual. It follows that the goodness of an individual is referred to as <b>fitness</b>. </li><li><b>Population:</b> A set of individuals. </li><li><b>Selection:</b> The act of choosing individuals from the population, with a stochastic bias towards individuals with higher fitness. </li><li><b>Generation:</b> The act of using individuals chosen through <b>selection</b> to produce new individuals. These individuals resulting from <b>generation</b> will form a new population. </li></ol>The algorithm itself is relatively simple. Initially, a random population of individuals is produced. Next, some of these individuals are selected. Then, using the selected individuals, new individuals are created and placed in a new population. The fitness of each individuals in the new population is calculated, and the process repeats. <br /><h2>One-Max Solution </h2>Now, let's get into some nitty gritty details. First, let's define the problem we're trying to solve. For this example, it will be the <a href="http://tracer.lcc.uma.es/problems/onemax/onemax.html">one-max problem</a>. The objective is to maximize the number of ones in a bitstring. The fitness function can be calculated as: <br /><pre class="brush: python" id="code1">def objfun(string):<br /> return sum(string)<br /></pre><script type="text/javascript"> var element = document.getElementById('code1'); SyntaxHighlighter.highlight(undefined, element); </script>This is just the sum of the digits in the string. Now lets introduce a class for individuals: <br /><pre class="brush: python" id="code2">class Individual:<br /> """Common base case for all individuals"""<br /><br /> def __init__(self, useThisString=None):<br /> """Randomly initializes a string"""<br /> self.strSize = 30<br /> if useThisString is None:<br /> self.string = [choice((0, 1)) for b in range(self.strSize)]<br /> else:<br /> self.string = useThisString<br /> self.fitval = objfun(self.string)<br /><br /> def fitness(self):<br /> """Returns fitness of design"""<br /> self.fitval = objfun(self.string)<br /> return self.fitval<br /><br /> def giveMe(self):<br /> """Returns full string of design"""<br /> return self.string<br /><br /> def mutate(self):<br /> """Mutates design by flipping one bit"""<br /> if random() < 0.5:<br /> N = randint(1, self.strSize) - 1<br /> self.string[N] = choice((0, 1))<br /> self.fitval = objfun(self.string)<br /></pre><script type="text/javascript"> var element = document.getElementById('code2'); SyntaxHighlighter.highlight(undefined, element); </script>This class has several different methods. When initialized, an individual is a random string composed of 30 bits. The <tt>fitness</tt> method calculates the fitness of the individual; <tt>giveMe</tt> returns the bitstring; and <tt>mutate</tt> performs a random mutation of a bit half of the time. Now that we've defined the individual, we need to define the population. The population class will give us a location to store and operate on individuals: <br /><pre class="brush: python" id="code3">class Population:<br /> """A place to hold and operate on individuals"""<br /><br /> def __init__(self, numIndiv):<br /> """Initializes the blackboard"""<br /> self.oldPop = []<br /> for i in range(numIndiv):<br /> self.oldPop.append(Individual())<br /> self.newPop = []<br /><br /> def postIndiv(self, thisIndiv):<br /> """Post an individual to the blackboard"""<br /> self.newPop.append(thisIndiv)<br /><br /> def getIndiv(self, index=None):<br /> """Get an individual from the blackboard"""<br /> if index is None:<br /> return choice(self.oldPop)<br /> else:<br /> return self.oldPop[index]<br /><br /> def reset(self):<br /> """Clear newPop and moves individuals to oldPop"""<br /> self.oldPop = []<br /> self.oldPop = self.newPop<br /> self.newPop = []<br /><br /> def orderPop(self):<br /> """Sort the individuals in terms of fitness"""<br /> self.oldPop.sort(key=operator.attrgetter('fitval'))<br /></pre><script type="text/javascript"> var element = document.getElementById('code3'); SyntaxHighlighter.highlight(undefined, element); </script>When initialized, a population consists of <tt>oldPop</tt> (a list of <tt>Individual</tt> instances), and <tt>newPop</tt>, and empty list. In addition, we've included a way to get individuals from the old population (<tt>getIndiv</tt>), post individuals to the new population (<tt>postIndiv</tt>), and sort the population in terms of fitness (<tt>orderPop</tt>). So, how do we accomplish the tasks of generation and selection? For generation, we will add two methods to <tt>Population</tt>: <tt>mutation</tt> and <tt>crossover</tt>. Crossover is conceptually similar to cellular reproduction. Two bitstrings are randomly selected, a crossover point is chosen randomly, and the strings are swapped at that point. We also use mutation, which randomly flips a bit in the string. The code for these is shown below: <br /><pre class="brush: python" id="code4"> def crossover(self, NUM):<br /> """Perform crossover to generate NUM individuals"""<br /> for i in range(NUM):<br /> # Get two temporary designs and save to list<br /> temp = []<br /> temp.append(self.getIndiv())<br /> temp.append(self.getIndiv())<br /><br /> # Extract strings<br /> str1 = temp[0].giveMe()<br /> str2 = temp[1].giveMe()<br /><br /> # Select internal crossover point<br /> COP = randint(1, len(str1) - 1)<br /> self.postIndiv(Individual(str1[0:COP] + str2[COP:len(str2)]))<br /><br /> def mutation(self, NUM):<br /> """Generate NUM individuals through mutations"""<br /> for i in range(NUM):<br /> temp = self.getIndiv()<br /> temp.mutate()<br /> self.postIndiv(temp)<br /></pre><script type="text/javascript"> var element = document.getElementById('code4'); SyntaxHighlighter.highlight(undefined, element); </script>For selection, we will add two more methods to <tt>population</tt>. Elitism simply copies the top individuals from one generation to another. Rank selection selects individuals with probability inversely proportional to the individual's rank. <br /><pre class="brush: python" id="code5"> def elitism(self, NUM):<br /> """Perform elite selection for NUM individuals"""<br /> for i in range(NUM):<br /> self.postIndiv(self.getIndiv(-i - 1))<br /><br /> def rankSelect(self, NUM):<br /> """Perform rank selection for NUM individuals"""<br /><br /> # Generate cumulative sum<br /> temp = 1<br /> self.cumsum = []<br /> for i in range(len(self.oldPop)):<br /> temp += i<br /> self.cumsum.append(temp)<br /> self.cumsum = [float(x) / float(self.cumsum[-1]) for x in self.cumsum]<br /><br /> # Select and post designs<br /> for j in range(NUM):<br /> RAND = random()<br /> for i in range(len(self.oldPop)):<br /> if self.cumsum[i] > RAND:<br /> break<br /> self.postIndiv(self.getIndiv(i))<br /></pre><script type="text/javascript"> var element = document.getElementById('code5'); SyntaxHighlighter.highlight(undefined, element); </script>Finally, we pull all of this together with a simple script: <br /><pre class="brush: python" id="code6">from population import *<br />from matplotlib.pyplot import *<br />from numpy import *<br /><br /><br />NUMINDIV = 50<br />ELITENUM = 5<br /><br />BB = Population(NUMINDIV)<br /><br /># Do optimization<br />for i in range(30):<br /> # Perform selection<br /> BB.orderPop()<br /> BB.elitism(ELITENUM)<br /> BB.rankSelect(NUMINDIV - ELITENUM)<br /> BB.reset()<br /><br /> # Perform generation<br /> BB.crossover(int(NUMINDIV / 2))<br /> BB.mutation(int(NUMINDIV / 2))<br /> BB.reset()<br /><br /> # Plot results<br /> temp = BB.report()<br /> plot(i + 1, temp[1], 'ro')<br /> plot(i + 1, temp[2], 'g.')<br /><br />xlabel('Generation')<br />ylabel('Fitness')<br />show()<br />savefig('onemax.png')<br /></pre><script type="text/javascript"> var element = document.getElementById('code6'); SyntaxHighlighter.highlight(undefined, element); </script>This script also includes a few lines for tracking the best individual and mean fitness of the population. The output is a plot of fitness versus generation: <br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-KZ6r1C7o93E/UvzzLAm_7sI/AAAAAAAATq0/nJj85WFUVrM/s1600/onemax.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-KZ6r1C7o93E/UvzzLAm_7sI/AAAAAAAATq0/nJj85WFUVrM/s400/onemax.png" /></a></div>We see that in only a few generations the best string has achieved the maximum fitness possible. You can download the full example <a href="http://www.cmccomb.com/blog/ga/onemax.zip">here</a>.http://blog.cmccomb.com/2014/02/intro-to-genetic-algorithms-part-1.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-7854981267128961162Wed, 12 Feb 2014 00:50:00 +00002014-02-11T16:51:49.253-08:00haikupoetryA Haiku For My Cat<p>Bathwater-dipped tail<br>like a quill; hallway prose in<br>pawprints and droplets.</p>http://blog.cmccomb.com/2014/02/a-haiku-for-my-cat.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-6150897957037322843Sat, 01 Feb 2014 01:31:00 +00002014-02-11T10:59:41.205-08:00machine learningMATLABprogrammingLatent Semantic Analysis<p>Latent Semantic Analysis (LSA) is a document analysis technique that's based on a bag-of-words assumption. What does that mean? LSA is a way to take a set of written documents, and find out how similar they are to one-another.</p> <p>LSA has been used extensively in the design theory and methodology literature for <a href="http://sydney.edu.au/engineering/it/~info4990/2006/papers/INFO4990-dong.pdf">comparing written descriptions of design solutions</a>. At its most basic, Latent Semantic Analysis (LSA) is a computational tool that compares documents based on the relative frequency of occurrence of words. More specifically, a word-by-document matrix is first created. Columns represent documents, and rows represent unique words occurring in the total corpus of text. A weighting is then applied to the matrix to condition the data. Log-entropy weighting was used because it has been shown to perform well across many data sets. Finally, documents are compared by calculating the cosine similarity between columns. The cosine similarity metric ranges between 1 (indicating that the two documents are identical) to -1 (indicating that the two documents are extremely dissimilar). The final result is a document-by-document matrix that gives you a succint comparison of the documents in your set.</p> <p> For instance, say we wanted to compare <a href="http://www.heise.de/ix/raven/Literature/Lore/TheRaven.html">The Raven, by Edgar Allen Poe</a>, to <a href="http://rt.com/usa/obama-2014-state-of-union-274/">this news article</a>, and <a href="http://rt.com/usa/intel-hearing-snowden-threats-369/">this news article</a>. Of course, we expect the news articles to be more similar to one another than to the poem. LSA allows us to quantify their similarity. Performing LSA on these bodies of text reveals that the news articles have a similarity to one another of 0.6181, and a similiarity to the poem of 0.4089 and 0.3611, respectively.</p> <p>I've written a MATLAB script that performs the LSA calculations. Its available <a href="http://www.cmccomb.com/blog/lsa/LSA_example.zip">here</a>, along with the text of the articles and poem.</p>http://blog.cmccomb.com/2014/01/latent-semantic-analysis.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-6835648037840492071Sun, 26 Jan 2014 03:05:00 +00002014-02-11T17:58:58.900-08:00poetryOn Cutting FirewoodThe wood cracks and splits <br />and pops and explodes <br />and whispers to me the heft <br />of a day spent in physical labor, <br />in doing instead of hiding <br />behind sheaves of mint green <br />paper and machine-drawn lines.<br /><br />Layers of detritus fall aside, <br />returning to me the use of senses <br />I had long forgotten I possessed. <br />I believe the weight of the axe in <br />my hands, the risen grain of the <br />wooden handle bearing witness to <br />my rebirth - a son of the earth.<br /><br /> Inhaling, I create a void <br />in my chest for the warm soil <br />and the pungent wood chips. <br />The formed steel rises above <br />my head, my grip widens, <br />tightens. The tendons in my <br />shoulders and chest shorten,<br /><br /> bearing the axe downward <br />faster than gravity alone <br />ever could have carried it. <br />The wood cracks and splits <br />and pops and explodes<br /><br /> and whispers.<br /><br />http://blog.cmccomb.com/2014/01/on-cutting-firewood.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-5548247371215331746Thu, 09 Jan 2014 07:28:00 +00002014-02-07T13:22:45.195-08:00quotesDragons!<blockquote>"Fairy tales are more than true: note because they tell us that dragons exist, but because they tell us that dragons can be beaten."<br>G.K. Chesterton </blockquote><p>Mr. Chesterton was an English poet, orator, and lay theologian. You can read about him <a href="http://en.wikipedia.org/wiki/G._K._Chesterton">here</a>. Good luck beating your dragons in 2014!</p>http://blog.cmccomb.com/2013/01/dragons.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-1968458149478520982Sun, 13 Oct 2013 18:13:00 +00002014-02-11T16:52:11.234-08:00haikupoetryA Haiku For Winter<p> Looking for patterns<br>in snowflakes, one forgets the<br>smell of summer rain.</p>http://blog.cmccomb.com/2013/10/a-haiku-for-winter.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-5544684477023367282Sat, 28 Sep 2013 20:18:00 +00002014-02-07T13:19:37.506-08:00texShareLaTeX: TeX from anywhere!<p>LaTeX is useful, but collaborating on a document can be a pain. Emailing a .tex back and forth takes way too long. Sharing it through Dropbox works slightly better, but is prone to conflicted copies if two individuals try to edit at once. What to do?</p> <p><a href="http://www.sharelatex.com">ShareLaTeX</a> is the answer. It offers a <i>free</i>, full-featured, browser-based LaTeX editor and compiler, and provides features for real-time collaboration and editing. Additional features include a revision history, integration with Dropbox, and key-bindings that make the editor feel like Vim or Emacs.</p> <p>Interested in trying it out? Getting started is as simple as <a href="https://www.sharelatex.com?r=299ef311&rm=d&rs=b">clicking here</a>. Enjoy!</p>http://blog.cmccomb.com/2013/09/sharelatex-tex-from-anywhere.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-3496529717363672323Mon, 23 Sep 2013 18:37:00 +00002014-02-11T17:56:39.980-08:00bashprogrammingGoogle Trends Script <p>Recently I tried out GeekTool (a Mac version of Linux's Conky) and I am quite impressed! Right now I have scripts on my desktop that show the time, weather, unread emails, network stats and laptop stats. And I just threw together a script of my own: <script type="syntaxhighlighter" class="brush: bash"><![CDATA[ #!/bin/sh rm ./spicy.txt url='http://www.google.com/trends/hottrends/atom/hourly' echo " -- SPICY GOOGLE SEARCHES --" curl --silent "$url" | grep "Spicy" | sed -n -e 's/.*">\(.*\)<\/a>.*/\1/p' >> ./spicy.txt while read p; do echo $p sleep $[ ( $RANDOM % 14 ) + 1 ]s echo " (" | tr -d "\n" curl -s --get --data-urlencode "q=wikipedia $p" http://ajax.googleapis.com/ajax/services/search/web?v=1.0 | \ egrep -o 'is a.........................|was a........................' | \ head -n1 | tr -d '\n' && echo "...)" done < ./spicy.txt ]]></script> <p>This script pulls the hour's hottest google search terms, and then gets a short description from wikipedia. You should note that running this script very often will result in <tt>curl</tt> not working (Google's servers will start blocking your address). The random sleep helps to prevent this. You can download the script <a href="http://www.cmccomb.com/blog/trends/trends.sh">here</a>. Feel free to use this script for your own Bash adventures. Happy coding!</p>http://blog.cmccomb.com/2013/09/google-trends-script.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-1856202140066831180Fri, 30 Aug 2013 23:17:00 +00002014-02-11T15:57:29.927-08:00grad-schoolnsfMore NSF GRFP!<p>A few weeks ago I posted my <a href="../../2013/08/nsf-grfp.html">successful NSF GRFP essays</a>. My friend Lin Fan has also decided to share his essays! He is currently a graduate student at Stanford University, and received an NSF fellowship at the same time I did. He also applied under Mechanical Engineering. You can access all of his essays and reviews in a single pdf <a href="http://www.cmccomb.com/blog/nsf/LinFanNSF.pdf">here</a>.</p> <p>His essays are markedly different from mine in several ways (format, use of figures, etc.). However, they are still very well put together and communicate his ideas with crystal clarity. If you have questions about his essays, you can contact him at <a href="mailto:linfan@stanford.edu">linfan@stanford.edu</a>. Also, feel free to contact me.</p>http://blog.cmccomb.com/2013/08/more-nsf-grfp.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-7889653008537895153Wed, 21 Aug 2013 01:30:00 +00002014-02-11T17:37:14.852-08:00c/c++grad-schoolprogrammingtexC Programming Presentation<p>This semester I'm TAing for Graduate Numerical Methods. My first task was to develop a presentation to act as a refresher/introduction for C/C++. I decided to throw it together in <a href="http://www.latex-project.org/">LaTeX</a> using <a href="http://en.wikipedia.org/wiki/Beamer_(LaTeX)">Beamer</a>, and I'm pretty pleased with how it came out! Keep in mind, the content is a high-level review of C and C++; I purposely avoided going into too much depth.</p> <ol> <li><a href="http://www.cmccomb.com/blog/cpp/slides.tex">Presentation (.tex)</a></li> <li><a href="http://www.cmccomb.com/blog/cpp/slides.pdf">Presentation (.pdf)</a></li> </ol>http://blog.cmccomb.com/2014/02/c-programming-presentation.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-5209984551977226720Thu, 15 Aug 2013 22:54:00 +00002014-07-11T05:37:36.761-07:00machine learningunsupervisedCluster? I 'ardly know 'er!Cluster analysis is some powerful voodoo, and has been applied to a <i>ton</i> of different areas. Market analysis? <a href="http://www.westga.edu/~bquest/2009/cluster09.pdf">Cluster it.</a> Social network analysis?<a href="http://www.sciencedirect.com/science/article/pii/0022249675900280">Clustering.</a> Medical diagnostics? <a href="http://www.clinchem.org/content/38/2/182.full.pdf">More clusters.</a><br />In a nutshell, cluster analysis is a tool for classification. It involves sorting observations into "natural" groupings. For small data sets in 2 or 3 dimensions, clustering can easily be performed without a computer. However, large data sets with many dimensions necessitate automated means.<br />I've recently had the opportunity to explore cluster analysis as part of my research. Here are a few concepts that I've found particularly awesome.<br /><ol><li><b>Spectral Clustering</b><br /> <a href="http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf">Spectral clustering</a> is easily implemented, quickly solved, and usually outperforms methods like traditional <i>k</i>-means. For datasets like that shown below, spectral clustering is much better than <i>k</i>-means.<br /><img align="middle" src="http://www.cmccomb.com/blog/clust/spectral.png" width="450" /><br />Spectral clustering is really more of a pre-processing step than a clustering algorithm. It consists of computing the eigenvalues of the Laplacian of the similarity matrix, and then reducing the dimensionality of the data according to the resulting eigenvectors. The transformed data is then clustered with a traditional algorithm (usually <i>k</i>-means).</li><li><b>NERF <i>c</i>-means Clustering</b><br /> This clustering method has the coolest name ever. The authors even point it out in <a href="http://www.sciencedirect.com/science/article/pii/0031320394901198">their paper</a>: <blockquote><i>While this acronym does not emphasize the important duality between NERF and OFCM, it is just too tempting to resist!</i></blockquote>NERF stands for Non-Euclidean Relational Fuzzy. <i>Relational</i> means that the algorithm doesn't even need the full data, just a distance matrix. Even cooler, <i>Non-Euclidean</i> means that the distance matrix doesn't even need to correspond to a typical Euclidean space. The distance metric can even be something odd, like the <a href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming Distance</a>; or the <a href="http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance">Damerau-Levenshtein Distance</a>.</li><li><b>How many clusters?</b><br /> How many clusters should the clustering algorithm make? This is not the trivial issue that it might seem. For large, complicated data sets this question can be approached semi-quantitatively by observing the plot of the within-cluster sum squared error (<i>WSSE</i>) as a function of the number of clusters (<i>k</i>). For <i>k = 1</i> the value of <i>WCSS</i> is equal to the<i>SSE</i> of the data set. We then expect the <i>WCSS</i> to decrease rapidly until the appropriate number of clusters is reached. After that, additional clusters should not decrease the <i>WCSS</i> very much. Therefore, the appropriate number of clusters occurs at the "elbow" on the <i>WCSS v. k</i>plot. For instance, the plot might look like this: <br /><img align="middle" src="http://www.cmccomb.com/blog/clust/WSSE.png" width="450" /><br />Using the heuristic described above, this plot indicates that approximately 4 clusters exist. One method that tries to numerically determine this elbow is presented in <a href="http://www.ee.columbia.edu/~dpwe/papers/PhamDN05-kmeans.pdf">this paper</a>.</li><li><b>Rand Index</b><br /> Different clustering algorithms can produce different clustering patterns. For that reason, its sometimes beneficial to compare two partitions. Luckily, the <a href="http://www.tandfonline.com/doi/abs/10.1080/01621459.1971.10482356">Rand Index</a> allows us to do just that. The Rand Index varies between a value of 0 when the partitions are very different, to 1 when they are identical. However, the Rand Index tends to increase with the number of clusters. Hubert et al remedied this shortcoming, proposing the <a href="http://link.springer.com/article/10.1007/BF01908075">Adjusted Rand Index</a>. The adjusted Rand Index also varies between 0 and 1 (in expectation).</li></ol>Cluster analysis is a broad field, and is very much alive. To learn more, check out <a href="http://en.wikipedia.org/wiki/Cluster_analysis">this wikipedia article</a>, <a href="http://www.cs.waikato.ac.nz/ml/">this toolbox</a>, or <a href="http://www-users.cs.umn.edu/~kumar/dmbook/index.php">this textbook</a>.http://blog.cmccomb.com/2013/08/cluster-i-ardly-know-er.htmlnoreply@blogger.com (Chris McComb)0tag:blogger.com,1999:blog-4696052836745948302.post-1985991776225728184Sat, 10 Aug 2013 21:54:00 +00002014-02-11T15:57:44.782-08:00grad-schoolnsfNSF GRFPIn Spring of 2013, I was lucky enough to be awarded an NSF Graduate Research Fellowship. When I started putting together my application, one source of frustration was that I had no idea how to start on the essays. I'm providing my essays and feedback in the hope that they can alleviate some of the uncertainty for future applicants. I've omitted only my proposed research essay, as that research is ongoing.<br /><ol><li><a href="http://www.cmccomb.com/blog/nsf/McComb_RE.pdf">Research experience</a></li><li><a href="http://www.cmccomb.com/blog/nsf/McComb_PS.pdf">Personal statement</a></li><li><a href="http://www.cmccomb.com/blog/nsf/McComb_REV.pdf">Reviews</a></li></ol>I'm happy to offer general advice, and also provide insight as to my choices in these essays. Just shoot me an email. Best of luck!http://blog.cmccomb.com/2013/08/nsf-grfp.htmlnoreply@blogger.com (Chris McComb)0