Over the last few weeks of this course, we have been reviewing searching and sorting algorithms and comparing their runtime complexities in a number of different situations. I had learned about some of these algorithms in CSC108, so I came into this section with the following knowledge:
-Linear search is mostly bad
-Binary search is mostly good
-Insertion sort and selection sort are equally bad in the worst case scenario, but one (I couldn't remember which--it turns out to be insertion sort, as the sketch after this list shows) was better in the best case scenario
-Bubble sort is almost never the answer--and there's a video of Obama attesting to that in an interview at Google
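For the record, the best-case winner is insertion sort: on an already-sorted list it makes just one comparison per item (O(n)), while selection sort scans the entire unsorted tail no matter what (O(n^2) always). Here's a minimal sketch of both--my own toy versions, not the CSC108 code:

```python
def insertion_sort(lst):
    """Sort lst in place.

    Best case O(n): on an already-sorted list, the inner while loop
    exits after a single comparison for each item.
    """
    for i in range(1, len(lst)):
        item = lst[i]
        j = i
        # Shift larger items right until item's spot is found.
        while j > 0 and lst[j - 1] > item:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = item


def selection_sort(lst):
    """Sort lst in place.

    Always O(n^2): every pass scans the whole unsorted tail for its
    minimum, even if the list started out sorted.
    """
    for i in range(len(lst)):
        smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[smallest]:
                smallest = j
        lst[i], lst[smallest] = lst[smallest], lst[i]
```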
In 108, however, we focused mostly on the algorithms themselves--HOW, but not necessarily HOW FAST, they processed data. Most of the problems we worked on revolved around determining which type of sorting algorithm had already been applied to a (short) list.
In 148, we've been working with the algorithms in a more practical way--applying them to very large data sets and comparing their efficiency on the scale on which they'll be used "in the real world", so to speak.
In order to compare their efficiencies, we've been using "big-Oh" notation, which relates the amount of work an algorithm must do to the size of the list it's being applied to, where the "amount of work" generally represents the number of calculations or comparisons the algorithm must execute to search the list or to properly place each item in sorted order. The most common "big-Oh" relationships that we've worked with are:
O(1)--constant runtime, where the number of calculations does not vary with the size of the list
O(log n)--logarithmic runtime, where the number of calculations increases with the base-2 log of the list length
O(n)--linear runtime, where the number of calculations increases linearly with the list length
O(n log n)--linearithmic runtime, where the number of calculations increases with the list length multiplied by its base-2 log
O(n^2), O(n^3), etc.--polynomial runtime, where, for example, n^2 means doubling the list length quadruples the number of calculations
These relationships form an efficiency hierarchy that roughly runs from the top of the list above to the bottom--but it's not quite that simple. Depending on the state or length of the list, sometimes a simple algorithm will finish faster than a more complex one--one example of this is linear search (O(n)) outperforming binary search (O(log n)) on a list where the item sought is the first item in the list. Consequently, it's important to consider the ASYMPTOTIC runtime complexity of a function--different algorithms may dominate above or below a certain threshold list length. (This tricked me on term test two--it's not necessarily ABSOLUTE order of magnitude that determines the best performance of an algorithm.)
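Here's a minimal sketch of that first-item case--the function names and test list are mine, not from the course materials:

```python
def linear_search(lst, target):
    """Return the index of target in lst, or -1 if absent.

    O(n) overall, but O(1) when target sits at the front.
    """
    for i, item in enumerate(lst):
        if item == target:
            return i
    return -1


def binary_search(lst, target):
    """Return the index of target in a SORTED lst, or -1 if absent.

    O(log n): each pass halves the remaining search range.
    """
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


# Seeking the very first item: linear search stops after one
# comparison, while binary search still needs ~log2(n) halvings
# to work its way down to index 0.
sorted_list = list(range(1_000_000))
print(linear_search(sorted_list, 0))   # found after 1 comparison
print(binary_search(sorted_list, 0))   # found after ~20 iterations
```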
We've also learned a new sorting algorithm, merge sort, which (awful as it was to trace through at first) performs at O(n log n) complexity, compared to the O(n^2) complexity of insertion and selection sort. Along with the other algorithms, we assessed merge sort's performance in lab 10, both by graphing the runtime and by using cProfile to count the number of function calls made. cProfile is a useful tool for tuning code because you can see which step of your algorithm is taking up the most time--and hopefully find a way to reduce the number of calls to your problem function.
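To make that concrete, here's a toy merge sort profiled with cProfile--my own sketch, not the lab 10 starter code. cProfile.run prints a table of call counts and cumulative time per function:

```python
import cProfile
import random


def merge(left, right):
    """Merge two sorted lists into one sorted list in O(n)."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the two slices is exhausted; append the leftovers.
    return result + left[i:] + right[j:]


def merge_sort(lst):
    """Return a sorted copy of lst.

    O(n log n): log n levels of halving, with O(n) total merging
    work at each level.
    """
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))


data = [random.random() for _ in range(10_000)]
# The report shows how many times merge() was called and where the
# time went--the same kind of breakdown we looked at in the lab.
cProfile.run('merge_sort(data)')
```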
However, the most efficient way to sort a list remains Python's built-in sort (list.sort or sorted), which uses Timsort. Timsort scans the list for runs that are already sorted and uses the most efficient strategy for each piece--insertion sort to extend short runs, merge sort to combine them--before putting the pieces back together. It has an additional advantage in that it is coded in C, and thus runs as compiled machine code rather than interpreted Python.
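A quick (hypothetical) timing sketch of the built-in sort against the hand-rolled merge_sort from the previous block, using timeit:

```python
import random
import timeit

# Assumes the merge_sort() definition from the previous sketch
# is in scope.
data = [random.random() for _ in range(50_000)]

builtin = timeit.timeit(lambda: sorted(data), number=10)
homemade = timeit.timeit(lambda: merge_sort(data), number=10)
print(f'built-in sorted(): {builtin:.3f} s')   # C-speed Timsort
print(f'Python merge_sort: {homemade:.3f} s')  # typically much slower
```

As I understand it, the gap comes less from the algorithm (both are O(n log n)) than from the C implementation and Timsort's exploitation of already-sorted runs.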
I found the ease and practicality of this section to be a HUGE relief after the exercise in frustration that was my experience with linked lists, and it was an INCREDIBLY satisfying feeling to walk out of a lab having completed the exercise. The thing I struggled with most in this section was remembering that log here applies to log base 2-- and not log base 10, as my math-brain repeatedly insisted--a trivial problem compared to the sense of despair that set in when I was confronted with recursive linked list/tree combination questions in my previous labs.