The core principle behind a merge sort is divide-and-conquer. This is a technique for solving programming problems which relies on discovering a way to divide the problem up into smaller problems, and then solving each of the smaller problems at the same time (see multicore machines for an explanation of why we can do multiple things at once on a computer).
We start with a list. Note that this list is a bit longer than the bubble sort example one because it demonstrates the advantage of merge sort better.
[1, 7, 7, 11, 18, 1, 23, 5, 10, 14, 3, 6, 20, 23, 28, 11, 28, 13, 8, 22]
The key to thinking about merge sort is that we are trying to maximise the number of operations that we can conduct in parallel (i.e. at the same time). We start by dividing up our list into groups of two (if we can't fit everything into neat groups of two – i.e. if the length of our list is not divisible by two –, we do what we always do in computer science and just try to make it fit as best we can – 'imperfect' solutions that work are fine).
- Our list has twenty items (the format I have provided it in is designed to be easy to paste it as Python code, so you can follow along and experiment). First we divide our list into two halves. This means we have:
# let's call this list A
[1, 7, 7, 11, 18, 1, 23, 5, 10, 14]
# let's call this list B
[3, 6, 20, 23, 28, 11, 28, 13, 8, 22]
- We now divide our halves again.
# let's call this list A_A because it is the first half of A
[1, 7, 7, 11, 18]
# let's call this list A_B because it is the second half of A
[1, 23, 5, 10, 14]
# let's call this list B_A
[3, 6, 20, 23, 28]
# let's call this list B_B
[11, 28, 13, 8, 22]
- We do this again.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B because it is the second half of A_A
[7, 11, 18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B because it is the second half of A_B
[5, 10, 14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B because it is the second half of B_A
[20, 23, 28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B because it is the second half of B_B
[13, 8, 22]
- Once we've performed the division at stage 3, we have a lot of lists that are three elements long, and some that are two elements long. What we do here is divide up the three-element lists into a two element list and a one element long list. We leave the two element lists unchanged.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B_A because it is the first half of A_A_B
[7, 11]
# let's call this list A_A_B_B because it is the second half of A_A_B
[18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B_A because it is the first half of A_B_B
[5, 10]
# let's call this list A_B_B_B because it is the second half of A_A_B
[14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B_A because it is the first half of B_A_B
[20, 23]
# let's call this list B_A_B_B because it is the second half of B_A_B
[28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B_A because it is the first half of B_B_B
[13, 8]
# let's call this list B_B_B_B because it is the second half of B_B_B
[22]
- We have now finished our division. The next step is to merge and sort the lists. We want to merge all the
X_X_X_A(whereX_X_Xis any threeAs orBs in an order). Let's start withA_A_B_AandA_A_B_B(so[7, 11]and[18]). To sort these we do what is called "card-player's trick" where we pick the smallest item of each list (thepopfunction in most programming languages). First we pick the first item of[7, 11], which is7. We then compare7to18–7is definitely smaller, so we place7as hte first item of our sorted list (so we now have [7]). We now pick the next smallest item (11) and compare it to18.11is smaller, so we place it next in the sorted list – we now have[7, 11]. Because18is the only item remaining, we simply push it to the end, so we are left with[7, 11, 18]which is what we want. We do the same process for all the other four-lettered lists (e.g.A_B_B_A). If you don't understand this merging process, have a look at step 8 where I run through another merge.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B (sorted)
[7, 11, 18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B (sorted)
[5, 10, 14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B (sorted)
[20, 23, 28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B (sorted)
[8, 13, 22]
- We now sort all the elements at this level
# let's call this list A_A_A (sorted)
[1, 7]
# let's call this list A_A_B (sorted)
[7, 11, 18]
# let's call this list A_B_A (sorted)
[1, 23]
# let's call this list A_B_B (sorted)
[5, 10, 14]
# let's call this list B_A_A (sorted)
[3, 6]
# let's call this list B_A_B (sorted)
[20, 23, 28]
# let's call this list B_B_A (sorted)
[11, 28]
# let's call this list B_B_B (sorted)
[8, 13, 22]
- We now merge them again.
# A_A (sorted)
[1, 7, 7, 11, 18]
# A_B (sorted)
[1, 5, 10, 14, 23]
# B_A (sorted)
[3, 6, 20, 23, 28]
# B_B (sorted)
[8, 11, 13, 22, 28]
- We now merge again. Let's take
A_A (sorted)andA_B (sorted). We have[1, 7, 7, 11, 18]and[1, 5, 10, 14, 23]. We know that these lists are sorted (in ascending order – i.e. from smallest to biggest). We want to form a new list of 10 items (combine the two lists of five). Let's think about the first item we want to place into this list. We know that the smallest item must be the first item of one of our two lists (because the first item of each list is the smallest item, and therefore the overall smallest item will be one of these). We remove the two items (one from each list), so we have1and1. When we compare1and1we can see that they are equal, so we can push them in in either order into our list of ten sorted items – this list now looks like[1, 1]. We now pick the next two items. These are7(fromA_A) and5(fromA_B). Comparing these, it is clear that7is bigger than5. Therefore we push5onto the sorted list (which now looks like[1, 1, 5]). We keep the7, and compare it to the next item fromA_B, which is10.7is definitely smaller than10, so we push it straight onto our sorted list, which now looks like[1, 1, 5, 7]. We remove the next smallest item fromA_A(which is7, again) and compare is to the item fromA_B(the10we already removed). As before,7is smaller than10, so we push it onto our list – which now looks like[1, 1, 5, 7, 7]. We remove the next item fromA_A(i.e.11) and compare it to the item fromA_B(the same10we've already removed).11is bigger than10, so we push the10onto the sorted list (which now looks like[1, 1, 5, 7, 7, 10, 11]). We now pick the next item fromA_B(14) and conclude that11is less than14, so we push11onto the list (so we have[1, 1, 5, 7, 7, 10, 11]). Now we pick the next item fromA_A(18), which is definitely bigger than14(so we push14on the list, leaving us with[1, 1, 5, 7, 7, 10, 11, 14]). We remove the final item fromA_B(23) and compare it to18, noticing that18is less than23, so we push18and then23. The final list is therefore[1, 1, 5, 7, 7, 10, 11, 14, 18, 23].
[1, 1, 5, 7, 7, 10, 11, 14, 18, 23]
[3, 6, 8, 11, 13, 20, 22, 23, 28, 28]
- We now finish our merging, and are left with the sorted list.
[1, 1, 3, 5, 6, 7, 7, 8, 10, 11, 11, 13, 14, 18, 20, 22, 23, 23, 28, 28]
why is merge sort fast(er)?
- We can do operations in parallel – for example, when we need to merge four lists at one stage, we can use multiple processor cores so that we can merge each pair of lists at the same time rather than having to merge the first pair and the second pair sequentially.
further reading
- This animation of a merge sort might help you to understand it.
- If you want a visual explanation of merge sort, then this one is quite good.
- not on any GCSE specifications as far as I can tell somewhat related to divide and conquer is "dynamic programming" which is a programming technique commonly used for solving games of strategy – see this presentation for information
- Table of sorting algorithms