The core principle behind a merge sort is divide-and-conquer. This is a technique for solving programming problems which relies on discovering a way to divide the problem up into smaller problems, and then solving each of the smaller problems at the same time (see multicore machines for an explanation of why we can do multiple things at once on a computer).
We start with a list. Note that this list is a bit longer than the bubble sort example one because it demonstrates the advantage of merge sort better.
[1, 7, 7, 11, 18, 1, 23, 5, 10, 14, 3, 6, 20, 23, 28, 11, 28, 13, 8, 22]
The key to thinking about merge sort is that we are trying to maximise the number of operations that we can conduct in parallel (i.e. at the same time). We start by dividing up our list into groups of two (if we can't fit everything into neat groups of two – i.e. if the length of our list is not divisible by two –, we do what we always do in computer science and just try to make it fit as best we can – 'imperfect' solutions that work are fine).
- Our list has twenty items (the format I have provided it in is designed to be easy to paste it as Python code, so you can follow along and experiment). First we divide our list into two halves. This means we have:
# let's call this list A
[1, 7, 7, 11, 18, 1, 23, 5, 10, 14]
# let's call this list B
[3, 6, 20, 23, 28, 11, 28, 13, 8, 22]
- We now divide our halves again.
# let's call this list A_A because it is the first half of A
[1, 7, 7, 11, 18]
# let's call this list A_B because it is the second half of A
[1, 23, 5, 10, 14]
# let's call this list B_A
[3, 6, 20, 23, 28]
# let's call this list B_B
[11, 28, 13, 8, 22]
- We do this again.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B because it is the second half of A_A
[7, 11, 18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B because it is the second half of A_B
[5, 10, 14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B because it is the second half of B_A
[20, 23, 28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B because it is the second half of B_B
[13, 8, 22]
- Once we've performed the division at stage 3, we have a lot of lists that are three elements long, and some that are two elements long. What we do here is divide up the three-element lists into a two element list and a one element long list. We leave the two element lists unchanged.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B_A because it is the first half of A_A_B
[7, 11]
# let's call this list A_A_B_B because it is the second half of A_A_B
[18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B_A because it is the first half of A_B_B
[5, 10]
# let's call this list A_B_B_B because it is the second half of A_A_B
[14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B_A because it is the first half of B_A_B
[20, 23]
# let's call this list B_A_B_B because it is the second half of B_A_B
[28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B_A because it is the first half of B_B_B
[13, 8]
# let's call this list B_B_B_B because it is the second half of B_B_B
[22]
- We have now finished our division. The next step is to merge and sort the lists. We want to merge all the
X_X_X_A
(whereX_X_X
is any threeA
s orB
s in an order). Let's start withA_A_B_A
andA_A_B_B
(so[7, 11]
and[18]
). To sort these we do what is called "card-player's trick" where we pick the smallest item of each list (thepop
function in most programming languages). First we pick the first item of[7, 11]
, which is7
. We then compare7
to18
–7
is definitely smaller, so we place7
as hte first item of our sorted list (so we now have [7
]). We now pick the next smallest item (11
) and compare it to18
.11
is smaller, so we place it next in the sorted list – we now have[7, 11]
. Because18
is the only item remaining, we simply push it to the end, so we are left with[7, 11, 18]
which is what we want. We do the same process for all the other four-lettered lists (e.g.A_B_B_A
). If you don't understand this merging process, have a look at step 8 where I run through another merge.
# let's call this list A_A_A because it is the first half of A_A
[1, 7]
# let's call this list A_A_B (sorted)
[7, 11, 18]
# let's call this list A_B_A because it is the first half of A_B
[1, 23]
# let's call this list A_B_B (sorted)
[5, 10, 14]
# let's call this list B_A_A because it is the first half of B_A
[3, 6]
# let's call this list B_A_B (sorted)
[20, 23, 28]
# let's call this list B_B_A because it is the first half of B_B
[11, 28]
# let's call this list B_B_B (sorted)
[8, 13, 22]
- We now sort all the elements at this level
# let's call this list A_A_A (sorted)
[1, 7]
# let's call this list A_A_B (sorted)
[7, 11, 18]
# let's call this list A_B_A (sorted)
[1, 23]
# let's call this list A_B_B (sorted)
[5, 10, 14]
# let's call this list B_A_A (sorted)
[3, 6]
# let's call this list B_A_B (sorted)
[20, 23, 28]
# let's call this list B_B_A (sorted)
[11, 28]
# let's call this list B_B_B (sorted)
[8, 13, 22]
- We now merge them again.
# A_A (sorted)
[1, 7, 7, 11, 18]
# A_B (sorted)
[1, 5, 10, 14, 23]
# B_A (sorted)
[3, 6, 20, 23, 28]
# B_B (sorted)
[8, 11, 13, 22, 28]
- We now merge again. Let's take
A_A (sorted)
andA_B (sorted)
. We have[1, 7, 7, 11, 18]
and[1, 5, 10, 14, 23]
. We know that these lists are sorted (in ascending order – i.e. from smallest to biggest). We want to form a new list of 10 items (combine the two lists of five). Let's think about the first item we want to place into this list. We know that the smallest item must be the first item of one of our two lists (because the first item of each list is the smallest item, and therefore the overall smallest item will be one of these). We remove the two items (one from each list), so we have1
and1
. When we compare1
and1
we can see that they are equal, so we can push them in in either order into our list of ten sorted items – this list now looks like[1, 1]
. We now pick the next two items. These are7
(fromA_A
) and5
(fromA_B
). Comparing these, it is clear that7
is bigger than5
. Therefore we push5
onto the sorted list (which now looks like[1, 1, 5]
). We keep the7
, and compare it to the next item fromA_B
, which is10
.7
is definitely smaller than10
, so we push it straight onto our sorted list, which now looks like[1, 1, 5, 7]
. We remove the next smallest item fromA_A
(which is7
, again) and compare is to the item fromA_B
(the10
we already removed). As before,7
is smaller than10
, so we push it onto our list – which now looks like[1, 1, 5, 7, 7]
. We remove the next item fromA_A
(i.e.11
) and compare it to the item fromA_B
(the same10
we've already removed).11
is bigger than10
, so we push the10
onto the sorted list (which now looks like[1, 1, 5, 7, 7, 10, 11]
). We now pick the next item fromA_B
(14
) and conclude that11
is less than14
, so we push11
onto the list (so we have[1, 1, 5, 7, 7, 10, 11]
). Now we pick the next item fromA_A
(18
), which is definitely bigger than14
(so we push14
on the list, leaving us with[1, 1, 5, 7, 7, 10, 11, 14]
). We remove the final item fromA_B
(23
) and compare it to18
, noticing that18
is less than23
, so we push18
and then23
. The final list is therefore[1, 1, 5, 7, 7, 10, 11, 14, 18, 23]
.
[1, 1, 5, 7, 7, 10, 11, 14, 18, 23]
[3, 6, 8, 11, 13, 20, 22, 23, 28, 28]
- We now finish our merging, and are left with the sorted list.
[1, 1, 3, 5, 6, 7, 7, 8, 10, 11, 11, 13, 14, 18, 20, 22, 23, 23, 28, 28]
why is merge sort fast(er)?
- We can do operations in parallel – for example, when we need to merge four lists at one stage, we can use multiple processor cores so that we can merge each pair of lists at the same time rather than having to merge the first pair and the second pair sequentially.
further reading
- This animation of a merge sort might help you to understand it.
- If you want a visual explanation of merge sort, then this one is quite good.
- not on any GCSE specifications as far as I can tell somewhat related to divide and conquer is "dynamic programming" which is a programming technique commonly used for solving games of strategy – see this presentation for information
- Table of sorting algorithms