An introduction to algorithms - the foundations of computer science. Learn how to design, analyse and compare them.

Published

When we're programming, we're writing code to perform a specific task. An algorithm is the set of instructions - the procedure we ask the computer to perform - in order to complete that task. Typically, one of the main requirements we have for these procedures is that they need to complete within an appropriate amount of time.

If we are writing a program that creates real time graphics, we have only a few milliseconds to create each frame, and each frame has millions of pixels.

In other contexts, we might not be dealing with short timeframes - rather, we might be dealing with large amounts of data. If we're creating software that analyses a billion records, each additional millisecond we require to process each one adds 12 days to our overall run time.

Therefore, it is important that we be able to analyse the performance of our algorithms, understand the impact of what we find, and develop instincts for how we might improve them.

Generally, how long our algorithms take to run will depend their inputs, specifically, the size of those inputs. Our input's size might be the number of records we need to process, or the number of pixels we need to colour. When the input is larger, our algorithms typically needs to do more work.

The main thing we care about is the scaling between those two variables. As our input size increases, how many more operations does our algorithm need to perform to complete its task? Based on this analysis, we can classify how our algorithm scales into certain categories. Here are some examples:

This is the best case scenario. With constant scaling, even if our input gets bigger, our algorithm doesn't need to do any extra work. An example of a problem that can be solved in constant time is: *determine if the first element in an array of numbers is even*.

```
isEven(input) {
return input[0] % 2 == 0;
}
```

Here, it doesn't matter how big the input array is - our algorithm takes the same, **constant**, amount of time.

With linear scaling, as our input gets bigger, our algorithm needs to perform proportionally more work. If our input is twice as large, our algorithm takes approximately twice as long to complete.

An example of a problem that can be solved in linear time is: *determine how many numbers in an array are even*. To solve this, we need to inspect every element in the array. If the array is twice as large, we need do approximately twice as much work.

```
1countEven(input) {
2 int Total { 0 };
3 for (i in input) {
4 if (i % 2 == 0) Total++;
5 }
6 return Total;
7}
8
```

Here, if the array has 10 items, line 4 is executed 10 times. If the array is doubled to 20 items, line 4's execution count is also doubled to 20. In other words the number of operations increases at the same rate as the size of the input.

Algorithms in this category tend to involve nested loops. Lets imagine we have this task: *check if an array contains any duplicates*

One possible way to solve this would be using the following algorithm:

```
1hasDuplicates(input) {
2 for (i in input) {
3 for (j in input) {
4 if (i == j) return true;
5 }
6 }
7
8 return false;
9}
10
```

In this example, if our input array has 10 items, line 4 is executed up to 100 times. If the array is increased to 100 items, line 4 is now executed up to 10,000 times. With quadratic scaling, the number of operations increases increases much faster than the size of the input.

In general, the number of operations `O`

involved in this algorithm will be approximately $O = n^2$, where $n$ is the size of the input.

This is an example of a **quadratic** equation, so we'd describe this algorithm as quadratic, or having quadratic scaling.

We could also have **cubic** algorithms where $O = n^3$. This would be the case had we needed a 3rd nested loop:

```
for (i of input) {
for (j of input) {
for (k of input) {
// Code here will run n * n * n times
}
}
}
```

This pattern continues - there are **quartic** algorithms where $O = n^4$, and so on.

The following chart is a visual representation of how many operations an algorithm in each of these classifications might need to perform.

If we zoom out in this chart, we can see the differences can get quite extreme, even at modest input sizes

Even with a modest input size of 10, the huge differences in each type of algorithm are clear. This is why it's important to try to create more efficient algorithms where possible.

In real scenarios, algorithms are generally going to be much more complicated. To analyse and compare them, we simplify things a lot by removing some nuance. This does not mean these nuances do not matter - we should absolutely be mindful of them when creating our software.

However, optimising our algorithms is generally not going to make as big a difference as finding an algorithm in a totally different scaling class. Therefore, to categorise algorithms into these scaling classes, we ignore nuance in 3 main areas:

In the real world, some operations can take much more time than others. For example, it's faster to add numbers together than multiply them. It's faster to read data from memory than from the hard drive.

Never the less, to classify algorithms, we assume all operations are equally fast

One algorithm might require $n + 100$ operations whilst another might require $n$. We ignore the additive constant - both these algorithms are in the same class (linear)

One algorithm might require $50n$ operations whilst another might require $n$. We also ignore multiplicative constants - these algorithms are still both in the same class (linear)

One algorithm might require $n^2 + n$ operations whilst another might require $n^2$. With large values of $n$, the $n^2$ component will be dominant and the $n$ will be trivial. So, we classify these algorithms into the same bucket - quadratic.

With our abilty to analyse algorithms, we now have some guidance on ways we might seek to improve them. Earlier, we had this quadratic $O(n^2)$ algorithm:

```
hasDuplicates(input) {
for (i in input) {
for (j in input) {
if (i == j) return true;
}
}
return false;
}
```

Is there a way we can solve this problem in $O(1)$ time? We still need to check every element in the array to make sure there are no duplicates. If our array gets bigger, we have more to check, so constant time seems impossible.

What about $O(n)$ time? To achieve linear time, we need to iterate over our array a fixed number of times - ideallly once.

To do this, we could create an additional data structure, and store values we've previously seen in that container. Then, for each value we encounter, we can check if we've already seen it.

For this overall algorithm be done in linear time, the data structure we create will need to be able to tell us if it contains a specific value as quickly as possible. Our best option here would be a *set*.

So, we can now iterate over our array once, which requires $n$ operations. For each element, we check if a value is in a set, and add it to the set if not.

Searching and adding an item to a set can both be done in constant time. So, for each of $n$ elements, we perform two operations. This means our algorithm will have approximately $2n$ operations in total. Dropping the multiplicative constant, we have an $O(n)$ algorithm.

Our code might look like this:

```
hasDuplicates(input) {
set previousValues {};
for (i in input) {
if (previousValues.has(i) {
return true;
} else {
previousValues.add(i);
}
}
return false;
}
```

Lets compare its scaling properties to our old algorithm:

Input Size | Old Algorithm | New Algorithm |
---|---|---|

100 | 10,000 Operations | 100 Operations |

1,000 | 1,000,000 Operations | 1,000 Operations |

100,000 | 10,000,000,000 Operations | 100,000 Operations |

To quantify the difference in real world terms, if our input size is 100,000 and our new algorithm takes a second to run, our old algorithm takes 28 hours.

Increase the input size to 1,000,000 and our new algorithm scales up to taking 10 seconds, but our old algorithm scales up to 3 years.

When writing and describing the properties of algorithms, the concept of "Big O" is sometimes used. This is a form of mathematical notation used to describe the concepts covered above.

We can describe constant time algorithms as $O(1)$, linear as $O(n)$, quadratic as $O(n^2)$, and so on. More specifically, this notation describes the *upper bound* of the run time. That is, big O describes the worst case scenario.

For example, our original, quadratic algorithm for detecting duplicates might complete within 2 operations. Even if the input array has millions of items, if the first first two numbers are the same, our algorithm will complete in constant time.

However, in the worst case scenario, there are no duplicates at all in our input array. In that scenario, our algorithm needs to check $n \times n$ combinations to reach that conclusion, so it's an $O(n^2)$ algorithm.

In general, the worst case performance is what we focus on when creating and describing algorithms. If an algorithm is described simply as "quadratic", we should interpret that as meaning "quadratic in the worst case".

The average case is less important. An algorithm that is linear in both the average and worst case is almost always better than an algorithm that is faster on average, but quadratic in the worst case.

This is because the gulf between classes can have very large implications to the run times. We always have to design and prepare our software and processes for the worst case scenario. That is much more practical to do with an algorithm that *always* completes within 10 seconds, than for one that takes only 10 milliseconds on average, but sometimes requires 3 days.

All algorithms covered in this lesson are examples of polynomial scaling.

Polynomials have the form $O(n^k)$ where $k$ is a constant, that is, $k$ is unrelated to the size of the data

Linear scaling - $O(n)$ - is polynomial as it is equivalent to $O(n^1)$. Constant scaling - $O(1)$ - is polynomial as it is equivalent to $O(n^0)$

There are algorithms that scale much faster than polynomials - this includes exponential scaling - $O(k^n)$ , and factorial scaling - $O(n!)$

We'll see examples of these in later lessons

Become a software engineer with C++. Starting from the basics, we guide you step by step along the way

Free, unlimited access!- 66 Lessons
- Over 200 Quiz Questions
- Capstone Project
- Regularly Updated
- Help and FAQ

Subscribe to this course to return to it later, and get updates on future changes