An introduction to algorithms - the foundations of computer science. Learn how to design, analyze, and compare them.

Edited

When we're programming, we're writing code to perform a specific task. An * algorithm* is a set of instructions - the procedure we ask the computer to perform - to complete thatÂ task.

For any task, there are a huge number of possible algorithms we can write to complete it. How do we decide on which one toÂ use?

Typically, one of the main considerations we have is that we generally prefer algorithms that require fewer actions to complete. An algorithm that requires fewer instructions to complete its task is generally going to be able to complete that taskÂ faster.

This concept is sometimes referred to as * time complexity*. Time complexity describes the amount of computer time it takes to complete an algorithm. We prefer algorithms with less timeÂ complexity.

Being conscious of the performance of our algorithms is particularly important in twoÂ scenarios:

If we are writing a program that involves real-time interactivity, we typically want to update the user's screen at least 60 times per second to make our application feelÂ responsive.

This gives us approximately 16 milliseconds to generate each update. If one of the algorithms we use in that process takes 10 milliseconds, most of our budget is immediatelyÂ gone.

In other contexts, we might not be dealing with short timeframes - rather, we might be dealing with large amounts of data. If we're creating software that analyses a billion records, and our algorithm spends 10 milliseconds on each record, our program will take four months toÂ run.

Generally, how long our algorithms take to run will depend on their inputs, specifically, the size of thoseÂ inputs.

Our input's size might be the number of pixels we need to color, or the number of records we need to process. When the input is larger, our algorithms typically need to do moreÂ work.

The main thing we care about is how the input size affects the number of operations our algorithm needs to perform. As our input size increases, how many more operations does our algorithm need to perform to complete itsÂ task?

This is referred to as the scaling of our algorithm. Below, we provide examples of the most common scalingÂ factors:

This is the best-case scenario. With constant scaling, even if our input gets bigger, our algorithm doesn't need to do any extra work. An example of a problem that can be solved in constant time is: "*determine if the first element in an array of numbers is even"*

```
bool FirstIsEven(std::vector<int>& Input) {
return Input[0] % 2 == 0;
}
```

Here, it doesn't matter how big the input array is - our algorithm takes the same, **constant**, amount ofÂ time.

With linear scaling, as our input gets bigger, our algorithm needs to perform proportionally more work. If our input is twice as large, our algorithm needs to do approximately twice as muchÂ work.

An example of a problem that can be solved in linear time is: *determine how many numbers in an array are even*. To solve this, we need to inspect every element in theÂ array.

If the array is twice as large, we need to do approximately twice as much work. We can often identify a linear algorithm by the fact it iterates over every object in ourÂ collection:

```
int CountEven(const std::vector<int>& Input) {
int Total { 0 };
for (int i : Input) {
if (i % 2 == 0) ++Total;
}
return Total;
}
```

Here, if the array has 10 items, the highlighted line is executed 10 times. If the array is doubled to 20 items, the number of times the line is executed is also doubled to 20. In other words, the number of operations scales * linearly* with the size of theÂ input.

Algorithms in this category tend to involve nested loops. Let's imagine we have the task: "*check if an array contains any duplicates"*

Weâ€™ll introduce a better way to solve this problem in the next lesson, but one possible solution we might consider would be to use nestedÂ loops:

```
bool HasDuplicate(std::vector<int>& Input) {
for (size_t i{0}; i < Input.size(); ++i) {
for (size_t j{0}; j < Input.size(); ++j) {
if (Input[i] == Input[j] && i != j) {
return true;
}
}
}
return false;
}
```

In this example, if our input array has 10 items, the outer `for`

loop performs 10 iterations, and each iteration causes the inner `for`

loop to perform 10Â iterations.

The net effect is that the highlighted `if`

statement is evaluated up to 100 times - 10 xÂ 10.

If the array is increased to 100 items, the `if`

block is now executed up to 10,000 times. With quadratic scaling, the number of operations increases much faster than the size of theÂ input.

In general, if our input contains $n$ objects, this algorithm requires $n \times n$, or $n^2$ operations to complete its task. This is an example of **quadratic**Â scaling.

Given the performance of algorithms generally depends on the number of objects in the collection it is working on, we typically describe performance in terms of that variable. We typically call the variable $n$.

We could also have **cubic** algorithms that require $n \times n \times n$, or $n^3$ operations. This would be the case had we needed a 3rd nestedÂ loop:

```
for (int i : Input) {
for (int j : Input) {
for (int k : Input) {
// Code here will run n * n * n times
}
}
}
```

This pattern continues - there are **quartic** algorithms requiring $n^4$ operations, and soÂ on.

The following chart is a visual representation of how many operations an algorithm in each of these classifications might need toÂ perform.

If we zoom out in this chart, we can see the differences can get quiteÂ extreme:

Even with a modest input size of 10, the huge differences in each type of algorithm areÂ clear.

The concept of "Big O" is a mathematical notation that is used to represent the concepts coveredÂ above.

We can imagine the algorithms we create as being within certain classes ofÂ scaling:

- Constant algorithms are noted as $O(1)$
- Linear algorithms are noted as $O(n)$
- Quadratic algorithms are noted as $O(n^2)$, and so on.

It is quite rare that the algorithms we create to solve real-world problems fit exactly into these equations. For example, an algorithm might have $50n^2$ operations; another might require $n^2 + 10n + 30$Â operations.

Big O notation simplifies things to just the core classification - both of these algorithms would be $O(n^2)$. This simplification allows us to communicate and compare approaches moreÂ simply.

Specifically, it makes the followingÂ simplifications:

In the real world, some operations can take much more time than others. For example, CPUs can perform multiplication faster than division and can read data from memory faster than from the hardÂ drive.

But to keep things simple, we assume all operations are equallyÂ fast

One algorithm might require $n + 100$ operations whilst another might require $n$. We ignore the additive constant - both these algorithms are in the same class - $O(n)$Â (linear)

One algorithm might require $50n$ operations whilst another might require $n$. We also ignore multiplicative constants - these algorithms are still both in the same class - $O(n)$Â (linear)

One algorithm might require $n^2 + n$ operations whilst another might require $n^2 + 50$. As we saw in the graphs above, the term with the highest order (that is, the largest exponent) dominates the behavior of the function as $n$Â increases

Because of this, we just ignore all the lower-order terms. Both of these algorithms are simply $O(n^2)$Â (quadratic)

Even though Big O notation ignores some nuance, that doesnâ€™t necessarily mean we should ignore it when designing real-world programs. A $5n^3$ algorithm is still twice as fast as a $10n^2$ algorithm, even though theyâ€™re both $O(n^2)$

The key point of Big O is that, if a $O(n)$ solution can be found to solve that same problem, thatâ€™s much better than finding ways to optimize an $O(n^2)$Â approach.

But if an $O(n)$ solution canâ€™t be found, optimizing the $O(n^2)$ solution can still be a goodÂ idea.

Big O specifically describes the *upper bound* of the run time. That is, big O describes the worst-caseÂ scenario.

For example, our previous quadratic algorithm for detecting duplicates was $O(n^2)$, but it might complete much faster. The algorithm could return `true`

after checking only two objects, if those two objects areÂ equal.

So in the best-case scenario, it doesnâ€™t matter how big the input is - we only need to check two objects. Therefore, our best-case performance is * constant*.

However, in the worst-case scenario, there are no duplicates at all in our input array. Our function needs to check $n \times n$ combinations before it can reach that conclusion, so it's an $O(n^2)$Â algorithm.

In general, the worst-case performance is what we focus on when creating algorithms. If an algorithm is described simply as "quadratic", we should interpret that as meaning "quadratic in the worstÂ case".

The best and average cases are less important, because we always have to design and prepare our software and processes for the worst-case scenarioÂ anyway.

That is much more practical to do with an algorithm that *always* completes within 10 seconds, than for one that takes only one second on average, but sometimes takes 3Â days.

Itâ€™s important not to confuse the number of operations an algorithm performs with the number of statements itÂ has.

The following function contains a single statement, but that doesnâ€™t necessarily mean the function has a constant run time. To understand the time complexity of `HandleArray()`

, we would need to understand the time complexity of the function itÂ calls:

```
void HandleArray(std::vector<int>& Input) {
SomeFunction(Input); // ?
}
```

If `SomeFunction()`

is linear, requiring $n$ operations, then `HandleArray()`

would also beÂ linear.

In the following example, if `SomeFunction()`

is linear, then `HandleArray()`

will be quadratic, requiring $n \times n$, or $n^2$Â operations:

```
void HandleArray(std::vector<int>& Input) {
for (int x : Input) {
SomeFuncion(Input, x);
}
}
```

Remember, operators are also functions. So, to understand the time complexity of this function, we need to understand the time complexity of `SomeType::operator+=(int)`

:

```
void Add(SomeType& Input) {
Input += 1; // ?
}
```

When weâ€™re using third-party functions, the time complexity will often be listed in theÂ documentation.

The complexity of standard library functions will be listed in most online references. For example, cppreference.com lists the complexity of `std::vector`

â€™s `erase()`

method asÂ linear.

All algorithms covered in this lesson are examples of * polynomials*.

Polynomials have the form $O(n^k)$ where $k$ is a constant - that is, $k$ is unrelated to the size of theÂ data

- Constant scaling - $O(1)$ - is polynomial as it is equivalent to $O(n^0)$
- Linear scaling - $O(n)$ - is polynomial as it is equivalent to $O(n^1)$

Most algorithms we work with are polynomials, but some problems cannot be solved in polynomial time. A common example is the Travelling salesman problem.

Common non-polynomial algorithms include exponential - $O(2^n)$, and factorial - $O(n!)$.

In this lesson, we explored the fundamentals of algorithm analysis and Big O notation, understanding how different algorithms scale with input size and how to categorize their efficiency in terms of timeÂ complexity.

- Algorithms vary in efficiency; their performance is often measured using time complexity, with Big O notation providing a standardized way to express this.
- We examined constant, linear, and quadratic scaling, highlighting how the size of input impacts the number of operations in an algorithm.
- Big O notation simplifies algorithm complexity by focusing on the dominant term and ignoring constants and lower-order terms.
- The distinction between operations and statements in code was clarified, emphasizing the need to understand the complexity of underlying functions.

In our upcoming lesson, we'll dive into the world of data structures. We'll explore how different data structures can significantly impact the performance and efficiency of algorithms, building upon the concepts of algorithm analysis and Big O notation discussed in thisÂ lesson.

**Introduction to Common Data Structures**: Learn how arrays compare to two other common data structures - the hash set and the linked list.**Impact on Algorithm Efficiency**: Understand how the choice of a data structure can affect an algorithm's time complexity and overall performance.**Improving Algorithms using Data Structures**: Weâ€™ll learn how data structures can work together to create more efficient algorithms.**Practical Examples:**Weâ€™ll add a new data structure to the`HasDuplicate()`

function from this lesson, to improve its run time from $O(n^2)$ to $O(n)$

â€” Refreshed Content

â€” First Published

**CppReference**- Complexity of Standard Library Functions

https://en.cppreference.com/w/cpp/container**Wikipedia**- Big O Notation

https://en.wikipedia.org/wiki/Big_O_notation**MIT OpenCourseWare**- Introduction to Algorithms

https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/**Khan Academy**- Algorithms

https://www.khanacademy.org/computing/computer-science/algorithms**Coursera**- Algorithmic Toolbox

https://www.coursera.org/learn/algorithmic-toolbox**Stanford Online**- Algorithms: Design and Analysis

https://online.stanford.edu/courses/soe-ycsalgorithms1-algorithms-design-and-analysis-part-1**Open Standards**- C++ Standards

https://www.open-std.org/jtc1/sc22/wg21/**YouTube**- Big O Notation - Computerphile

https://www.youtube.com/watch?v=kgBjXUE_Nwc

Edited

Lesson Contents### Algorithm Analysis and Big O Notation

An introduction to algorithms - the foundations of computer science. Learn how to design, analyze, and compare them.

This lesson is part of theÂ course:### Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.