Set Algorithms

An introduction to set algorithms, and how to implement them using the C++ standard library
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

3D Character Concept Art
Ryan McCombe
Ryan McCombe
Posted

This lesson introduces the 5 main standard library algorithms that are used when working with sets. A set is simply a collection of objects that does not include any duplicates. For example, the numbers 1, 2, and 3 are a set.

Within maths and computer science writing, a set is often denoted as a comma-separated list surrounded by braces, and we’ll use the same convention here.

For example, the set containing the numbers 1, 2, and 3 would be written {1,2,3}\{1, 2, 3\}

Some additional points are worth noting:

1. A set is not restricted to numbers

The examples in most of this lesson use integers, but a set can contain any type of object. For example, {red,green,blue}\{red, green, blue\} is also a set.

Later in this lesson, we’ll show how we can use the algorithms with sets containing any type of data, including our own custom types.

2. Sets do not need to be stored in ‚Äúset‚ÄĚ containers

The sets we’re talking about here do not need to be stored in a container specifically designed for sets, such as a std::unordered_set or a std::set.

A set is a more generic concept than a type of container. Whilst a set can be contained in a std::set, it doesn’t need to be. A set can be stored in an array, linked list, or a wide range of other possible containers.

3. These algorithms require the inputs to be iterable and sorted

The algorithms in this lesson require the containers holding the set to be iterable, through having implemented a forward iterator. Most of the containers we’ve seen are compatible, including std::array, std::vector, std::list, std::set and more.

The algorithms additionally require the input sets to be sorted. By default, they assume the sets are sorted in ascending order. That means if A < B, then A is before B in the container.

Later in the lesson, we’ll see how we can override that assumption. This lets the algorithms work regardless of how our inputs are sorted, and even with types that aren’t even sortable.

Let's introduce the algorithms, which are all available within the <algorithm> header:

#include <algorithm>

Algorithm 1: Subsets and Supersets using includes()

The most basic set algorithm checks if a set contains another set.

std::ranges::includes(A, B)

This will return true if A ‚ÄĚincludes‚ÄĚ B. In other words, it checks if all of the objects in B are also in A.

#include <algorithm>
#include <set>
#include <iostream>

int main(){
  std::set A{1, 2, 3};
  std::set B{1, 2};

  if (std::ranges::includes(A, B)) {
    std::cout << "A includes B";
  }
}
A includes B

Supersets and Subsets

A set is sometimes referred to as a superset of any other set it includes. For example, if A includes B, A is a superset of B. Within textbooks and other resources, the symbol ‚äá\supseteq is sometimes used to mean ‚Äúis a superset of‚ÄĚ. For example: {1,2,3}‚äá{1,2}\{ 1, 2, 3 \} \supseteq \{1,2\}.

We can also say B is a subset of A. The symbol sometimes used to communicate this is ⊆\subseteq, that is, the superset symbol with its direction reversed. For example, {1,2}⊆{1,2,3}\{ 1, 2 \} \subseteq \{1,2,3\}

Note that the argument order matters here. If A includes B, that doesn’t necessarily mean that B includes A:

#include <algorithm>
#include <set>
#include <iostream>

int main(){
  std::set A{1, 2, 3};
  std::set B{1, 2};

  if (std::ranges::includes(A, B)) {
    std::cout << "B is a subset of A";
  }

  if (!std::ranges::includes(B, A)) {
    std::cout << "\nA is NOT a subset of B";
  }
}
B is a subset of A
A is NOT a subset of B

Algorithm 2: Union using set_union()

The union of two sets includes any objects that are in either of the sets. For example, the union of {1,2,3}\{ 1, 2, 3 \} and {3,4,5}\{ 3, 4, 5 \} is {1,2,3,4,5}\{ 1, 2, 3, 4, 5 \}

Within textbooks and other resources, the symbol ‚ą™\cup is sometimes used to denote the union operation. For example, {1,2,3}‚ą™{3,4,5}={1,2,3,4,5}\{ 1, 2, 3 \} \cup \{3,4,5\} = \{1,2,3,4,5\}

The std::ranges::set_union algorithm provides an implementation of this. Its basic usage involves three arguments - the two sets we want to use, and an iterator to where we want the union to be created.

We need to there is enough memory in the result location to accommodate the union. The maximum size a union can be will be the combined size of both of the input sets. Below, we resize a std::vector to match that size, and then use the std::ranges::set_union function to populate it with the union results:

#include <algorithm>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{3, 4, 5};
  std::vector<int> Results;
  Results.resize(A.size() + B.size());

  std::ranges::set_union(A, B, Results.begin());
}

After running this code, our Results vector will have a size() of 6 with the union {1,2,3,4,5}\{ 1, 2, 3, 4, 5 \} in the first 5 slots. We can shrink the result collection to contain only the union members by using the value returned from std::ranges::set_union().

This will be a std::ranges::set_union_result struct containing three iterators, which may be useful for follow-up operations. The three iterators are:

  1. in1 - The past-the-end iterator for the set passed as the first argument - equivalent to A.end() in this example
  2. in2 - The past-the-end iterator for the set passed as the second argument - equivalent to B.end() in this example
  3. out - An iterator pointing to the last element of the union, within the result collection

These three iterators are typically accessed using structured binding. The third tends to be the most useful in real-world applications, as it gives us an easy way to tell how big the union is, and to shrink the result collection to just contain those elements:

#include <algorithm>
#include <iostream>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{3, 4, 5};
  std::vector<int> Results;
  Results.resize(A.size() + B.size());

  auto [AEnd, BEnd, UnionEnd]{
    std::ranges::set_union(
      A, B, Results.begin())
  };

  std::cout << "Union Size: "
    << UnionEnd - Results.begin()
    << '\n';

  Results.erase(UnionEnd, Results.end());
  for (auto x : Results) {
    std::cout << x << ", ";
  }
}
Union Size: 5
1, 2, 3, 4, 5,

std::set_union() vs std::merge()

Earlier, we introduced the std::merge() algorithm, which works similarly to set_union(). The main difference is that a set - such as what is returned by set_union() - cannot contain duplicates. An arbitrary collection - such as what is returned by merge - can.

In the previous example, our inputs where {1,2,3}\{ 1, 2, 3 \} and {3,4,5}\{ 3, 4, 5 \}. These both include the number 3. Only one is in the union, yielding a set with 5 entries: {1,2,3,4,5}\{ 1, 2, 3, 4, 5 \}

When merging, both would be included in the result, which would generate a collection of 6 objects: 1, 2, 3, 3, 4, and 5

If our two input sets do not overlap - that is, they contain no objects in common, the result of set_union() and merge() will be the same.

If two sets do not overlap, they are sometimes referred to as disjoint sets. For example, {1,2,3}\{ 1, 2, 3 \} and {4,5,6}\{ 4, 5, 6 \} are disjoint.

Algorithm 3: Intersection using set_intersection()

The intersection of two sets includes all the objects that are in both sets. For example, the intersection of {1,2,3}\{ 1, 2, 3 \} and {2,3,4}\{ 2, 3, 4 \} is {2,3}\{ 2, 3 \}

Within textbooks and other resources, the symbol ‚ą©\cap is sometimes used to denote the intersection operation. For example, {1,2,3}‚ą©{2,3,4}={2,3}\{ 1, 2, 3 \} \cap \{ 2, 3, 4 \} = \{2,3\}

The std::ranges::set_intersection and std::set_intersection algorithms provide a way to create this intersection.

As with the union operation, we need to ensure there is enough memory in the result location to accommodate the intersection. The maximum size of a set intersection will be the size of the smaller of the two input sets.

The std::min() function can help us here, returning the smallest value of its arguments:

#include <algorithm>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{2, 3, 4};
  std::vector<int> Results;
  Results.resize(std::min(A.size(), B.size()));

  std::ranges::set_intersection(
    A, B, Results.begin()
  );
}

Similar to the union algorithms, the std::ranges::set_intersection algorithm returns a std::ranges::set_intersection_result struct, with three members:

  1. in1 - The past-the-end iterator for the first argument (equivalent to what is returned from a call to .end() on it)
  2. in2 - The past-the-end iterator for the second argument (equivalent to what is returned from a call to .end() on it)
  3. out - An iterator pointing to the last element inserted into the result collection

As with the union algorithms, these are commonly accessed using structured binding, and the third variable is the most useful:

#include <algorithm>
#include <iostream>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{2, 3, 4};
  std::vector<int> Results;
  Results.resize(std::min(A.size(), B.size()));

  auto [AEnd, BEnd, IntersectionEnd]{
    std::ranges::set_intersection(
      A, B, Results.begin())
  };

  std::cout << "Intersection Size: "
    << IntersectionEnd - Results.begin()
    << '\n';

  Results.erase(IntersectionEnd, Results.end());
  for (auto x : Results) {
    std::cout << x << ", ";
  }
}
Intersection Size: 2
2, 3,

Algorithm 4: Difference (Subtraction) using set_difference()

The difference operation returns all the objects in the first set after any objects in the second set are removed.

Within textbooks and other learning resources, the subtraction symbol ‚ąí- or division symbol \ is sometimes used to denote a difference operation. For¬†example:

  • {1,2,3}‚ąí{3,4,5}={1,2}\{ 1, 2, 3 \} - \{ 3, 4, 5 \} = \{1, 2\}
  • {3,4,5}‚ąí{1,2,3}={4,5}\{ 3, 4, 5 \} - \{ 1, 2, 3 \} = \{4, 5\}

We need to ensure there is enough memory in the result location to accommodate the intersection. The maximum size of a set difference will be the size of the first input argument:

#include <algorithm>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{3, 4, 5};
  std::vector<int> Results;
  Results.resize(A.size());

  std::ranges::set_difference(
    A, B, Results.begin()
  );
}

The return value of set_difference() is slightly simpler than the previous algorithms, as it returns a struct containing only two iterators:

  1. in - The past-the-end iterator for the first argument (equivalent to what is returned from a call to .end() on it)
  2. out - An iterator pointing to the last element inserted into the result collection

As usual, these are commonly accessed using structured binding, and the out variable is the most useful:

#include <algorithm>
#include <iostream>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{3, 4, 5};
  std::vector<int> Results;
  Results.resize(A.size());

  auto [AEnd, DifferenceEnd]{
    std::ranges::set_difference(
      A, B, Results.begin())
  };

  std::cout << "Difference Size: "
    << DifferenceEnd - Results.begin()
    << '\n';

  Results.erase(DifferenceEnd, Results.end());
  for (auto x : Results) {
    std::cout << x << ", ";
  }
}
Difference Size: 2
1, 2,

Algorithm 5: Symmetric Difference using set_symmetric_difference()

The intersection of two sets includes all the objects that are in either set, but not both sets. We can think of this as having similar characteristics to an exclusive or (xor) operation. It’s also equivalent to calculating both the union and intersection of two sets, and then subtracting the intersection from the union.

For example, the symmetric difference of {1,2,3}\{ 1, 2, 3 \} and {2,3,4}\{ 2, 3, 4 \} is {1,4}\{ 1, 4 \}

Within textbooks and other resources, the symbol ‚Ė≥\triangle is sometimes used to denote the symmetric difference. For example, {1,2,3}‚Ė≥{2,3,4}={1,4}\{ 1, 2, 3 \} \triangle \{2,3,4\} = \{1,4\}

We need to ensure there is enough memory in the result location to accommodate the symmetric difference. The maximum size of a symmetric difference will be the combined size of both of the inputs:

#include <algorithm>
#include <vector>

int main() {
  std::vector A { 1, 2, 3 };
  std::vector B { 2, 3, 4 };
  std::vector<int> Results;
  Results.resize(A.size() + B.size());

  std::ranges::set_symmetric_difference(
    A, B, Results.begin()
  );
}

The set_symmetric_difference has the same return type as the union and intersection algorithms. It is a struct containing:

  1. in1 - The past-the-end iterator for the first argument (equivalent to what is returned from a call to .end() on it)
  2. in2 - The past-the-end iterator for the second argument (equivalent to what is returned from a call to .end() on it)
  3. out - An iterator pointing to the last element inserted into the result collection

Below, we show an example of accessing these using structured binding, and using out property for some common follow-up operations:

#include <algorithm>
#include <iostream>
#include <vector>

int main(){
  std::vector A{1, 2, 3};
  std::vector B{2, 3, 4};
  std::vector<int> Results;
  Results.resize(A.size() + B.size());

  auto [AEnd, BEnd, SymmetricDifferenceEnd]{
    std::ranges::set_symmetric_difference(
      A, B, Results.begin())
  };

  std::cout << "Symmetric Difference Size: "
    << SymmetricDifferenceEnd - Results.begin()
    << '\n';

  Results.erase(SymmetricDifferenceEnd,
                Results.end());

  for (auto x : Results) {
    std::cout << x << ", ";
  }
}
Symmetric Difference Size: 2
1, 4,

Custom Comparers

In the introduction, we covered how the set algorithms assume the inputs are sorted in ascending order. That is, if A < B returns true, then A should be before B in the container.

This is customizable by providing an additional argument to the algorithm. This argument should be a function that will receive two objects from our set, either as values or, more commonly, by const reference.

The function should return true if the first parameter comes before the second parameter in our container.

Below, we use this argument to tell our algorithm that our inputs are sorted in descending order, rather than the default assumption of ascending:

#include <algorithm>
#include <set>
#include <iostream>

int main(){
  std::set A{3, 2, 1};
  std::set B{2, 1};

  auto Comparer{
    [](const int& A, const int& B){
      return A > B;
    }};

  if (std::ranges::includes(A, B, Comparer)) {
    std::cout << "A includes B";
  }
}
A includes B

Standard Library Function Objects

Template function objects that implement simple binary operations like > are available in the standard library. For example, our previous example could replace our lambda with a call to std::ranges::greater():

if (std::ranges::includes(
  A, B, std::ranges::greater())
) {
  std::cout << "A includes B";
}

The ability to pass a custom comparator allows our algorithms to work with sets containing any type of object, including those that are not inherently sortable. Below, we use a set algorithm with collections of custom types:

#include <algorithm>
#include <iostream>
#include <vector>

class Character {
public:
  Character(){};
  Character(std::string Name) : Name{Name}{}
  std::string Name;
};

int main(){
  using namespace std::string_literals;
  std::vector<Character> A{
    "Aragorn"s, "Frodo"s, "Gimli"s, "Legolas"s};
  std::vector<Character> B{
    "Frodo"s, "Legolas"s};

  auto Comparer{
    [](const Character& A,
       const Character& B){
      return A.Name < B.Name;
    }};

  if (std::ranges::includes(A, B, Comparer)) {
    std::cout << "A includes B";
  }
}
A includes B

Here is a more complex example, where we create the union instead:

#include <algorithm>
#include <iostream>
#include <vector>

class Character {
public:
  Character(){};
  Character(std::string Name) : Name{Name}{}
  std::string Name;
};

int main(){
  using namespace std::string_literals;
  std::vector<Character> A{
    "Aragorn"s, "Frodo"s, "Gimli"s};
  std::vector<Character> B{
    "Frodo"s, "Gandalf"s, "Legolas"s};
  std::vector<Character> Results;
  Results.resize(A.size() + B.size());

  auto Comparer{
    [](const Character& A,
       const Character& B){
      return A.Name < B.Name;
    }};

  auto [AEnd, BEnd, UnionEnd]{
    std::ranges::set_union(
      A, B, Results.begin(), Comparer)
  };

  std::cout << "Union Size: "
    << UnionEnd - Results.begin()
    << '\n';

  Results.erase(UnionEnd, Results.end());
  for (auto x : Results) {
    std::cout << x.Name << ", ";
  }
}
Union Size: 5
Aragorn, Frodo, Gandalf, Gimli, Legolas,

If we have a custom type where it makes sense for them to be sortable, we can implement support for that within the class itself.

This makes our classes more versatile and removes the need for standalone comparison operators like this scattered through our code.

We show how to do that in our later chapter on Objects and Classes

Projection

Similar to other range-based algorithms, the set algorithms support projection.

This allows our objects to be projected into a different form prior to the comparison. We do this by passing a function that will receive an object in our container, and return the projection of that object that we want to use.

A common use case for this is when our objects are complex types, and we want the comparison to be done on a variable within that object.

Below, we implement our previous example using projection instead of a custom comparison.

Two things to note here:

  • We want to use the default comparer, so we pass a blank argument {} in that position
  • The subsequent arguments are two projection functions - one to use for the objects in the set we provided as the first argument (A in this example) and one to use for the objects in the set we provided as the first argument (B in this example). In this case, we use the same projection function for both sets, but they can be different
#include <algorithm>
#include <iostream>
#include <vector>

class Character {
public:
  Character(){};
  Character(std::string Name) : Name{Name}{}
  std::string Name;
};

int main(){
  using namespace std::string_literals;
  std::vector<Character> A{
    "Aragorn"s, "Frodo"s, "Gimli"s, "Legolas"s};
  std::vector<Character> B{
    "Frodo"s, "Legolas"s};

  auto Projector{
    [](const Character& C){ return C.Name; }};

  if (std::ranges::includes(
    A, B, {}, Projector, Projector
  )) { std::cout << "A includes B"; }
}
A includes B

Iterator Based Algorithms

The examples in this lesson are all using the C++20 ranges library, but variations of all of these algorithms that work with iterator pairs are available instead.

To use the iterator versions, we need to:

  • Remove ranges from the namespace qualification - for example, std::ranges::includes becomes std::includes
  • Replace every range argument with two arguments, forming an iterator pair. For example, A becomes A.begin(), A.end()

The following example shows this:

#include <algorithm>
#include <set>
#include <iostream>

int main(){
  std::set A{1, 2, 3};
  std::set B{1, 2};

  if (std::includes(A.begin(), A.end(),
                    B.begin(), B.end())) {
    std::cout << "A includes B";
  }
}
A includes B

Not that projection is part of the ranges library. As such, the iterator variations of these algorithms do not support projection.

Was this lesson useful?

Edit History

  • ‚ÄĒ First Published

Ryan McCombe
Ryan McCombe
Posted
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

7a.jpg
This lesson is part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Free, unlimited access!

This course includes:

  • 106 Lessons
  • 550+ Code Samples
  • 96% Positive Reviews
  • Regularly Updated
  • Help and FAQ
Next Lesson

Comparison Algorithms

An introduction to the 8 main comparison algorithms in the C++ standard library
zb721.jpg
Contact|Privacy Policy|Terms of Use
Copyright © 2023 - All Rights Reserved