Use Directory Iterator with Multithreading

How can I combine directory_iterator with multithreading?

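Combining std::filesystem::directory_iterator with multithreading can improve performance when each entry needs non-trivial work, such as reading or parsing the file. A single directory_iterator object should not be advanced from several threads at once, so the usual approach is to walk the directory on one thread and hand the entries to worker threads.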

Thread Pool Example:

Here's an example that first fills a queue with the directory's entries, then starts a small, fixed pool of threads that pull entries from the queue until it is empty:

#include <filesystem>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>
namespace fs = std::filesystem;

std::mutex output_mutex;
std::mutex queue_mutex;
std::queue<fs::directory_entry> file_queue;

void process_file() {
  while (true) {
    fs::directory_entry entry;
    {
      std::lock_guard<std::mutex> lock(queue_mutex);
      if (file_queue.empty()) {
        break;  // Exit if queue is empty
      }
      entry = file_queue.front();
      file_queue.pop();
    }

    std::lock_guard<std::mutex> lock(output_mutex);
    std::cout << entry.path().string() << '\n';
  }
}

int main() {
  fs::directory_iterator start{R"(c:\test)"};
  fs::directory_iterator end{};

  // Populate the queue with files
  {
    std::lock_guard<std::mutex> lock(queue_mutex);
    for (auto iter{start}; iter != end; ++iter) {
      file_queue.push(*iter);
    }
  }

  std::vector<std::thread> threads;
  int num_threads = 4;  // Number of threads

  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(process_file);
  }

  for (auto& t : threads) {
    t.join();
  }
}

Example output (the order may vary between runs):

c:\test\file1.txt
c:\test\file2.txt
c:\test\file3.txt

Key Points:

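  • Thread Safety: Protect every shared resource, here the queue and the console output, with a mutex (std::mutex).
  • Dividing Work: Iterate the directory on one thread and let the workers pull entries from the shared queue, so each entry is processed exactly once.
  • Joining Threads: Call join() on every thread before main() returns. Destroying a std::thread that is still joinable calls std::terminate.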

Using a Concurrent Queue:

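When entries should be processed while the directory is still being scanned, use a queue guarded by a mutex and a condition variable. The workers wait for entries, the main thread pushes them as it iterates, and one empty directory_entry per worker signals that no more work is coming: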

#include <condition_variable>
#include <filesystem>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

std::mutex output_mutex;
std::mutex queue_mutex;
std::condition_variable cv;
std::queue<fs::directory_entry> file_queue;

void process_file() {
  while (true) {
    std::unique_lock<std::mutex> lock(queue_mutex);
    // Sleep until the main thread has pushed an entry
    cv.wait(lock, [] { return !file_queue.empty(); });

    auto entry = file_queue.front();
    file_queue.pop();
    lock.unlock();

    // Empty path signals end
    if (entry.path().empty()) break;

    // Keep lines from different threads from interleaving
    std::lock_guard<std::mutex> out_lock(output_mutex);
    std::cout << entry.path().string() << '\n';
  }
}

int main() {
  fs::directory_iterator start{R"(c:\test)"};
  fs::directory_iterator end{};

  std::vector<std::thread> threads;
  int num_threads = 4;

  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(process_file);
  }

  for (auto iter{start}; iter != end; ++iter) {
    if (iter->is_regular_file()) {
      std::unique_lock<std::mutex> lock(queue_mutex);
      file_queue.push(*iter);
      lock.unlock();
      cv.notify_one();
    }
  }

  for (int i = 0; i < num_threads; ++i) {
    std::unique_lock<std::mutex> lock(queue_mutex);
    // Signal end
    file_queue.push(fs::directory_entry{});  
    lock.unlock();
    cv.notify_one();
  }

  for (auto& t : threads) {
    t.join();
  }
}

Example output (the order may vary between runs):

c:\test\file1.txt
c:\test\file2.txt
c:\test\file3.txt

Combining std::filesystem::directory_iterator with multithreading can improve performance when the per-file work is expensive enough to outweigh the cost of synchronization. Keep the iteration itself on one thread, and use a mutex-protected queue, optionally with a condition variable, to hand entries to the worker threads safely.
