Use Directory Iterator with Multithreading
How can I combine directory_iterator with multithreading?
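Combining std::filesystem::directory_iterator with multithreading can improve performance when a large directory contains many files that each need substantial work, such as reading or parsing their contents. A directory_iterator is a single-pass input iterator and is not safe to advance from several threads concurrently. The usual approach is therefore to iterate on one thread and divide the resulting entries among worker threads.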
Thread Pool Example:
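Here's an example using a simple thread pool to process files in parallel. The main thread fills the queue with every entry before starting the workers, so each worker can simply exit once it finds the queue empty: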
#include <filesystem>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

std::mutex output_mutex;  // guards std::cout
std::mutex queue_mutex;   // guards file_queue
std::queue<fs::directory_entry> file_queue;

void process_file() {
  while (true) {
    fs::directory_entry entry;
    {
      std::lock_guard<std::mutex> lock(queue_mutex);
      if (file_queue.empty()) {
        break;  // Exit if queue is empty
      }
      entry = file_queue.front();
      file_queue.pop();
    }

    // Only one thread writes to the console at a time
    std::lock_guard<std::mutex> lock(output_mutex);
    std::cout << entry.path().string() << '\n';
  }
}

int main() {
  fs::directory_iterator start{R"(c:\test)"};
  fs::directory_iterator end{};

  // Populate the queue with files
  {
    std::lock_guard<std::mutex> lock(queue_mutex);
    for (auto iter{start}; iter != end; ++iter) {
      file_queue.push(*iter);
    }
  }

  std::vector<std::thread> threads;
  int num_threads = 4;  // Number of threads
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(process_file);
  }

  for (auto& t : threads) {
    t.join();
  }
}
Possible output (the order can vary between runs):
c:\test\file1.txt
c:\test\file2.txt
c:\test\file3.txt
Key Points:
- Thread Safety: Use a mutex (std::mutex) to protect shared resources such as the work queue and console output.
- Dividing Work: Divide the directory entries among multiple threads to process files in parallel.
- Joining Threads: Ensure all threads complete their work by joining them.
Using a Concurrent Queue:
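For more complex tasks, consider a concurrent queue paired with a std::condition_variable. Here the workers start first and consume entries while the main thread is still iterating the directory. One empty directory_entry per worker serves as the stop signal: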
#include <condition_variable>
#include <filesystem>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

std::mutex queue_mutex;
std::mutex output_mutex;  // keeps each printed line intact
std::condition_variable cv;
std::queue<fs::directory_entry> file_queue;

void process_file() {
  while (true) {
    std::unique_lock<std::mutex> lock(queue_mutex);
    // Sleep until the producer has pushed at least one entry
    cv.wait(lock, [] { return !file_queue.empty(); });
    auto entry = file_queue.front();
    file_queue.pop();
    lock.unlock();

    // Empty path signals end
    if (entry.path().empty()) break;

    std::lock_guard<std::mutex> output_lock(output_mutex);
    std::cout << entry.path().string() << '\n';
  }
}

int main() {
  fs::directory_iterator start{R"(c:\test)"};
  fs::directory_iterator end{};

  std::vector<std::thread> threads;
  int num_threads = 4;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(process_file);
  }

  // Producer: push each regular file and wake one worker
  for (auto iter{start}; iter != end; ++iter) {
    if (iter->is_regular_file()) {
      std::unique_lock<std::mutex> lock(queue_mutex);
      file_queue.push(*iter);
      lock.unlock();
      cv.notify_one();
    }
  }

  // Push one sentinel per worker so that every thread exits
  for (int i = 0; i < num_threads; ++i) {
    std::unique_lock<std::mutex> lock(queue_mutex);
    // Signal end
    file_queue.push(fs::directory_entry{});
    lock.unlock();
    cv.notify_one();
  }

  for (auto& t : threads) {
    t.join();
  }
}
Possible output (the order can vary between runs):
c:\test\file1.txt
c:\test\file2.txt
c:\test\file3.txt
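Combining std::filesystem::directory_iterator with multithreading can improve performance when the work done per file outweighs the cost of listing the directory, because that work runs concurrently while a single thread performs the iteration. Thread pools and concurrent queues give the entries a structured path to the workers. Any state the threads share, such as the queue or std::cout, still needs synchronization.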