Reduce memory footprint by over 99% for M.E.D. — Performance Matters

Wei Huang · Published in Level Up Coding · Jun 30, 2023

Weekend build and learn

In our last weekend build and learn, I shared why and how I built M.E.D. — a Rust-powered Data Masking, Encryption, and Decryption CLI tool.

As of today, around 600 users have already downloaded and tried out M.E.D. If you are also interested in giving it a try, the download link is available below.

For how to use it, you can read this article:

The current version is 0.5.9, and its functionality, based on the original design and use case, is getting stable and mature. It should be close to a 1.0.0 release soon.

For this weekend build and learn, I want to share the story of a breaking change in version 0.3.1. The issue is as follows.

I will walk you through how I identified the issue, found out where the bottleneck was, and how I fixed it.

Problem statement

During performance testing, the app crashed when processing a file over 2.5 GB on my OneNet Book 4. And I realized this would be a bigger problem than I thought.

Let's find out

Let's open htop (an interactive system monitor, process viewer, and process manager) to see what the runtime looks like on a MacBook Pro with 16 GB of RAM.

When it is at the "read CSV files" step, you can see that the swap memory (Swp) has almost reached the 16 GB limit.

htop runtime screenshot — credit by author

And the runtime memory usage is around:

the top runtime screenshot — credit by author

However, at the "write" stage, the swap memory drops back to a lower level of consumption.

So what is "Swp"?

Swap (Swp) is a dedicated file- or partition-backed region on disk that the operating system uses as overflow for RAM. Creating swap space allows the OS to move less-used pages of scratch memory out to disk, freeing physical memory for running processes and shared libraries, which generally improves performance.

Basically, swap (Swp) serves two roles:

  1. To move less-used "pages" out of memory into storage so physical memory can be used more efficiently
  2. If memory is insufficient, it acts as "extra" memory.

If it's case #1, everything is fine.

In case #2, there are two possible scenarios:

  1. Disk use increases. If your disks aren't fast enough to keep up, the system may thrash, and you'd experience slowdowns as data is swapped in and out of memory. This creates a bottleneck and can leave the system unresponsive.
  2. You run out of memory entirely, resulting in weirdness and crashes.

In our case, our program did NOT use memory efficiently, especially during the read-CSV stage.

Now we know where. Next, let's find out why the memory gets fully swapped.

Original Design

Original UML — Application Design

Initially, I broke it down into four steps:

  1. new — initialize the new processor
  2. load — load the files into memory
  3. run — run the masking/encryption/decryption
  4. write — write back to the output location.

Based on the htop analysis, the issue is in step #2 (load).
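Under stated assumptions (the names and fields below are illustrative stand-ins, not the actual M.E.D. types), the original four-step design can be sketched like this:

```rust
// Hypothetical sketch of the original four-step design; names, fields,
// and the masking logic are illustrative, not the real M.E.D. code.
struct Processor {
    files: Vec<String>,   // file contents, fully loaded into memory
    output: Vec<String>,  // processed results, also held in memory
}

impl Processor {
    // 1. new: initialize the processor
    fn new() -> Self {
        Processor { files: Vec::new(), output: Vec::new() }
    }

    // 2. load: pull every file's content into memory up front
    fn load(&mut self, contents: Vec<String>) {
        self.files = contents;
    }

    // 3. run: apply masking (a toy stand-in for mask/encrypt/decrypt)
    fn run(&mut self) {
        self.output = self
            .files
            .iter()
            .map(|s| "*".repeat(s.chars().count()))
            .collect();
    }

    // 4. write: hand the processed output to its destination
    fn write(&self) -> &[String] {
        &self.output
    }
}

fn main() {
    let mut p = Processor::new();
    p.load(vec!["alice".to_string(), "bob".to_string()]);
    p.run();
    println!("{:?}", p.write());
}
```

Notice that both `load` and `run` keep the entire dataset resident; that design choice is exactly where the trouble starts.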

Dive into the code

async fn load(&mut self, num_workers: &u16, file_path: &str) -> Result<(), MedError> {
    ...
    // Walk the directory and queue one read task per file.
    for entry in WalkDir::new(file_path)
        .follow_links(true)
        .into_iter()
        .filter_map(|e| e.ok())
        .filter(|e| !e.path().is_dir())
    {
        ...
        new_worker.pool.execute(move || {
            read_csv(tx, entry.path().display().to_string()).unwrap();
        });
    }
    ...

    Ok(())
}

pub fn read_csv(tx: flume::Sender<CsvFile>, path: String) -> Result<(), MedError> {
    ...
    // Every record is buffered here until the whole file has been read.
    let mut data: Vec<StringRecord> = Vec::new();
    ...
    reader.records().for_each(|record| {
        match record {
            Ok(r) => {
                total_records += 1;
                data.push(r);
            }
            Err(err) => {
                ...
            }
        };
    });
    ...
    Ok(())
}

Translated into plain English, the code:

  1. Loops over the files in the directory and puts them into the worker queue for concurrent execution.
  2. For each file-read task, reads the file content and pushes every record into a Vec, to be consumed later by step #3 [run] (masking/encryption/decryption).

Clearly, the size of that Vec depends on the file size, so memory spills over into swap, leading to the application crashing or the system becoming unresponsive.
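As a minimal, standard-library-only illustration of the problem (the real code uses the csv crate and WalkDir, which are elided here), buffering every record before processing makes peak memory scale with the whole file rather than with a single record:

```rust
use std::io::{BufRead, Cursor};

// Simplified stand-in for read_csv: every record is pushed into a Vec,
// so peak memory grows with the entire file, not with one record.
fn read_all_records<R: BufRead>(reader: R) -> Vec<String> {
    let mut data: Vec<String> = Vec::new();
    for line in reader.lines() {
        if let Ok(record) = line {
            data.push(record); // buffered until the later "run" step
        }
    }
    data
}

fn main() {
    // A 2.5 GB input would put roughly 2.5 GB of records into this Vec.
    let csv = "id,name\n1,alice\n2,bob\n";
    let records = read_all_records(Cursor::new(csv));
    println!("buffered {} records", records.len());
}
```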

The solution

Once the root cause of the memory-over-swap issue was identified, it was easy to reshape the solution. The guiding principle:

Avoid holding the struct in memory and process it once we read it.

The benefit is avoiding large memory allocations and removing unnecessary processing steps, which also simplifies the code-level implementation.
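A minimal sketch of that principle, using only the standard library (the real implementation uses the csv crate, and `mask` here is a toy stand-in for the actual masking/encryption/decryption step): each record is read, transformed, and written immediately, so only one record is resident at a time.

```rust
use std::io::{BufRead, Write};

// Toy masking step: replace every character except the delimiter.
fn mask(record: &str) -> String {
    record.chars().map(|c| if c == ',' { ',' } else { '*' }).collect()
}

// Read -> process -> write, one record at a time, never buffering
// the whole file. Returns the number of records processed.
fn stream_process<R: BufRead, W: Write>(input: R, mut output: W) -> std::io::Result<u64> {
    let mut total_records = 0u64;
    for line in input.lines() {
        let record = line?;
        writeln!(output, "{}", mask(&record))?; // written immediately, then dropped
        total_records += 1;
    }
    Ok(total_records)
}

fn main() -> std::io::Result<()> {
    let input = std::io::Cursor::new("1,alice\n2,bob\n");
    let mut out = Vec::new();
    let n = stream_process(input, &mut out)?;
    println!("processed {} records", n);
    Ok(())
}
```

Peak memory is now bounded by the size of the largest single record, not the size of the file.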

updated UML — credit by author
async fn load(&mut self) -> Result<Metrics, MedError> {
    ...
    // Loop over the file paths
    for entry in WalkDir::new(&self.runtime_params.file_path)
        .follow_links(true)
        .into_iter()
        .filter_entry(is_not_hidden)
        .filter_map(|e| e.ok())
        .filter(|e| !e.path().is_dir())
    {
        ...
        match self.runtime_params.file_type {
            FileType::CSV => {
                new_worker.pool.execute(move || {
                    csv_processor(tx_metadata, &files_path, &output_dir, process_runtime)
                        .unwrap();
                });
            }
            FileType::JSON => {
                new_worker.pool.execute(move || {
                    json_processor(tx_metadata, &files_path, &output_dir, process_runtime)
                        .unwrap();
                });
            }
        }
    }
    ...
    rx_metadata.iter().for_each(|item| {
        // Update the metrics
    });
    Ok(self.metrics.clone())
}

pub fn csv_processor(
    tx_metadata: flume::Sender<Metadata>,
    files_path: &str,
    output_path: &str,
    process_runtime: ProcessRuntime,
) -> Result<(), MedError> {
    ...
    reader.into_records().for_each(|record| {
        match record {
            Ok(records) => {
                // 1. Read by record
                // 2. Masking, encryption, or decryption
                // 3. Write
            }
            Err(err) => {
                ...
            }
        };
    });
    ...
    tx_metadata
        .send(Metadata {
            total_records,
            failed_records,
            record_failed_reason,
        })
        .unwrap();
    Ok(())
}
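The metadata channel pattern above can be sketched with std::sync::mpsc in place of flume (flume's API is similar but not identical, and the `Metadata` struct here is a reduced stand-in): each worker sends its per-file summary, and the loader aggregates them once all senders are dropped.

```rust
use std::sync::mpsc;
use std::thread;

// Reduced, illustrative version of the article's Metadata struct.
struct Metadata {
    total_records: u64,
    failed_records: u64,
}

// Spawn one thread per worker summary, collect everything over a
// channel, and fold the results into overall totals.
fn aggregate(workers: Vec<Metadata>) -> (u64, u64) {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for meta in workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Each worker reports its per-file summary when done.
            tx.send(meta).unwrap();
        }));
    }
    drop(tx); // close the channel so the receiving iterator terminates

    let mut totals = (0u64, 0u64);
    for item in rx.iter() {
        totals.0 += item.total_records;
        totals.1 += item.failed_records;
    }
    for h in handles {
        h.join().unwrap();
    }
    totals
}

fn main() {
    let (total, failed) = aggregate(vec![
        Metadata { total_records: 10, failed_records: 1 },
        Metadata { total_records: 5, failed_records: 0 },
    ]);
    println!("total={}, failed={}", total, failed);
}
```

Dropping the original sender before iterating is the key detail: `rx.iter()` only ends once every `Sender` clone is gone.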

Output

Testing the new implementation, the memory swap (Swp) remains at a healthy level, under 5 GB.

And the runtime memory usage is only around 2,000 KB (3 MB at most).

Top runtime screenshot — credit by author

Compared with before, the runtime memory footprint dropped from ~20 GB to ~3 MB. That's a reduction of more than 99.9% (roughly a 6,800x factor), which makes it practical to run on much smaller infrastructure.
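As a quick sanity check on those numbers (treating ~20 GB and ~3 MB as the approximate before/after figures from the screenshots):

```rust
// Returns (reduction factor, percent reduction) for a before/after
// pair, both expressed in MB. Figures are approximate.
fn reduction(before_mb: f64, after_mb: f64) -> (f64, f64) {
    (before_mb / after_mb, (1.0 - after_mb / before_mb) * 100.0)
}

fn main() {
    // ~20 GB before (in MB), ~3 MB after.
    let (factor, percent) = reduction(20.0 * 1024.0, 3.0);
    println!("{:.0}x smaller, {:.3}% reduction", factor, percent);
}
```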

Just imagine the VM cost if this were running in a cloud environment. It demonstrates once again: Performance Matters!

This fix was released in version 0.5.2.

I have also done some benchmarking based on file size.

Model Name: MacBook Pro
Processor Name: 6-Core Intel Core i7
Processor Speed: 2.6 GHz
Total Number of Cores: 6
Memory: 16 GB
Performance test capture table — credit by author

Thank you for reading and for your support. The Weekend Build and Learn carries on.

Please follow me on Medium if you enjoy the Weekend Build and Learn series.
