
Rust string comparison is the same speed as Python. I want to parallelize the program

I am new to Rust. I want to write a function that can later be imported into Python as a module using the pyo3 crate.

Below is the Python implementation of the function I want to implement in Rust:

def pcompare(a, b):
    letters = []
    for i, letter in enumerate(a):
        if letter != b[i]:
            letters.append(f'{letter}{i + 1}{b[i]}')
    return letters

The first Rust implementation I wrote looks like this:

use pyo3::prelude::*;


#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {

    if a.len() != b.len() {
        panic!(
            "Reads are not the same length! 
            First string is length {} and second string is length {}.",
            a.len(), b.len());
    }

    let a_vec: Vec<char> = a.chars().collect();
    let b_vec: Vec<char> = b.chars().collect();

    let mut mismatched_chars = Vec::new();

    for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
        if i != j {
            index += 1;
            let mutation = format!("{i}{index}{j}");
            mismatched_chars.push(mutation);
        } 

    }
    Ok(mismatched_chars)
}


#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
    Ok(())
}

I built this in --release mode. The module could be imported into Python, but its performance was about the same as that of the Python implementation.

My first question is: why are the Python and Rust functions similar in speed?

Now I am working on a parallel implementation in Rust. When I just print the result, the function works:

use rayon::prelude::*;

fn main() {
    
    let a: Vec<char> = String::from("aaaa").chars().collect();
    let b: Vec<char> = String::from("aaab").chars().collect();
    let length = a.len();
    let index: Vec<_> = (1..=length).collect();
    
    let mut mismatched_chars: Vec<String> = Vec::new();
    
    (a, index, b).into_par_iter().for_each(|(x, i, y)| {
        if x != y {
            let mutation = format!("{}{}{}", x, i, y).to_string();
            println!("{mutation}");
            //mismatched_chars.push(mutation);
        }
    });
    
}

However, when I try to push the mutation variable to the mismatched_chars vector:

use rayon::prelude::*;

fn main() {
    
    let a: Vec<char> = String::from("aaaa").chars().collect();
    let b: Vec<char> = String::from("aaab").chars().collect();
    let length = a.len();
    let index: Vec<_> = (1..=length).collect();
    
    let mut mismatched_chars: Vec<String> = Vec::new();
    
    (a, index, b).into_par_iter().for_each(|(x, i, y)| {
        if x != y {
            let mutation = format!("{}{}{}", x, i, y).to_string();
            //println!("{mutation}");
            mismatched_chars.push(mutation);
        }
    });
    
}

I get the following error:

error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:16:13
   |
16 |             mismatched_chars.push(mutation);
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error

I have tried a lot of different things. When I store references in the vector instead:

use rayon::prelude::*;

fn main() {
    
    let a: Vec<char> = String::from("aaaa").chars().collect();
    let b: Vec<char> = String::from("aaab").chars().collect();
    let length = a.len();
    let index: Vec<_> = (1..=length).collect();
    
    let mut mismatched_chars: Vec<&str> = Vec::new();
    
    (a, index, b).into_par_iter().for_each(|(x, i, y)| {
        if x != y {
            let mutation = format!("{}{}{}", x, i, y).to_string();
            mismatched_chars.push(&mutation);
        }
    });
    
}

The error becomes:

error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:16:13
   |
16 |             mismatched_chars.push(&mutation);
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

error[E0597]: `mutation` does not live long enough
  --> src/main.rs:16:35
   |
10 |     let mut mismatched_chars: Vec<&str> = Vec::new();
   |         -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 |             mismatched_chars.push(&mutation);
   |             ----------------------^^^^^^^^^-
   |             |                     |
   |             |                     borrowed value does not live long enough
   |             argument requires that `mutation` is borrowed for `'1`
17 |         }
   |         - `mutation` dropped here while still borrowed

I suspect that the solution is quite simple, but I cannot see it myself.

You have the right idea, but instead of pushing into a shared vector from inside for_each, use an iterator chain: filter drops the items you do not want, and map converts the remaining items into different values. Rayon's parallel iterators also provide a collect method, similar to the one on regular iterators, which gathers the items into any type that implements FromParallelIterator (such as Vec<T>).

use rayon::prelude::*;

fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
    // Same check as the if statement, just a little shorter to write.
    // On failure, assert_eq! also prints the two lengths it compared.
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");

    // Zip the character iterators from a and b together
    a.chars().zip(b.chars())
        // Iterate with the index of each item
        .enumerate()
        // Rayon function which turns a regular iterator into a parallel one.
        // Note: par_bridge does not guarantee that the original order is kept.
        .par_bridge()
        // Filter out values where the characters are the same
        .filter(|(_, (a, b))| a != b)
        // Convert the remaining values into a mutation string
        .map(|(index, (a, b))| {
            format!("{}{}{}", a, index + 1, b)
        })
        // Collect the items into a Vec (or any other FromParallelIterator type).
        .collect()
}
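
For comparison, the same filter/map/collect chain also works without rayon. The following is only a minimal sketch: the name compare_strings_sequential is made up for illustration, and it compares character counts rather than byte lengths, which is an assumption about what "same length" should mean for non-ASCII input. Because it is sequential, the mismatches always come out in position order, which par_bridge does not guarantee.

```rust
/// Hypothetical sequential variant (name chosen for illustration only).
/// Returns one "<char in a><1-based position><char in b>" entry per mismatch.
fn compare_strings_sequential(a: &str, b: &str) -> Vec<String> {
    // Assumption: compare the number of characters, not the number of bytes.
    assert_eq!(
        a.chars().count(),
        b.chars().count(),
        "Reads are not the same length!"
    );

    a.chars()
        .zip(b.chars())
        // Attach a 0-based index to every pair of characters
        .enumerate()
        // Keep only the pairs that differ
        .filter(|(_, (x, y))| x != y)
        // Format each mismatch with a 1-based position
        .map(|(index, (x, y))| format!("{}{}{}", x, index + 1, y))
        .collect()
}
```

Timing this version against the parallel one on realistic input would show whether the parallel version is actually faster for your data.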


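You cannot mutate the variable mismatched_chars directly from multiple threads: the closure passed to for_each is an Fn closure, so it can only capture the vector by shared reference.

You can wrap the vector in an Arc<RwLock<...>> and take the write lock before each push.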

use rayon::prelude::*;
use std::sync::{Arc, RwLock};

fn main() {
    let a: Vec<char> = String::from("aaaa").chars().collect();
    let b: Vec<char> = String::from("aaab").chars().collect();
    let length = a.len();
    let index: Vec<_> = (1..=length).collect();

    let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));

    (a, index, b).into_par_iter().for_each(|(x, i, y)| {
        if x != y {
            let mutation = format!("{}{}{}", x, i, y);
            mismatched_chars
                .write()
                .expect("could not acquire write lock")
                .push(mutation);
        }
    });

    for mismatch in mismatched_chars
        .read()
        .expect("could not acquire read lock")
        .iter()
    {
        eprintln!("{}", mismatch);
    }
}
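
The same locking idea can be sketched with only the standard library. This is not the code from the answer above: it swaps rayon for std::thread::scope so that it has no dependencies, and the function name compare_with_mutex and the chunk_size parameter are hypothetical. Scoped threads may borrow local variables, so a plain Mutex is enough here and no Arc is needed. Threads push in no particular order, so each entry carries its position and the result is sorted at the end.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: compares `a` and `b` chunk by chunk, one scoped
/// thread per chunk, and collects the mismatches behind a Mutex.
fn compare_with_mutex(a: &str, b: &str, chunk_size: usize) -> Vec<String> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");
    assert!(chunk_size > 0, "chunk_size must be positive");

    // (1-based position, mutation) pairs; the position restores order later.
    let mismatches: Mutex<Vec<(usize, String)>> = Mutex::new(Vec::new());

    thread::scope(|s| {
        let chunks = a.chunks(chunk_size).zip(b.chunks(chunk_size));
        for (chunk_index, (chunk_a, chunk_b)) in chunks.enumerate() {
            // Share the Mutex by reference; each thread locks it to push.
            let mismatches = &mismatches;
            s.spawn(move || {
                let offset = chunk_index * chunk_size;
                for (i, (x, y)) in chunk_a.iter().zip(chunk_b.iter()).enumerate() {
                    if x != y {
                        let position = offset + i + 1;
                        mismatches
                            .lock()
                            .expect("mutex poisoned")
                            .push((position, format!("{}{}{}", x, position, y)));
                    }
                }
            });
        }
    });

    // All threads have finished here, so the Mutex can be unwrapped.
    let mut found = mismatches.into_inner().expect("mutex poisoned");
    found.sort_by_key(|(position, _)| *position);
    found.into_iter().map(|(_, mutation)| mutation).collect()
}
```

Every push takes the lock, so with many mismatches the threads spend time waiting on each other. The filter/map/collect approach avoids the shared vector entirely.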
