
Is there a way to update a string in place in Rust?

You can also read this as: is it possible to URLify a string in place in Rust?

For example:


Problem statement: Replace whitespace with %20
Assumption: String will have enough capacity left to accommodate new characters.

Input: Hello how are you

Output: Hello%20how%20are%20you

I know there are ways to do this if it doesn't have to be done "in place". I am solving a problem that explicitly states that the string must be updated in place.
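For comparison, the non-in-place route can be sketched with the standard `str::replace`, which builds and returns a new `String` rather than reusing the original buffer. The function name `urlify_copy` is hypothetical and only serves this illustration.

```rust
/// Hypothetical helper: returns a new String with every space replaced by "%20".
/// This allocates a fresh buffer, so it is NOT the in-place update asked about.
fn urlify_copy(sentence: &str) -> String {
    sentence.replace(' ', "%20")
}

fn main() {
    let sentence = String::from("Hello how are you");
    let urlified = urlify_copy(&sentence);
    // The original string is left untouched.
    println!("{} -> {}", sentence, urlified);
}
```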

If there isn't any safe way to do this, is there a particular reason for that?

[Edit]

I was able to solve this using an unsafe approach, but I would appreciate a better, more idiomatic approach if there is one.

fn space_20(sentence: &mut String) {
  if !sentence.is_ascii() {
    panic!("Invalid string");
  }

  let chars: Vec<usize> = sentence.char_indices().filter(|(_, ch)| ch.is_whitespace()).map(|(idx, _)| idx ).collect();
  let char_count = chars.len();
  if char_count == 0 {
    return;
  }

  let sentence_len = sentence.len();
  sentence.push_str(&"*".repeat(char_count*2)); // filling string with * so that bytes array becomes of required size.

  unsafe {
    let bytes = sentence.as_bytes_mut();

    let mut final_idx = sentence_len + (char_count * 2) - 1;

    let mut i = sentence_len - 1;
    let mut char_ptr = char_count - 1;
    loop {
      if i != chars[char_ptr] {
        bytes[final_idx] = bytes[i];
        if final_idx == 0 {
          // all elements are filled.
          println!("all elements are filled.");
          break;
        }
        final_idx -= 1;

      } else {
        bytes[final_idx] = '0' as u8;
        bytes[final_idx - 1] = '2' as u8;
        bytes[final_idx - 2] = '%' as u8;

        // final_idx is of type usize cannot be less than 0.
        if final_idx < 3 {
          println!("all elements are filled at start.");
          break;
        }

        final_idx -= 3;

        // char_ptr is of type usize cannot be less than 0.
        if char_ptr > 0 {
          char_ptr -= 1;
        }
      }

      if i == 0 {
        // all elements are parsed.
        println!("all elements are parsed.");
        break;
      }

      i -= 1;
    }

  }

}


fn main() {
  let mut sentence = String::with_capacity(1000);
  sentence.push_str("  hello, how are you?");
  // sentence.push_str("hello, how  are you?");
  // sentence.push_str(" hello, how are you? ");
  // sentence.push_str("  ");
  // sentence.push_str("abcd");

  space_20(&mut sentence);
  println!("{}", sentence);
}

An O(n) solution that neither uses unsafe nor allocates (provided that the string has enough capacity), using std::mem::take:

fn urlify_spaces(text: &mut String) {
    const SPACE_REPLACEMENT: &[u8] = b"%20";

    // operating on bytes for simplicity
    let mut buffer = std::mem::take(text).into_bytes();
    let old_len = buffer.len();

    let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
    let new_len = buffer.len() + (SPACE_REPLACEMENT.len() - 1) * space_count;
    buffer.resize(new_len, b'\0');

    let mut write_pos = new_len;

    for read_pos in (0..old_len).rev() {
        let byte = buffer[read_pos];

        if byte == b' ' {
            write_pos -= SPACE_REPLACEMENT.len();
            buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
                .copy_from_slice(SPACE_REPLACEMENT);
        } else {
            write_pos -= 1;
            buffer[write_pos] = byte;
        }
    }

    *text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}


Basically, it calculates the final length of the string, sets up a read position and a write position, and rewrites the string from right to left. Since "%20" is longer than " ", the write position never falls behind the read position, so no byte is overwritten before it has been read.
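The "does not allocate" claim can be checked with a small sketch: if the string already has enough spare capacity, the buffer pointer and capacity should be the same before and after the call. The function below is a condensed copy of the answer's `urlify_spaces`; the capacity of 64 in `main` is an arbitrary assumption chosen to be large enough for this input.

```rust
fn urlify_spaces(text: &mut String) {
    const SPACE_REPLACEMENT: &[u8] = b"%20";

    // Take the buffer out of the String; String::new() left behind does not allocate.
    let mut buffer = std::mem::take(text).into_bytes();
    let old_len = buffer.len();
    let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
    let new_len = old_len + (SPACE_REPLACEMENT.len() - 1) * space_count;
    // Stays within the existing allocation when the capacity is sufficient.
    buffer.resize(new_len, b'\0');

    let mut write_pos = new_len;
    for read_pos in (0..old_len).rev() {
        let byte = buffer[read_pos];
        if byte == b' ' {
            write_pos -= SPACE_REPLACEMENT.len();
            buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
                .copy_from_slice(SPACE_REPLACEMENT);
        } else {
            write_pos -= 1;
            buffer[write_pos] = byte;
        }
    }

    *text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}

fn main() {
    // Assumption: 64 bytes of capacity is more than the 23 bytes the result needs.
    let mut text = String::with_capacity(64);
    text.push_str("Hello how are you");

    let ptr_before = text.as_ptr();
    let capacity_before = text.capacity();

    urlify_spaces(&mut text);

    assert_eq!(text, "Hello%20how%20are%20you");
    // Same buffer, same capacity: nothing was reallocated.
    assert_eq!(text.as_ptr(), ptr_before);
    assert_eq!(text.capacity(), capacity_before);
    println!("{}", text);
}
```

If the string does not have enough spare capacity, `resize` grows the buffer and the pointer check would no longer be expected to hold.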

Is it possible to do this without unsafe?

Yes, like this:

fn main() {
    let mut my_string = String::from("Hello how are you");

    // Byte offsets of each space, as they will be once earlier spaces are replaced.
    let mut insert_positions = Vec::new();

    // Track bytes, not chars: String::remove and String::insert take byte indices.
    let mut byte_pos = 0;
    for c in my_string.chars() {
        if c == ' ' {
            insert_positions.push(byte_pos);
            byte_pos += 2; // Because we will insert two extra bytes here later.
        }
        byte_pos += c.len_utf8();
    }

    for p in insert_positions.iter() {
        my_string.remove(*p);
        my_string.insert(*p, '0');
        my_string.insert(*p, '2');
        my_string.insert(*p, '%');
    }
    println!("{}", my_string);
}

But should you do it?

As discussed, for example, on Reddit, this is almost never the recommended way to do it, because both remove and insert are O(n) operations, as noted in the documentation. Replacing every space this way is therefore O(n²) in the worst case.
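To make the cost concrete, here is a rough model (an assumption for illustration, not a measurement) that counts how many bytes would be shifted if every space were replaced by one `remove` followed by one `insert_str`, working left to right. The function name `bytes_shifted_by_remove_insert` is hypothetical.

```rust
/// Rough model: total number of bytes moved when each space is replaced by
/// `remove(pos)` followed by `insert_str(pos, "%20")`, left to right.
fn bytes_shifted_by_remove_insert(text: &str) -> usize {
    let mut len = text.len(); // current length of the string being edited
    let mut growth = 0; // bytes added by earlier replacements
    let mut shifted = 0;

    for (idx, byte) in text.bytes().enumerate() {
        if byte == b' ' {
            let pos = idx + growth; // position of this space in the edited string
            shifted += len - pos - 1; // `remove` moves the tail left by one byte
            len -= 1;
            shifted += len - pos; // `insert_str` moves the same tail to the right
            len += 3;
            growth += 2;
        }
    }
    shifted
}

fn main() {
    // Worst case for this model: a string made only of spaces.
    for n in [1_000, 2_000, 4_000] {
        let all_spaces = " ".repeat(n);
        println!(
            "{} spaces -> {} bytes shifted",
            n,
            bytes_shifted_by_remove_insert(&all_spaces)
        );
    }
}
```

Under this model a string of n spaces shifts n(n-1) bytes, so doubling the input roughly quadruples the work, whereas the right-to-left pass touches each byte once.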

Edit

A slightly better version:

fn main() {
    let mut my_string = String::from("Hello how are you");

    // Byte offsets of each space, as they will be once earlier spaces are replaced.
    let mut insert_positions = Vec::new();

    // Track bytes, not chars: String::remove and String::insert_str take byte indices.
    let mut byte_pos = 0;
    for c in my_string.chars() {
        if c == ' ' {
            insert_positions.push(byte_pos);
            byte_pos += 2; // Because we will insert two extra bytes here later.
        }
        byte_pos += c.len_utf8();
    }

    for p in insert_positions.iter() {
        my_string.remove(*p);
        my_string.insert_str(*p, "%20");
    }
    println!("{}", my_string);
}
