Regex: is there a one-liner?

Question

I want to search inside multiple big text files (200 MB each) as fast as possible. I am using the command-line tool ripgrep, and I want to call it only once.

In the following string:

***foo***bar***baz***foo***bar***baz

(Each *** stands for an arbitrary run of characters whose type and length vary.)

I want to match baz, but only if it follows the first occurrence of foo***bar***.

So in ***foo***bar***baz***foo***bar***baz it should match the first baz, and in ***foo***bar***qux***foo***bar***baz it should match nothing.

I tried several solutions, but none of them worked. Can this be done with a single regular expression?

Answer 1

I'm pretty sure that a regex is overkill in this case. A simple series of str::find calls can do the job:

fn find_baz(input: &str) -> Option<usize> {
    const FOO: &str = "foo";
    const BAR: &str = "bar";

    // 1: find the first "foo", then the first "bar" after it, then the first "baz" after that:
    let foo = input.find(FOO)?;
    let bar = input[foo..].find(BAR).map(|i| i + foo)?;
    let baz = input[bar..].find("baz").map(|i| i + bar)?;

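    // 2: verify that no other "foo" followed by a "bar" lies between that "bar" and "baz":
    input[bar..baz]
        .find(FOO)
        .map(|i| i + bar)
        .and_then(|foo| input[foo..baz].find(BAR))
        // Some(_) here means an intervening foo...bar, which cancels the match;
        // None means the path is clear, so xor yields Some(baz).
        .xor(Some(baz))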
}

#[test]
fn found_it() {
    assert_eq!(Some(15), find_baz("***foo***bar***baz***foo***bar***baz"));
}

#[test]
fn found_it_2() {
    assert_eq!(Some(27), find_baz("***foo***bar***qux***foo***baz"));
}

#[test]
fn not_found() {
    assert_eq!(None, find_baz("***foo***bar***qux***foo***bar***baz"));
}

#[test]
fn not_found_2() {
    assert_eq!(None, find_baz("***foo***bar***qux***foo***"));
}
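Since str::find only works on valid UTF-8, and 200 MB text files may contain arbitrary bytes, the same matching rule can also be sketched over raw bytes and applied to several files in one process. This is a minimal std-only sketch, not part of the original answer: find_bytes, find_baz_bytes and search_files are hypothetical names, it assumes each file fits in memory, and it uses a naive windowed search instead of the Two-Way algorithm behind str::find.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: index of the first occurrence of `needle` in `haystack`.
/// Naive windowed comparison; cheap enough for 3-byte needles like "foo".
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic, so handle this case up front
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Hypothetical byte-slice port of find_baz: same rule, but accepts non-UTF-8 input.
fn find_baz_bytes(input: &[u8]) -> Option<usize> {
    // 1: first "foo", then the first "bar" after it, then the first "baz" after that.
    let foo = find_bytes(input, b"foo")?;
    let bar = find_bytes(&input[foo..], b"bar")? + foo;
    let baz = find_bytes(&input[bar..], b"baz")? + bar;

    // 2: reject the match if another "foo" followed by "bar" sits in between.
    let between = &input[bar..baz];
    let interrupted = match find_bytes(between, b"foo") {
        Some(i) => find_bytes(&between[i..], b"bar").is_some(),
        None => false,
    };
    if interrupted { None } else { Some(baz) }
}

/// Hypothetical driver: searches every file in `paths` within a single process,
/// returning each path with the byte offset of the matching "baz" (or None).
/// Assumption: each file is read fully into memory (about 200 MB apiece).
fn search_files<P: AsRef<Path>>(paths: &[P]) -> io::Result<Vec<(PathBuf, Option<usize>)>> {
    let mut results = Vec::with_capacity(paths.len());
    for path in paths {
        let data = fs::read(path)?; // stops at the first unreadable file
        results.push((path.as_ref().to_path_buf(), find_baz_bytes(&data)));
    }
    Ok(results)
}
```

The control flow mirrors find_baz, with an explicit if in place of xor. If the naive search turns out to be the bottleneck, find_bytes could be swapped for a faster substring search such as the memchr crate's memmem::find, which returns the same Option<usize>.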


Answer 1
2 2019-12-04 14:54:59
