
Non-greedy regex matches just 1 character

I have a list of files, some of whose names are suffixed with .cloud. How do I write a regular expression that gets the filename without the .cloud part?

Here's a sample Perl script I tried.

#! /usr/bin/perl -w

my @log_files = ('infolog.txt', 'errorlog.txt.cloud', 'dailyerrorlog.txt.cloud', 'trace.output.cloud', 'debug.log.cloud');

foreach my $file (@log_files)
{
    print $1."\n" if($file =~ /(.+?)(?:\.cloud)?/);
}

This prints the following:

$ perl test.pl 
i
e
d
t
d

If I get rid of the '?' that makes the .+ non-greedy, it matches everything, including .cloud.

#! /usr/bin/perl -w

my @log_files = ('infolog.txt', 'errorlog.txt.cloud', 'dailyerrorlog.txt.cloud', 'trace.output.cloud', 'debug.log.cloud');

foreach my $file (@log_files)
{
    print $1."\n" if($file =~ /(.+)(?:\.cloud)?/);
}

This prints the following:

$ perl test.pl 
infolog.txt
errorlog.txt.cloud
dailyerrorlog.txt.cloud
trace.output.cloud
debug.log.cloud

What I really want is a regular expression that'll print:

$ perl test.pl 
infolog.txt
errorlog.txt
dailyerrorlog.txt
trace.output
debug.log

I've reduced my real use case to a very simple example here. I need to use regular expressions here to match the filename, so answers like

$file =~ s/\.cloud$//;
print $file."\n";

will not work for me.

I've tried a similar thing in C# too, with similar results.

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // Unanchored pattern: the lazy group can stop after one character.
            Regex regex = new Regex(@"(?<filename>.+?)(?:\.cloud)?");
            string text = "abcdef.txt.cloud";
            Match match = regex.Match(text);
            if (match.Success)
            {
                Console.WriteLine("Found filename: {0}", match.Groups["filename"].Value);
            }
        }
    }
    // Output
    // Found filename: a

Thanks for any help.

It's often easier to read and maintain regular expressions if you specify that the entire string must match. That's easy enough to do with ^ and $, which match the start and end of the string.

Matching anywhere in the string: (.+?)(?:\.cloud)?

Matching the entire string: ^(.+?)(?:\.cloud)?$

In the second case, the non-greedy group will still capture as little as possible, but because $ forces the match to reach the end of the string, it has to capture every character up to the optional .cloud suffix to satisfy the match condition.

This doesn't cover every possible use case, but it tends to result in a regex that's easier to read six months from now.
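
As a rough sketch of the difference between the two forms, the script below runs both patterns over a few of the file names from the question. The helper name capture is made up for this illustration and is not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: return what group 1 captures, or undef on no match.
sub capture {
    my ($re, $name) = @_;
    return $name =~ $re ? $1 : undef;
}

my $anywhere = qr/(.+?)(?:\.cloud)?/;     # free to stop as early as possible
my $whole    = qr/^(.+?)(?:\.cloud)?$/;   # must account for the whole string

for my $name (qw(infolog.txt errorlog.txt.cloud trace.output.cloud)) {
    printf "%-20s anywhere=%-3s whole=%s\n",
        $name, capture($anywhere, $name), capture($whole, $name);
}
```

With the unanchored pattern the capture is a single character for every name; with the anchored one it is the name without the .cloud suffix.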

It only matches one character because you told it to match the least possible number of characters, and .+ isn't allowed to match zero characters.
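
To see that the single character comes from the quantifier's minimum rather than from anything specific to .cloud, here is a small sketch that swaps in other lazy quantifiers. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $name = 'errorlog.txt.cloud';

# Nothing after the lazy group forces the match to continue, so each
# quantifier stops at its own minimum repetition count.
my ($plus)  = $name =~ /(.+?)(?:\.cloud)?/;      # minimum of + is one character
my ($star)  = $name =~ /(.*?)(?:\.cloud)?/;      # minimum of * is zero characters
my ($three) = $name =~ /(.{3,}?)(?:\.cloud)?/;   # minimum of {3,} is three characters

print "plus='$plus' star='$star' three='$three'\n";
# prints: plus='e' star='' three='err'
```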


I'm going to use $PAT instead of .+ since you said it's a stand-in for something more complicated.

Despite your claim that s/// can't be used, it still seems to me to be the simplest solution.

my ($match) = map { s/\.cloud\z//r } $file =~ /^($PAT)\z/;  # 5.14+

or

my ($match) = map { ( my $s = $_ ) =~ s/\.cloud\z//; $s } $file =~ /^($PAT)\z/;

That said, it can also be achieved using a match:

my $match = $file =~ /^(?:($PAT)(?=\.cloud\z)|($PAT))/ ? ($1 // $2) : undef;

By the way, if $PAT was .+ and I wanted to use a match, I'd use the following:

my ($match) = $file =~ /^((?:(?!\.cloud\z).)+)/s;

But it would be far simpler to use

my $match = $file =~ s/\.cloud\z//r;   # 5.14+

or

(my $match = $file) =~ s/\.cloud\z//;
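
For readers unfamiliar with the match-only pattern for the .+ case, this sketch spells it out with the /x flag and compares it against the substitution on a few names. The extra names my.cloud.txt and .cloud are invented test inputs, not from the question.

```perl
use strict;
use warnings;

# The match-only pattern for the .+ case, spelled out with /x.
my $re = qr/
    ^
    (                          # capture group 1
        (?:
            (?! \.cloud \z )   # stop if the rest of the string is exactly ".cloud"
            .                  # otherwise consume one character
        )+
    )
/xs;

for my $name (qw(infolog.txt debug.log.cloud my.cloud.txt .cloud)) {
    my ($by_match) = $name =~ $re;
    (my $by_subst = $name) =~ s/\.cloud\z//;
    printf "%-16s match=%-12s subst=%s\n",
        $name, $by_match // '(no match)', $by_subst;
}
```

The two approaches agree except when the name is exactly .cloud: the match fails because + needs at least one character, while the substitution returns an empty string.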

The reason your pattern is matching only a single character is that the sub-pattern (?:\.cloud)? is optional, so it can be satisfied by nothing at all. That leaves (.+?) free to match the shortest string allowable by the + quantifier, which is one character.

It's easy to fix this by just anchoring the end of the pattern so that the match has to run to the end of the string.

This code works fine:

use strict;
use warnings 'all';

my @log_files = qw/
    infolog.txt
    errorlog.txt.cloud
    dailyerrorlog.txt.cloud
    trace.output.cloud
    debug.log.cloud
/;

for ( @log_files ) {
    print "$1\n" if /(.+?)(?:\.cloud)?$/;
}

Output:

infolog.txt
errorlog.txt
dailyerrorlog.txt
trace.output
debug.log

Assign the file name to a temporary variable and modify it.

my @log_files = qw(
    infolog.txt
    errorlog.txt.cloud
    dailyerrorlog.txt.cloud
    trace.output.cloud
    debug.log.cloud
);

foreach my $file (@log_files)
{
    my $tmpry = $file;
    $tmpry =~ s/\.cloud$//;
    printf "%-25s %s\n", $file, $tmpry;
}
