Regex find all occurrences of a pattern in a string

I'm having trouble finding all occurrences of a pattern in a string.

Consider this string:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";

I want to get back the 2 occurrences (so that I can decode them later):

=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=

=?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=

With the following regex code, I get only 1 occurrence back: the full string.

var charSetOccurences = new Regex(@"=\?.*\?B\?.*\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
    charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
}

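Do you know what I'm missing?

When the regex engine sees the .* sequence, it first matches everything up to the end of the string and then gives characters back one at a time until the rest of the pattern fits (a greedy match). That is why your pattern swallows both encoded words in a single match. To avoid this, either use a non-greedy match or explicitly define the characters that may appear in the string: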

"=\?[a-zA-Z0-9?=-]*\?B\?[a-zA-Z0-9?=-]*\?="
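To make the difference concrete, here is a minimal sketch that counts the matches each variant finds on the string from the question. The class name GreedyDemo and the helper CountMatches are made up for this illustration; only Regex.Matches and RegexOptions.IgnoreCase come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Sample header value copied from the question.
    public const string Msg = "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";

    // Hypothetical helper: how many times does the pattern match the input?
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Greedy: .* runs to the end of the string, so both words end up in one match.
        Console.WriteLine(CountMatches(@"=\?.*\?B\?.*\?=", Msg));

        // Lazy: .*? stops at the first place where the rest of the pattern fits.
        Console.WriteLine(CountMatches(@"=\?.*?\?B\?.*?\?=", Msg));

        // Explicit class: the space between the two words is not in the class,
        // so a single match cannot span both of them.
        Console.WriteLine(CountMatches(@"=\?[a-zA-Z0-9?=-]*\?B\?[a-zA-Z0-9?=-]*\?=", Msg));
    }
}
```

Note that the explicit class only separates the two words here because they are divided by a space. If two encoded words ever sat directly next to each other, the class (which itself contains ? and =) could still run across both.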

A non-regex way, which works here because the two encoded words are separated by a single space:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
string[] charSetOccurences = msg.Split(new string[]{ " " }, StringSplitOptions.None);
foreach (string s in charSetOccurences)
{
    string charSet = s.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
    Console.WriteLine(charSet);
}

If you still want to use a regex, make .* lazy by appending a ? to it (.*?). Earlier answers already mentioned this, but it seems you still did not get your matches?

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?B\?.*?\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
    string charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
    Console.WriteLine(charSet);
}

The output is the same in both cases:

windows-1258UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
windows-1258IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=

Edit: following your update, here is an all-in-one solution to your problem:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?[BQ]\?.*?\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
    string[] encoding = match.Groups[0].Value.Split(new string[]{ "?" }, StringSplitOptions.None);
    string charSet = encoding[1];
    string encodeType = encoding[2];
    string encodedString = encoding[3];
    Console.WriteLine("Charset: " + charSet);
    Console.WriteLine("Encoding type: " + encodeType);
    Console.WriteLine("Encoded String: " + encodedString + "\n");
}

This returns:

Charset: windows-1258
Encoding type: B
Encoded String: UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz

Charset: windows-1258
Encoding type: B
Encoded String: IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=

Or, since we already have the regex, we can let capture groups pull out the three parts:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?(.*?)\?([BQ])\?(.*?)\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
    Console.WriteLine("Charset: " + match.Groups[1].Value);
    Console.WriteLine("Encoding type: " + match.Groups[2].Value);
    Console.WriteLine("Encoded String: " + match.Groups[3].Value + "\n");
}

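This returns the same output.

.* is greedy, so it matches from the first ? all the way to the last ?B?.

You need to use a non-greedy match:

=\?.*?\?B\?.*?\?=

Or exclude ? from the set of characters the wildcard may match: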

=\?[^?]*\?B\?[^?]*\?=
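As a rough sketch of how the negated class can be combined with capture groups, the snippet below extracts the charset, encoding type and encoded text from each match. The class EncodedWordDemo, the method Parse and the group names charset, enc and text are names chosen for this example, and accepting Q as well as B is carried over from the edit in the earlier answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EncodedWordDemo
{
    // [^?]* can never run past the next '?', so each part ends at its own delimiter.
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?]*)\?(?<enc>[BQ])\?(?<text>[^?]*)\?=",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns one { charset, encoding, text } triple per match.
    public static List<string[]> Parse(string input)
    {
        var parts = new List<string[]>();
        foreach (Match m in EncodedWord.Matches(input))
        {
            parts.Add(new[]
            {
                m.Groups["charset"].Value,
                m.Groups["enc"].Value,
                m.Groups["text"].Value
            });
        }
        return parts;
    }

    public static void Main()
    {
        string msg = "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
        foreach (string[] p in Parse(msg))
        {
            Console.WriteLine("Charset: " + p[0]);
            Console.WriteLine("Encoding type: " + p[1]);
            Console.WriteLine("Encoded String: " + p[2] + "\n");
        }
    }
}
```

Compared with the lazy version, the negated class does not depend on backtracking order to find the right boundaries, which tends to make the intent easier to read.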
