How to match string in quotes using Regex

Suppose I have the following text in a text file:

First Text
"Some Text"
"124arandom txt that should not be parsed!@
"124 Some Text"
"어떤 글"
this text a"s well should not be parsed

I would like to retrieve Some Text, 124 Some Text and 어떤 글 as matched strings. The text is read line by line. The catch is that it has to match text in foreign languages as well, as long as it is inside quotes.

Update: I found out something weird. I was trying some random stuff and found out that:

string s = "어떤 글";
Regex regex = new Regex("[^\"]*");
MatchCollection matches = regex.Matches(s);

matches has Count = 10 and contains some empty items (the parsed text is at index 2). This might have been why I kept getting an empty string when I was just using Regex.Replace. Why is this happening?
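A likely explanation for the empty items is that `*` also succeeds with zero characters, so `[^"]*` produces a zero-length match wherever the next character is a quote, and once more at the end of the input. The sketch below illustrates this; it assumes the input includes its surrounding quotes, and the class and method names are made up for this example. The exact match count depends on the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Returns the text of every match, including zero-length ones.
    public static string[] AllMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Assumption: the line still contains its surrounding quotes.
        string s = "\"어떤 글\"";

        // '*' allows zero characters, so empty matches show up next to each quote.
        string[] star = AllMatches("[^\"]*", s);
        int emptyCount = star.Count(v => v.Length == 0);
        Console.WriteLine("star: " + star.Length + " matches, " + emptyCount + " of them empty");

        // '+' requires at least one character, so only the real text is matched.
        string[] plus = AllMatches("[^\"]+", s);
        Console.WriteLine("plus: " + plus.Length + " match(es)");
    }
}
```

Switching the quantifier from `*` to `+` removes the empty matches, but it does not check that the text actually sits between two quotes.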

If you read the text line by line, then the regex

"[^"]*"

will find all quoted strings, unless those may contain escaped quotes like "a 2\" by 4\" board".

To match those correctly, you need

"(?:\\.|[^"\\])*"

If you don't want the quotes to become part of the match, use lookaround assertions:

(?<=")[^"]*(?=")
(?<=")(?:\\.|[^"\\])*(?=")
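One caveat with the lookaround form: because the quotes are only asserted and not consumed, a line with two quoted strings such as `"a" b "c"` also yields the text between them (` b `). A capture group is a common alternative. The sketch below compares both on the sample lines from the question; the capture-group pattern and all class and method names are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedTextExtractor
{
    // Lookaround form: quotes are asserted but stay outside the match.
    private static readonly Regex Lookaround = new Regex(@"(?<="")[^""]*(?="")");

    // Alternative form: quotes are consumed, group 1 holds the text inside.
    private static readonly Regex Captured = new Regex(@"""([^""]*)""");

    public static List<string> WithLookaround(string line)
    {
        var results = new List<string>();
        foreach (Match m in Lookaround.Matches(line))
            results.Add(m.Value);
        return results;
    }

    public static List<string> WithCaptureGroup(string line)
    {
        var results = new List<string>();
        foreach (Match m in Captured.Matches(line))
            results.Add(m.Groups[1].Value);
        return results;
    }

    public static void Main()
    {
        string[] lines =
        {
            "First Text",
            "\"Some Text\"",
            "\"124arandom txt that should not be parsed!@",
            "\"124 Some Text\"",
            "\"어떤 글\"",
            "this text a\"s well should not be parsed",
            "\"a\" b \"c\"", // extra line added here to show the caveat
        };

        foreach (string line in lines)
        {
            Console.WriteLine(line);
            Console.WriteLine("  lookaround:    [" + string.Join("|", WithLookaround(line)) + "]");
            Console.WriteLine("  capture group: [" + string.Join("|", WithCaptureGroup(line)) + "]");
        }
    }
}
```

For lines that hold at most one quoted string, as in the question, both forms return the same result. Non-ASCII text such as the Korean sample needs no special handling, since `[^"]` matches any character other than a quote.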

In C#, these regexes can be written as verbatim strings, in which each literal quote is doubled:

Regex regex1 = new Regex(@"(?<="")[^""]*(?="")");
Regex regex2 = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");
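To see why the escape-aware pattern matters, the following sketch runs the simple and the escape-aware quoted-string patterns against the "a 2\" by 4\" board" example. The class name and helper method are hypothetical, and the patterns here include the quotes in the match rather than using lookarounds.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedQuoteDemo
{
    // Simple pattern: stops at the first quote it meets, escaped or not.
    public static readonly Regex Simple = new Regex(@"""[^""]*""");

    // Escape-aware pattern: a backslash plus the following character is consumed as one unit.
    public static readonly Regex EscapeAware = new Regex(@"""(?:\\.|[^""\\])*""");

    // Returns the first match, or an empty string if there is none.
    public static string FirstMatch(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Runtime value of this verbatim string: "a 2\" by 4\" board"
        string line = @"""a 2\"" by 4\"" board""";

        Console.WriteLine(FirstMatch(Simple, line));      // prints "a 2\"
        Console.WriteLine(FirstMatch(EscapeAware, line)); // prints "a 2\" by 4\" board"
    }
}
```

The simple pattern cuts the string off at the first escaped quote, while the escape-aware one returns the whole quoted string.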

You can use a regular expression and then try to match it against any text you want. This can be done in a loop or however you need.

string str = "\"your text\"";
// check for at least one char inside the quotes
Regex r = new Regex("\".+\"");
bool ismatch = r.IsMatch(str); 
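Note that `.+` is greedy. That makes no difference for IsMatch, but if the same pattern is used to extract text, it runs from the first quote on the line to the last one. A small sketch of the difference, with a made-up helper and input line:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Returns the first match of the pattern, or an empty string if there is none.
    public static string First(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string line = "\"a\" b \"c\"";

        Console.WriteLine(First("\".+\"", line));     // greedy: prints "a" b "c"
        Console.WriteLine(First("\".+?\"", line));    // lazy: prints "a"
        Console.WriteLine(First("\"[^\"]+\"", line)); // negated class: prints "a"
    }
}
```

The negated character class is usually the more predictable choice, since it can never run past a closing quote.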
