Using Regex to extract part of a string from an HTML/text file
I have a C# regular expression to match author names in a text document, where they are written as:
"author":"AUTHOR'S NAME"
The regex is as follows:
new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")
This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the word author in front of it. I just want the name.
Could anyone help me get the expected value, please?
Use regex groups to get a part of the string.
( ) acts as a capture group and can be accessed through the .Groups property of the match.
.Groups[0] holds the whole match.
.Groups[1] holds the first group (and so on).
string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
string authorName = match.Groups[1].Value; // .Value gives the captured text; Groups[1] itself is a Group, not a string
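The same idea extends to a whole document with several "author" entries. The following is only a sketch: the class name AuthorExtractor, the method ExtractAuthors, and the group name "name" are made up for illustration. It reuses the character class from the question, so names containing apostrophes or other punctuation would not be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuthorExtractor
{
    // Same pattern as in the answer, with the capture group given a name.
    private static readonly Regex AuthorPattern =
        new Regex("\"author\":\"(?<name>[A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"");

    // Returns the captured name from every "author":"..." entry in the text.
    public static List<string> ExtractAuthors(string text)
    {
        var names = new List<string>();
        foreach (Match m in AuthorPattern.Matches(text))
        {
            // Groups["name"] is the named capture; Groups[0] would be the whole match.
            names.Add(m.Groups["name"].Value);
        }
        return names;
    }
}
```

Calling AuthorExtractor.ExtractAuthors(File.ReadAllText(path)) would then give one entry per author found, assuming the file fits comfortably in memory.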
You can also use a look-around approach so that the match value itself is only the name:
var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;
My regex runs at about 555,020 iterations per second with this input string, which should suffice.
result will be AUTHOR'S NAME.
(?<="author":") checks that "author":" comes immediately before the match, [^"]+ matches everything up to the next quote, which looks safe since you only want to match alphanumerics and spaces between the quotes, and (?=") checks for the trailing quote.
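Because [^"]+ accepts any character except a quote, this pattern also handles a name with an apostrophe, which the [A-Za-z0-9] class from the question would not. A small sketch of wrapping it in a helper follows; LookaroundDemo and TryGetAuthor are hypothetical names, and returning an empty string on failure is just one possible convention.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Look-behind for "author":" and look-ahead for the closing quote,
    // so the match value is only the text between the quotes.
    private static readonly Regex AuthorValue =
        new Regex(@"(?<=""author"":"")[^""]+(?="")");

    // Returns true and sets author when an entry is found; otherwise false.
    public static bool TryGetAuthor(string text, out string author)
    {
        Match m = AuthorValue.Match(text);
        author = m.Success ? m.Value : string.Empty;
        return m.Success;
    }
}
```

Checking the boolean result (or Match.Success directly) avoids silently treating a missing "author" entry as an empty name.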