Using Regex to extract part of a string from an HTML/text file

Question

I have a C# regular expression that matches author names in a text document, where they are written as:

"author":"AUTHOR'S NAME"

The regex is as follows:

new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")

This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the word author in front of it. I just want the name.

Could anyone help me get the expected value, please?

Answer 1

Use regex groups to get a part of the string. ( ) acts as a capture group, and its contents can be accessed through the .Groups property of the match.

.Groups[0] holds the whole match

.Groups[1] holds the first group (and so on)

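// requires: using System.Text.RegularExpressions;
string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
// Groups[1] is a Group object; .Value gives the captured text
string authorName = match.Groups[1].Value;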
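As a minimal sketch of the capture-group approach applied to a whole document, the helper below collects every author name with Regex.Matches. The class and method names (AuthorExtractor, ExtractAuthors) and the named group "name" are made up for illustration. It also assumes the name may contain anything except a double quote, so it uses [^"]* instead of the alphanumeric classes, because [A-Za-z0-9] does not match the apostrophe in AUTHOR'S NAME.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuthorExtractor
{
    // Assumption: a name is any run of characters other than a double quote.
    // (?<name>...) is a named capture group, read back via Groups["name"].
    private static readonly Regex AuthorPattern =
        new Regex("\"author\":\"(?<name>[^\"]*)\"");

    // Returns the captured name of every "author":"..." pair found in text.
    public static List<string> ExtractAuthors(string text)
    {
        var names = new List<string>();
        foreach (Match m in AuthorPattern.Matches(text))
        {
            // Every Match yielded by Matches is successful, so no Success check is needed here.
            names.Add(m.Groups["name"].Value);
        }
        return names;
    }
}
```

Calling AuthorExtractor.ExtractAuthors on a string that contains several "author":"..." pairs returns the names in document order, and an empty list when nothing matches. With a single Regex.Match call, check match.Success before reading the group, since a failed match yields an empty string for Groups[1].Value.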

Answer 2

You can also use lookarounds so that the match value itself is only the name:

var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;

With this input string, my regex runs at about 555,020 iterations per second, which should suffice.

result will be AUTHOR'S NAME.

(?<="author":") checks that "author":" appears immediately before the match, [^"]+ looks safe since you only want to match alphanumerics and spaces between the quotes, and (?=") checks for the trailing quote.
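The lookaround pattern can be applied to a longer text in the same way. The following is only a sketch, with a hypothetical class and method name (LookaroundExtractor, FindAuthorNames). Because the lookbehind and lookahead consume no characters, each Match.Value is already the bare name, and because [^"]+ needs at least one character, an empty "author":"" entry produces no match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundExtractor
{
    // In a verbatim string (@"..."), a double quote is written as "".
    private static readonly Regex NamePattern =
        new Regex(@"(?<=""author"":"")[^""]+(?="")");

    // Returns each non-empty author name; the surrounding quotes are only asserted, not captured.
    public static List<string> FindAuthorNames(string text)
    {
        var names = new List<string>();
        foreach (Match m in NamePattern.Matches(text))
        {
            names.Add(m.Value);
        }
        return names;
    }
}
```

Whether to prefer this over a capture group is mostly a matter of taste: the lookaround version needs no group lookup, while the capture-group version keeps the pattern easier to read.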
