How can I parse a specific string using IndexOf and Substring?

int firstTag = source.IndexOf("data-token=");
int lastTag = source.IndexOf("\"href", firstTag);
int startIndex = firstTag + 12;
int endIndex = lastTag + 5;
string authenticityToken = source.Substring(startIndex, endIndex - startIndex);

The string I want to parse comes from here:

<a class="bizLink" data-token="-iUzEhgdscgbpj5VMi5zoh54FTeFt8M4mj5nsiodxR5VzZOhniodpj6nFQg0nce3MhUxFSgdxjM4J

I want to get only the string between the two quotation marks, i.e. the data-token value.


But what my code returns is the long string I want, followed by all the rest of the file text.

The sane way would be to use an HTML parser and query library. I can suggest CsQuery, which is a jQuery-like library for .NET. You could use a selector like a[data-token] to match your anchor, then extract the attribute value.

This is the correct way of doing things.
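
The selector approach described above could be sketched roughly as follows. This assumes the CsQuery NuGet package is referenced by the project; the class name, method name, and shortened sample markup are hypothetical and only serve as illustration.

```csharp
using System;
using CsQuery; // third-party NuGet package, assumed to be installed

public static class CsQueryTokenExample
{
    // Returns the data-token attribute of the first matching anchor,
    // or null when the document contains no such anchor.
    public static string GetToken(string html)
    {
        CQ dom = CQ.Create(html);
        CQ anchors = dom["a[data-token]"];
        return anchors.Length > 0 ? anchors.Attr("data-token") : null;
    }

    public static void Main()
    {
        // Hypothetical, shortened markup standing in for the real page.
        string html = "<a class=\"bizLink\" data-token=\"abc123\" href=\"#\">link</a>";
        Console.WriteLine(GetToken(html));
    }
}
```

Because the parser handles quoting and whitespace itself, this version should not depend on which attribute happens to follow data-token in the markup.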

But if you only ever want to get this one attribute and never do anything with the HTML source again, it might be easier to just use a regex. Beware, though: parsing HTML with regex is evil.

So if all you want to do is extract this one piece of information, then as an exceptional measure you could use this:

var m = Regex.Match(source, @"data-token\s*=\s*""(?<token>.+?)""");
var authenticityToken = m.Groups["token"].Value;
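
As a minimal sketch of how the regex above might be wrapped, the following adds a check on Match.Success so that a missing attribute yields null instead of an empty string. The class name, method name, and sample markup are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTokenExample
{
    // Returns the data-token value, or null when the attribute is not found.
    public static string GetToken(string source)
    {
        Match m = Regex.Match(source, @"data-token\s*=\s*""(?<token>.+?)""");
        return m.Success ? m.Groups["token"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical, shortened markup standing in for the real page.
        string source = "<a class=\"bizLink\" data-token=\"abc123\" href=\"#\">link</a>";
        Console.WriteLine(GetToken(source));                    // abc123
        Console.WriteLine(GetToken("<a href=\"#\">") == null);  // True
    }
}
```

The lazy quantifier .+? stops at the first closing double quote, so the match does not run on into the rest of the document.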

But try CsQuery first. It's a much better approach.

Working example: http://ideone.com/U224iZ

string start = "data-token=";
string end = " href";

string source = "<a class='bizLink' data-token='-iUzEhgdscgbpj5VMi5zoh54FTeFt8M4mj5nsiodxR5VzZOhniodpj6nFQg0nce3MhUxFSgdxjM4JjUVzZuNu8o0sREnFSUzISUXzZWh4iodGQfdxR5VzZWh4iodGQfhli6fnce_=1\" href='";

int firstTag = source.IndexOf(start);
int lastTag = source.IndexOf(end, firstTag);
// Skip the attribute name and the opening quote.
int startIndex = firstTag + start.Length + 1;
int endIndex = lastTag;
// Subtract 1 to drop the closing quote that precedes " href".
string authenticityToken = source.Substring(startIndex, endIndex - startIndex - 1);
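
A variation on the example above, offered only as a sketch: instead of searching for " href", it looks for the closing quote of the attribute, so it does not rely on href being the next attribute. It assumes the value is delimited by double quotes, as in the markup from the question; the class name, method name, and sample markup are hypothetical.

```csharp
using System;

public static class IndexOfTokenExample
{
    // Returns the text between the quotes of data-token="...",
    // or null when the attribute or its closing quote is missing.
    public static string GetToken(string source)
    {
        const string marker = "data-token=\"";
        int markerIndex = source.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0) return null;

        // First character of the attribute value.
        int startIndex = markerIndex + marker.Length;
        // Position of the closing quote.
        int endIndex = source.IndexOf('"', startIndex);
        if (endIndex < 0) return null;

        return source.Substring(startIndex, endIndex - startIndex);
    }

    public static void Main()
    {
        // Hypothetical, shortened markup standing in for the real page.
        string source = "<a class=\"bizLink\" data-token=\"abc123\" href=\"#\">link</a>";
        Console.WriteLine(GetToken(source)); // abc123
    }
}
```

Checking both IndexOf results for -1 avoids passing a negative length to Substring when the markup differs from what is expected.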


