
Tokenize or Split String Into Text & Html Tag Items

I'm looking for the most efficient way to take a string and tokenize it into an array, with every HTML tag separated out as its own item.

Example Input (String): 
    "I can format my text so that <strong>This is bold</strong> and this is not."

Desired Output (String[] array): 
    "I can format my text so that",
    "<strong>",
    "This is bold",
    "</strong>",
    "and this is not."

Alternate Output, Just As Good (String[] array):
    "I",
    "can",
    "format",
    "my",
    "text",
    "so",
    "that",
    "<strong>",
    "This",
    "is",
    "bold",
    "</strong>",
    "and",
    "this",
    "is",
    "not."

I'm not sure of the best way to approach this. Any help would be appreciated.

You can use Regex.Split() with a pair of zero-width assertions to split at every position that is followed by < or preceded by >:

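using System.Text.RegularExpressions;
string input = "I can format my text so that <strong>This is bold</strong> and this is not.";
// Split before each '<' and after each '>'; text pieces keep their surrounding spaces.
string[] output = Regex.Split(input, "(?=<)|(?<=>)");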

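(?=pattern) is a lookahead assertion: it matches at a position only if pattern follows it.
(?<=pattern) is a lookbehind assertion: the same idea, but it examines the characters before the position.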
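
The raw split keeps the whitespace around each text piece, and it may yield an empty string when the input starts or ends with a tag. The following is a minimal sketch of how the result could be cleaned up to match both outputs in the question. The class and method names (HtmlTokenizer, Tokenize, TokenizeWords) are hypothetical, chosen only for illustration. It also assumes that < and > never appear inside text or attribute values, since the pattern does not actually parse HTML.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HtmlTokenizer
{
    // Zero-width split points: before every '<' and after every '>'.
    private static readonly Regex TagBoundary = new Regex("(?=<)|(?<=>)");

    // Returns tags and text runs, trimmed, with empty pieces dropped.
    public static string[] Tokenize(string input)
    {
        return TagBoundary.Split(input)
            .Select(piece => piece.Trim())
            .Where(piece => piece.Length > 0)
            .ToArray();
    }

    // Same as Tokenize, but each text run is further split into words.
    // A piece that starts with '<' is treated as a tag and kept whole.
    public static string[] TokenizeWords(string input)
    {
        return Tokenize(input)
            .SelectMany(piece => piece[0] == '<'
                ? new[] { piece }
                : piece.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();
    }
}
```

Called on the sample input, Tokenize should give the five items of the desired output and TokenizeWords the sixteen items of the alternate output. For real-world markup with comments, scripts, or attribute values containing angle brackets, a proper HTML parser is likely the safer choice.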

