Extract text from HTML tags

Question

I have a page like this, with three values in li tags:

    <li>nafiz</li>
    <li>ACE</li>
    <li>Sanah</li>

This code only gives me the InnerText of the last one:

    public string names = "";
    public string names2 = "";
    public string names3 = "";

    // Use this for initialization
    void Start () {
        HtmlWeb hw = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = hw.Load(openUrl);

        foreach (HtmlNode nd in doc.DocumentNode.SelectNodes("//li"))
        {
            // Each pass overwrites names, so only the last <li> text survives the loop.
            names = nd.InnerText.ToString();
        }
    }

How can I store all three values in those strings?

Answer 1

You can use this function:

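    // Needs: using System.Collections.Generic;
    // Returns every non-blank run of text found between a '>' and the next '<'.
    string[] GetItems(string htmlText)
    {
        List<string> items = new List<string>();
        int i = 0;
        while (i < htmlText.Length)
        {
            // Check for "not found" before using the index, otherwise
            // IndexOf('<', -1) would throw ArgumentOutOfRangeException.
            int start = htmlText.IndexOf('>', i);
            if (start == -1)
                break;
            int end = htmlText.IndexOf('<', start);
            if (end == -1)
                break;

            string item = htmlText.Substring(start + 1, end - start - 1);
            if (item.Trim() != "")
                items.Add(item);

            // Continue from the '<' so the next search finds that tag's '>'.
            i = end;
        }
        return items.ToArray();
    }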

and use it like this:

    foreach (string item in GetItems(YourText))
    {
        MessageBox.Show(item);
    }
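GetItems collects the text between any '>' and the next '<', so on a full page it also picks up titles, link text, script contents and so on, not just the li values. If you only want the li contents and cannot use an HTML parser, here is a minimal sketch with System.Text.RegularExpressions. It assumes simple, non-nested li elements like the ones in the question (regular expressions are not a reliable general-purpose HTML parser), and the LiExtractor class and ExtractLiTexts method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class LiExtractor
{
    // Matches <li ...>content</li>: non-greedy, case-insensitive, across line breaks.
    // "\b" keeps it from matching tags such as <link>. Assumes <li> is not nested.
    private static readonly Regex LiPattern = new Regex(
        @"<li\b[^>]*>(.*?)</li>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the trimmed, HTML-decoded text of every <li>.
    public static List<string> ExtractLiTexts(string html)
    {
        var result = new List<string>();
        foreach (Match m in LiPattern.Matches(html))
        {
            // Strip any inner tags such as <b> or <a>, then decode entities like &amp;.
            string inner = Regex.Replace(m.Groups[1].Value, "<[^>]*>", "");
            string text = WebUtility.HtmlDecode(inner).Trim();
            if (text.Length > 0)
                result.Add(text);
        }
        return result;
    }
}
```

Called on the three li lines from the question, ExtractLiTexts should return a list containing nafiz, ACE and Sanah, in that order.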

Answer 2

It would be easier to store the three values in a string array or a list, for example:

    var names = new List<string>();
    .....
    .....
    foreach (HtmlNode nd in doc.DocumentNode.SelectNodes("//li"))
    {
        names.Add(nd.InnerText.Trim());
    }

InnerText is already of type string, so there is no need for the additional ToString(). Trim() in the example above removes leading and trailing whitespace from each name.
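If the rest of the script still needs the three separate fields from the question (names, names2, names3), one option is to fill them from the list once the loop has finished. This is a stdlib-only sketch; the NameHolder class and AssignNames method are hypothetical stand-ins for the question's Unity script, and it assumes a missing item should leave its field as an empty string.

```csharp
using System.Collections.Generic;

public class NameHolder
{
    public string names = "";
    public string names2 = "";
    public string names3 = "";

    // Copies up to the first three extracted values into the separate fields.
    // A field with no corresponding item is set to "".
    public void AssignNames(IList<string> values)
    {
        names = values.Count > 0 ? values[0] : "";
        names2 = values.Count > 1 ? values[1] : "";
        names3 = values.Count > 2 ? values[2] : "";
    }
}
```

In the question's script you would call this once after the foreach loop, passing the list built there. Also note that HtmlAgilityPack's SelectNodes returns null (not an empty collection) when nothing matches by default, so guarding the foreach against null avoids a NullReferenceException on pages without li elements.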

2 answers

Answer 1: score 1, posted 2014-09-20 06:20:35
Answer 2 (accepted): score 1, posted 2014-09-20 06:55:22