Extracting texts from html tags
I have a page like this, which has 3 values in li tags:
<li>nafiz</li>
<li>ACE</li>
<li>Sanah</li>
And this code gives me only the last InnerText:
public string names = "";
public string names2 = "";
public string names3 = "";

// Use this for initialization
void Start () {
    HtmlWeb hw = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = hw.Load(openUrl);
    foreach (HtmlNode nd in doc.DocumentNode.SelectNodes("//li"))
    {
        names=nd.InnerText.ToString();
    }
}
How can I store all 3 values in those strings?
You can use this function:
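// Requires: using System.Collections.Generic;
// Returns every non-blank run of text found between a '>' and the next '<'.
string[] GetItems(string htmlText)
{
    List<string> Answer = new List<string>();
    int i = 0;
    while (i < htmlText.Length)
    {
        int start = htmlText.IndexOf('>', i);
        if (start == -1)
            break; // no further tag end, so nothing left to read
        int end = htmlText.IndexOf('<', start + 1);
        if (end == -1)
            break; // no further tag start after this '>'
        string Item = htmlText.Substring(start + 1, end - start - 1);
        if (Item.Trim() != "")
            Answer.Add(Item);
        i = end + 1; // continue scanning after the '<' just found
    }
    return Answer.ToArray();
}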
And use it like this:
foreach (string item in GetItems(YourText))
{
    MessageBox.Show(item);
}
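Note that the function above collects the text between any pair of tags, so on a full page it would also pick up titles, links and so on. As a rough sketch only, and swapping in a regular expression rather than the character scan from this answer, the search could be narrowed to li elements. The class name LiTextScanner, the helper GetListItems and the sample page are made up for illustration, and a regex like this is fragile on real-world HTML (markup nested inside an li is left in the returned text).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiTextScanner
{
    // Hypothetical helper: returns the trimmed text of each <li>...</li> pair.
    public static string[] GetListItems(string htmlText)
    {
        var items = new List<string>();

        // "<li" must be followed by '>' or whitespace, so <link ...> is not matched.
        // The lazy (.*?) pairs each <li> with the nearest closing </li>.
        MatchCollection matches = Regex.Matches(
            htmlText,
            @"<li(?:\s[^>]*)?>(.*?)</li\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            string text = m.Groups[1].Value.Trim();
            if (text.Length > 0)
                items.Add(text);
        }
        return items.ToArray();
    }

    public static void Main()
    {
        // Assumed sample page; the <title> text is deliberately not an <li>.
        string page = "<html><title>Names</title><ul>" +
                      "<li>nafiz</li><li>ACE</li><li>Sanah</li></ul></html>";

        foreach (string item in GetListItems(page))
            Console.WriteLine(item);
    }
}
```

With the sample page, only the three li values are printed and the title text is skipped.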
It will be easier if you store the 3 values in a string array or list, for example:
var names = new List<string>();
.....
.....
foreach (HtmlNode nd in doc.DocumentNode.SelectNodes("//li"))
{
    names.Add(nd.InnerText.Trim());
}
InnerText is already of type string, so there is no need to call ToString() on it. Trim() in the example above removes leading and trailing white-space from each name.
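Putting the list approach together, a minimal self-contained sketch might look like the following. It assumes the HtmlAgilityPack package is referenced, and it parses an inline HTML string with LoadHtml instead of loading openUrl so that it can run without a network call. The class name LiExtractor, the method ExtractListItems and the local variables first, second and third are illustrative names, not part of the original code. It also guards against SelectNodes returning null, which HtmlAgilityPack does when the XPath matches nothing.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LiExtractor
{
    // Returns the trimmed inner text of every <li> element in the given HTML.
    public static List<string> ExtractListItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//li");
        if (nodes == null)
            return names;

        foreach (HtmlNode nd in nodes)
            names.Add(nd.InnerText.Trim());

        return names;
    }

    public static void Main()
    {
        // Assumed sample markup mirroring the three values from the question.
        string html = "<ul><li>nafiz</li><li>ACE</li><li>Sanah</li></ul>";
        List<string> names = ExtractListItems(html);

        // If three separate strings are still required, copy them out by index.
        string first = names.Count > 0 ? names[0] : "";
        string second = names.Count > 1 ? names[1] : "";
        string third = names.Count > 2 ? names[2] : "";

        Console.WriteLine(string.Join(", ", names)); // prints "nafiz, ACE, Sanah"
    }
}
```

Keeping the values in a list means the code does not need to change if the page later has more or fewer li items; the three individual strings are only filled in when an entry exists at that index.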