Extract Specific Text from HTML Page
The HTML page looks like this:
<tr>
<th rowspan="4" scope="row">General</th>
<td class="ttl"><a href="network-bands.php3">2G Network</a></td>
<td class="nfo">GSM 850 / 900 / 1800 / 1900 </td>
</tr><tr>
<td class="ttl"><a href="network-bands.php3">3G Network</a></td>
<td class="nfo">HSDPA 900 / 1900 / 2100 </td>
</tr>
To extract the text, I am trying to use:
var text = document.getElementsByClassName("nfo")[0].innerHTML;
Provided by Alex.
But I am getting this error:
Error 2 The name 'document' does not exist in the current context C:\Users\Nabi Javid\Documents\Visual Studio 2008\Projects\WpfApplication2\WpfApplication2\Window1.xaml.cs 30 22 WpfApplication2
Am I missing some library or something?
Currently my code looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for Window1.xaml
    /// </summary>
    public partial class Window1 : Window
    {
        public Window1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, RoutedEventArgs e)
        {
            HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
            htmlDoc.Load("nokia_c5_03-3578.html");
            var text = document.getElementsByClassName("nfo")[0].innerHTML;
        }
    }
}
You are mixing C# code with JavaScript code.
Instead of this:
var text = document.getElementsByClassName("nfo")[0].innerHTML;
write this:
var text = htmlDoc.DocumentNode.SelectNodes("//td[@class='nfo']")[0].InnerHtml;
To keep it simple, I have left out error handling.
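For comparison, here is a minimal sketch of the same XPath query with the error check added back. It deliberately swaps HtmlAgilityPack for `System.Xml.XPath` from the .NET base library, so it only works because the sample fragment in the question is well-formed XML once wrapped in a root element. Real pages usually are not, which is why the answer uses HtmlAgilityPack. (With HtmlAgilityPack, `SelectNodes` returns null rather than an empty collection when nothing matches, so the `[0]` index above would throw.) The class `NfoExtractor` and method `FirstNfoText` are hypothetical names for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper class (not part of HtmlAgilityPack).
// Assumes the input is well-formed XML, e.g. the question's <tr> rows
// wrapped in <table>...</table>.
public static class NfoExtractor
{
    // Returns the trimmed text of the first <td class="nfo"> cell,
    // or null when no such cell exists.
    public static string FirstNfoText(string xhtml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xhtml)).CreateNavigator();
        // Same XPath expression as in the HtmlAgilityPack answer.
        XPathNavigator node = nav.SelectSingleNode("//td[@class='nfo']");
        // SelectSingleNode returns null on no match, so check before using it.
        return node == null ? null : node.Value.Trim();
    }
}
```

Wrapping the question's fragment in `<table>` and `</table>` and passing it to `FirstNfoText` returns "GSM 850 / 900 / 1800 / 1900"; markup without any `nfo` cell yields null instead of an exception.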
I'm not very familiar with .NET, but it looks like you are trying to mix JavaScript code
var text = document.getElementsByClassName("nfo")[0].innerHTML;
with your .NET code?
You can get elements by class name using the following method, which also returns elements whose class attribute lists several classes:
// Requires: using System; using HtmlAgilityPack;
// Matches elements whose class attribute contains className as a whole,
// space-separated token (class="nfo wide" matches "nfo"; class="nfoo" does not).
private HtmlNodeCollection GetElementsByClassName(HtmlDocument htmlDocument, string className)
{
    string xpath =
        String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);
    // SelectNodes returns null when no element matches.
    return htmlDocument.DocumentNode.SelectNodes(xpath);
}
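To see why the method wraps `@class` in `concat(' ', normalize-space(@class), ' ')`: a plain `@class='nfo'` test misses a cell such as `class="nfo wide"`, while a bare `contains(@class, 'nfo')` also matches an unrelated class like `nfoo`. The sketch below checks this on a small made-up fragment (the classes `wide` and `nfoo` are hypothetical test data). It uses `System.Xml.XPath` instead of HtmlAgilityPack, so it assumes well-formed markup; with HtmlAgilityPack the same XPath string would go to `SelectNodes`. `ClassNameXPath`, `TokenXPath` and `Count` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical demo class comparing class-name XPath styles.
public static class ClassNameXPath
{
    // Builds the same token-matching expression as GetElementsByClassName.
    public static string TokenXPath(string className)
    {
        return String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);
    }

    // Counts the elements an XPath expression selects in well-formed markup.
    public static int Count(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return nav.Select(xpath).Count;
    }
}
```

For the markup `<tr><td class="nfo">a</td><td class="nfo wide">b</td><td class="nfoo">c</td></tr>`, `@class='nfo'` selects 1 cell, `contains(@class, 'nfo')` selects 3, and `TokenXPath("nfo")` selects the intended 2.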
In your case, you must use the htmlDoc variable to call methods. By the way, the HtmlDocument class does not have a method with that name. Try to see if you can find another match for your needs in this list.
As the error says, the document variable does not exist in your code.
Do you want
var text = htmlDoc.getElementsByClassName("nfo")[0].innerHTML;
? I'm not familiar with HTML Agility Pack, but that would seem to make sense.