Get just a part of the string
I need some help with C# this time.
I have an HTML file containing this:
<ul class="ui_sug_list"></ul></div></div></div></form>
</div></div><div class="cnt_listas"><ol id="listagem1"
class="cols_2"><li><a href="/laura-pausini/73280/">16/5/74
</a></li><li><a href="/laura-pausini/73280/traducao.html">
16/5/74 (tradução)</a></li><li><a href="/laura-pausini/1566533/">16/5/74
(Spanish Version)</a></li><li><a href="/laura-pausini/1566533/traducao.html">
16/5/74 (Spanish Version) (tradução)</a></li><li><a href="/laura-pausini/1991556/">
A Simple Vista</a></li><li><a href="/laura-pausini/1991556/traducao.html">
A Simple Vista (tradução)</a></li>
I download HTML like this; it arrives from the web without any indentation. I need to print only the name of each song and the link that goes to that song. I have no idea how to get just this information from the file.
Here's how I download the file:
// Download the file
WebClient webClient = new WebClient();
webClient.DownloadFile(
    "http://letras.mus.br/" + termo_busca + "/", @"C:\Temp\letras.html");
Can you give me a hand?
You should definitely use the HTML Agility Pack.
You can get the links and their text like this:
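var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(Html); // Html is a string holding the downloaded page source
// Note: SelectNodes returns null when no node matches the XPath expression
foreach (HtmlAgilityPack.HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    var value = link.Attributes["href"].Value; // gives you the link
    var text = link.InnerText; // gives you the text of the link
}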
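To print only the song names and their links, as the question asks, the XPath query can be narrowed to the song list. The following is a minimal sketch, not a definitive implementation. It assumes the songs always sit inside the `<ol id="listagem1">` element shown in the question, and that links ending in `traducao.html` are translation pages that should be skipped. `SongLinkExtractor` and `Extract` are hypothetical names chosen for illustration, and the HtmlAgilityPack NuGet package must be installed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of HTML Agility Pack itself.
public static class SongLinkExtractor
{
    // Returns (song title, href) pairs for the anchors inside <ol id="listagem1">.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var result = new List<KeyValuePair<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the song list is the <ol> with id "listagem1", as in the question.
        var links = doc.DocumentNode.SelectNodes("//ol[@id='listagem1']//a[@href]");
        if (links == null)
            return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", string.Empty);

            // Assumption: translation pages end in "traducao.html" and are not wanted.
            if (href.EndsWith("traducao.html", StringComparison.OrdinalIgnoreCase))
                continue;

            // Decode HTML entities and collapse line breaks and repeated spaces in the title.
            string title = HtmlEntity.DeEntitize(link.InnerText);
            title = Regex.Replace(title, @"\s+", " ").Trim();

            result.Add(new KeyValuePair<string, string>(title, href));
        }

        return result;
    }
}
```

The page source could be passed in as a string, for example one read from the file saved at C:\Temp\letras.html; each returned pair then holds the song name in Key and the relative link in Value.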
You could also use this class, which is built on the HTML Agility Pack as well:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

namespace Foo.Client
{
    public class Website
    {
        public string Html { get; private set; }

        private Website(string html)
        {
            Html = html;
        }

        public static Website Load(Uri uri)
        {
            validate(uri);
            return new Website(getPageContentFor(uri));
        }

        public List<string> GetHyperLinks()
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(Html);
            return extractLinksFrom(doc.DocumentNode.SelectNodes("//a[@href]"));
        }

        private static string getPageContentFor(Uri uri)
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(uri);
                var response = (HttpWebResponse)request.GetResponse();
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd();
            }
            catch (WebException)
            {
                return String.Empty;
            }
        }

        private List<string> extractLinksFrom(HtmlNodeCollection nodes)
        {
            var result = new List<string>();
            if (nodes == null) return result;
            foreach (var link in nodes)
                result.Add(link.Attributes["href"].Value);
            return result;
        }

        private static void validate(Uri uri)
        {
            if (!uri.IsAbsoluteUri)
                throw new ArgumentException("invalid uri format");
        }
    }
}