Get only part of a string

Question

I need some help with C# this time.

The HTML I have looks like this:

<ul class="ui_sug_list"></ul></div></div></div></form>
</div></div><div class="cnt_listas"><ol id="listagem1" 
class="cols_2"><li><a href="/laura-pausini/73280/">16/5/74
</a></li><li><a href="/laura-pausini/73280/traducao.html">
16/5/74 (tradução)</a></li><li><a href="/laura-pausini/1566533/">16/5/74
(Spanish Version)</a></li><li><a href="/laura-pausini/1566533/traducao.html">
16/5/74 (Spanish Version) (tradução)</a></li><li><a href="/laura-pausini/1991556/">
A Simple Vista</a></li><li><a href="/laura-pausini/1991556/traducao.html">
A Simple Vista (tradução)</a></li>

I download HTML like this from the web, and it comes without any indentation. I need to print only the name of each song and the link to that song. I have no idea how to extract just this information from the file.

Here's how I download the file:

        // Download the file
        using (WebClient webClient = new WebClient())
        {
            webClient.DownloadFile(
                "http://letras.mus.br/" + termo_busca + "/", @"C:\Temp\letras.html");
        }

Can you give me a hand?

Answer 1

You should definitely use the HTML Agility Pack.

You can get the links and their text like this:

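 var doc = new HtmlAgilityPack.HtmlDocument();
 doc.LoadHtml(html); // html: the page as a string; or doc.Load(@"C:\Temp\letras.html")
 // SelectNodes returns null when nothing matches, so check before looping
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
    foreach (var link in links)
    {
       var value = link.Attributes["href"].Value; // gives you the link
       var text = link.InnerText.Trim(); // gives you the text of the link
    }
 }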
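The HTML Agility Pack is the robust choice, but the loop above visits every link on the page, including navigation and the "(tradução)" translation pages. If you only need this one machine-generated list and cannot add a package, here is a rough BCL-only sketch. It assumes the songs live in `<ol id="listagem1">`, that each entry is `<a href="...">name</a>` with a double-quoted `href` and no nested tags, and that links ending in `traducao.html` are translations to skip. `SongListParser` and `ExtractSongs` are hypothetical names, and regexes like these break easily on HTML that does not match those assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class SongListParser
{
    // Assumption: the song list is <ol id="listagem1" ...>; [^>]* also
    // matches the line break inside the opening tag.
    private static readonly Regex ListPattern = new Regex(
        "<ol[^>]*\\bid=\"listagem1\"[^>]*>(.*?)</ol>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Assumption: <a href="...">text</a> with a double-quoted href and no
    // nested tags. True for the question's snippet, not for HTML in general.
    private static readonly Regex LinkPattern = new Regex(
        "<a\\s+href=\"([^\"]*)\"\\s*>(.*?)</a>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns (song name, link) pairs, skipping ".../traducao.html" pages.
    public static List<KeyValuePair<string, string>> ExtractSongs(string html)
    {
        // Search only inside the song list if present, else the whole input.
        Match list = ListPattern.Match(html);
        string scope = list.Success ? list.Groups[1].Value : html;

        var songs = new List<KeyValuePair<string, string>>();
        foreach (Match m in LinkPattern.Matches(scope))
        {
            string href = m.Groups[1].Value;
            if (href.EndsWith("traducao.html", StringComparison.OrdinalIgnoreCase))
                continue; // assumed to be a "(tradução)" entry

            // Collapse line breaks inside the link text and decode entities like &amp;.
            string name = WebUtility.HtmlDecode(
                Regex.Replace(m.Groups[2].Value, "\\s+", " ").Trim());
            songs.Add(new KeyValuePair<string, string>(name, href));
        }
        return songs;
    }

    public static void Main()
    {
        // Shortened copy of the markup from the question.
        string html = "<ol id=\"listagem1\" \nclass=\"cols_2\">"
            + "<li><a href=\"/laura-pausini/73280/\">16/5/74\n</a></li>"
            + "<li><a href=\"/laura-pausini/73280/traducao.html\">\n16/5/74 (tradução)</a></li>"
            + "<li><a href=\"/laura-pausini/1991556/\">\nA Simple Vista</a></li></ol>";

        foreach (var song in ExtractSongs(html))
            Console.WriteLine(song.Key + " -> " + song.Value);
    }
}
```

Run on the shortened markup, this prints `16/5/74 -> /laura-pausini/73280/` and `A Simple Vista -> /laura-pausini/1991556/`. The relative links still need `http://letras.mus.br` prepended if you want absolute URLs.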

You could also use this class, which also uses the HTML Agility Pack:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

namespace Foo.Client
{
    public class Website
    {
        public string Html { get; private set; }

        private Website(string html)
        {
            Html = html;
        }

        public static Website Load(Uri uri)
        {
            validate(uri);
            return new Website(getPageContentFor(uri));
        }

        public List<string> GetHyperLinks()
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(Html);
            return extractLinksFrom(doc.DocumentNode.SelectNodes("//a[@href]"));
        }

        private static string getPageContentFor(Uri uri)
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(uri);
                // Dispose the response and reader so the connection is released.
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd();
            }
            catch (WebException)
            {
                return String.Empty;
            }
        }

        private List<string> extractLinksFrom(HtmlNodeCollection nodes)
        {
            var result = new List<string>();
            if (nodes == null) return result;
            foreach (var link in nodes)
                result.Add(link.Attributes["href"].Value);
            return result;
        }

        private static void validate(Uri uri)
        {
            if (!uri.IsAbsoluteUri)
                throw new ArgumentException("invalid uri format");
        }
    }
}

Answer 1 (score 2), posted 2012-09-24 20:13:19
