
C# Html Agility Pack Parsing Data From Website

I have a problem parsing data from a website. When I download the HTML and load it, my query against the HTML document returns null. I can't parse any data from the table because there is no <tr> or <td> in the downloaded document. The rows and columns should be in the <tbody> part of the table, but it comes back empty.

Can anyone help, please? Thanks. This is the code I used:

    Uri uri = new Uri("https://deprem.afad.gov.tr/sondepremler.html");
    HttpWebRequest webClient = (HttpWebRequest)WebRequest.Create(uri);
    webClient.Method = "GET";
    webClient.ContentType = "text/html;charset=utf-8";
    HtmlDocument doc = new HtmlDocument();

    using (var response = (HttpWebResponse)webClient.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            doc.Load(stream, Encoding.GetEncoding("utf-8"));
        }
    }
    var tds = doc.DocumentNode.SelectNodes("//table//tr//td");

And this is the HTML document returned from the website:

<table id="resultTable" class="table table-striped" cellspacing="0" width="100%">
    <thead>
        <tr>
            <th></th>
            <th id="thDate">Tarih(TS)</th>
            <th>Ajans</th>
            <th>Enlem</th>
            <th>Boylam</th>
            <th>Derinlik</th>
            <!--<th>Rms</th> -->
            <th>Tip</th>
            <th>Büyüklük</th>
            <th>Ülke</th>
            <th>İl</th>
            <th>İlçe</th>
            <th>Köy</th>
            <th>Diğer</th>
            <th>EventID</th>
        </tr>
    </thead>
    <tbody id="tbody">
    </tbody>
</table>

When you visit a site, you can press F12 and see all the calls being made. You can replay those API calls yourself to retrieve the data, either in Postman or from C# with a REST client.

This is an example of how you can get the data you are looking for. I used Chrome DevTools to find the call under the Network tab.

    public class Event
    {
        public string eventId { get; set; }
        public string time { get; set; }
        public string agency { get; set; }
        public string lat { get; set; }
        public string lon { get; set; }
        public string depth { get; set; }
        public string rms { get; set; }
        public string type { get; set; }
        public string m { get; set; }
        public object place { get; set; }
        public string country { get; set; }
        public string city { get; set; }
        public string district { get; set; }
        public string town { get; set; }
        public string other { get; set; }
        public object mapImagePath { get; set; }
        public object strike1 { get; set; }
        public object dip1 { get; set; }
        public object rake1 { get; set; }
        public object strike2 { get; set; }
        public object dip2 { get; set; }
        public object rake2 { get; set; }
        public object ftype { get; set; }
        public object pic { get; set; }
        public object file { get; set; }
        public object focalId { get; set; }
        public string time2 { get; set; }
    }

You can use the above class in your main program like this:

    var client = new RestClient("https://deprem.afad.gov.tr/latestCatalogsList");
    client.Timeout = -1;
    var request = new RestRequest(Method.POST);
    request.AddHeader("Content-Type", "multipart/form-data");
    request.AlwaysMultipartFormData = true;
    request.AddParameter("m", "0");
    request.AddParameter("utc", "0");
    request.AddParameter("lastDay", "1");
    var response = client.Execute<List<Event>>(request);

    List<Event> myData = response.Data;
    Console.WriteLine(response.Content);

You will have an object with all the data from the site. You can do whatever you need to with that data.
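If you would rather not depend on RestSharp, the same call can be sketched with the built-in HttpClient and System.Text.Json (available on .NET Core 3.0 and later). This is only a sketch under some assumptions: the endpoint and form fields are copied from the example above and may have changed, the response is assumed to be a JSON array whose values are strings, and the names `QuakeEvent` and `AfadClient` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Trimmed-down version of the Event class above; only a few fields are mapped.
// Property names are lowercase so they match the JSON keys exactly.
public class QuakeEvent
{
    public string eventId { get; set; }
    public string time { get; set; }
    public string lat { get; set; }
    public string lon { get; set; }
    public string m { get; set; }
}

public static class AfadClient
{
    // Deserialize a JSON array of events; unknown JSON keys are ignored.
    public static List<QuakeEvent> Parse(string json)
    {
        return JsonSerializer.Deserialize<List<QuakeEvent>>(json) ?? new List<QuakeEvent>();
    }

    // POST the same three form fields that the RestSharp example sends.
    public static async Task<List<QuakeEvent>> FetchAsync(HttpClient http)
    {
        using var form = new MultipartFormDataContent
        {
            { new StringContent("0"), "m" },
            { new StringContent("0"), "utc" },
            { new StringContent("1"), "lastDay" }
        };
        using var response = await http.PostAsync(
            "https://deprem.afad.gov.tr/latestCatalogsList", form);
        response.EnsureSuccessStatusCode();
        return Parse(await response.Content.ReadAsStringAsync());
    }
}
```

From an async method you could call it as `var events = await AfadClient.FetchAsync(new HttpClient());`. Keeping `Parse` separate from the network call makes it easy to check the mapping against a saved response.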

Please do mark the post answered if it helped

Unfortunately, you will not be able to reach the data you are after with HtmlAgilityPack.

Why can you not access the data in Html-Agility-Pack that is clearly visible when you open in Chrome (and use DevTools)?

That's because the rows are rendered by Chrome (or whichever browser you use). Html-Agility-Pack does not execute scripts the way a browser does. You can access the static markup (such as the TH header cells of the table) but not the generated row data, which most likely comes from a database.
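To see this in isolation, here is a small sketch that loads a cut-down copy of the table markup from the question into HtmlAgilityPack. The header cells are found, while the td query returns null because SelectNodes returns null when nothing matches (the default behaviour). The class and method names are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticHtmlDemo
{
    // Same shape as the HTML the server returns: headers present, tbody empty.
    public const string SampleHtml = @"
<table id=""resultTable"">
  <thead><tr><th>Enlem</th><th>Boylam</th></tr></thead>
  <tbody id=""tbody""></tbody>
</table>";

    // Count the nodes matching an XPath, treating a null result as zero.
    public static int CountNodes(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Header cells exist in the static markup.
        Console.WriteLine(CountNodes(SampleHtml, "//table//th"));
        // No td exists until a browser runs the page script.
        Console.WriteLine(CountNodes(SampleHtml, "//table//tr//td"));
    }
}
```

This is also why the `tds` variable in the question is null: the guard against a null result is needed whenever the XPath may match nothing.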

If you look at the InnerHtml of the document you do get, there is a script that needs to be executed.

            success: function(data)
            {
                $('#resultTable').DataTable().destroy();
                $('#resultTable tbody').empty();

                var locations = [];

                var i;
                for (i = 0; data.length > i; ++i) {

                    var lat = parseFloat(data[i].lat);
                    var lon = parseFloat(data[i].lon);
                    //var location = new google.maps.LatLng(lat, lon);
                    var location = convertGoogleMapCordsToOpenLayerCords(lat, lon);
              ...

This is the script that actually fills the table's tbody with the data you are trying to reach.
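Note that the script runs parseFloat on lat and lon, which suggests those values arrive as strings. If you fetch the same data from C#, a rough counterpart is a culture-invariant parse, so the result does not depend on the machine's regional settings (some cultures use a comma as the decimal separator). A minimal sketch; the helper name is hypothetical, and unlike parseFloat it rejects strings with trailing non-numeric text.

```csharp
using System.Globalization;

public static class CoordinateParser
{
    // Hypothetical helper: returns null instead of throwing when the text is not a number.
    public static double? ParseInvariant(string text)
    {
        return double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
            ? value
            : (double?)null;
    }
}
```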

You are better off looking for an API that the site might provide to get the details directly.
