Parsing PHP data from HTML webpage with jsoup

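I'm not entirely sure how to phrase this question or its title. I'm using jsoup to parse a web page (http://champion.gg/statistics/), and I'm trying to pull the statistics out of its table with this code: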

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

It produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])"> 
 <td class="rank">{{indexNumber($index, filteredChampions.length)}}</td> 
 <td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}"> 
  <div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}"> 
   <div class="matchup-champion {{champion.key}}"></div> 
   <span class="stat-champ-title">{{champion.title}}</span> 
  </div> </a> </td> 
 <td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td> 
 <td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td> 
 <td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td> 
 <td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td> 
 <td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td> 
 <td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td> 
 <td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td> 
 <td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td> 
 <td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td> 
 <td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td> 
 <td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td> 
 <td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td> 
 <td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td> 
</tr>

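Instead of an actual value, such as the average number of kills for a particular champion, the output contains the placeholder champion.general.kills. How can I parse the page so that, instead of champion.general.kills, I get the actual value, such as 8?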

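When extracting data from a web page, you have to go to where the data actually is. In this case the data is still inside the page itself, which is good. You need to get the script tag that contains the data and parse it. For now, this sample code assumes that it is the script tag at index 11.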

public static void main(String[] args)
{
    try
    {
        Document doc = Jsoup
                .connect("http://champion.gg/statistics/")
                .userAgent(
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                .get();
        System.out.println(doc.toString());
        Elements table = doc.select("script");
        Element script = table.get(11);
        parseText(script);
    }
    catch (IOException exception)
    {
        // Report the failure instead of silently swallowing it
        exception.printStackTrace();
    }
}

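public static void parseText(Element script)
{
    // The script body is a single DataNode holding the raw JavaScript/JSON text
    String text = ((DataNode) script.childNode(0)).toString().trim();
    int index = text.indexOf("_id");
    while (index > 0)
    {
        index += 6;// Skip past _id":" (6 characters) to the beginning of the value
        int endQuote = text.indexOf("\"", index);
        String id = text.substring(index, endQuote);
        // key and kills are cut out together with their field names
        index = text.indexOf("\"key\":\"", endQuote);
        endQuote = text.indexOf("\"", index + 8);
        String key = text.substring(index, endQuote);
        index = text.indexOf("\"kills\":", endQuote);
        endQuote = text.indexOf(",", index);
        String kills = text.substring(index, endQuote);
        // Drop the part of the text that has already been consumed
        text = text.substring(endQuote);
        // Note: index is still an offset into the old string, so this search starts
        // partway into the shortened text and can pass over a record;
        // text.indexOf("_id") would scan from its start.
        index = text.indexOf("_id", index);
        System.out.println(id + key + kills);
    }
}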

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47

5812965753fa9743395ee93b"key":"Aatrox"kills":5.8

5812965753fa9743395ee93d"key":"Galio"kills":4.58

5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
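
The hard-coded script index is fragile if the page adds or removes a script tag. As a rough sketch, not the answerer's method, the data script could instead be located by searching for a marker string, and the fields pulled out with a regular expression so that only the bare values are returned. The class name ChampionScriptScan, the "championData = " prefix, and the deaths values in the sample are assumptions made for illustration. With jsoup, the script bodies would presumably come from calling Element.data() on each element of doc.select("script").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChampionScriptScan {

    // Matches one record: the _id, then the nearest following key and kills values.
    private static final Pattern RECORD = Pattern.compile(
            "\"_id\":\"([^\"]+)\".*?\"key\":\"([^\"]+)\".*?\"kills\":([0-9.]+)",
            Pattern.DOTALL);

    /** Returns the first script body containing the marker, or null if none does. */
    static String findDataScript(List<String> scriptBodies, String marker) {
        for (String body : scriptBodies) {
            if (body.contains(marker)) {
                return body;
            }
        }
        return null;
    }

    /** Returns one "id key kills" line per record found in the script text. */
    static List<String> extract(String scriptText) {
        List<String> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(scriptText);
        while (m.find()) {
            rows.add(m.group(1) + " " + m.group(2) + " " + m.group(3));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical script bodies shaped like the page data (assumed layout).
        List<String> scripts = List.of(
                "console.log('unrelated');",
                "championData = [{\"_id\":\"5812965753fa9743395ee93a\",\"key\":\"Urgot\","
                        + "\"general\":{\"kills\":6.47,\"deaths\":5.1}},"
                        + "{\"_id\":\"5812965753fa9743395ee93b\",\"key\":\"Aatrox\","
                        + "\"general\":{\"kills\":5.8,\"deaths\":6.0}}];");
        String data = findDataScript(scripts, "\"_id\"");
        extract(data).forEach(System.out::println);
    }
}
```

A regular expression like this is still brittle: if a record lacked a key or kills field, the lazy match would run on into the next record. A real JSON parser is the safer choice once the script text has been isolated.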

I found the answer with help from ProgrammersBlock. After retrieving the script data, I converted it from JSON into full Java objects!

package com.databot.web.parser;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import com.databot.Program;
import com.databot.Settings;
import com.databot.champions.ChampionStats;
import com.databot.champions.Champion;
import com.google.gson.stream.JsonReader;

public class WebParser {

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        Elements table = doc.select("script");
        Element script = table.get(11);
        parseText(script);
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

public void parseText(Element script)
{
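    // Skip the first 22 characters of the script (presumably the JavaScript
    // prefix in front of the JSON array) so the reader starts at the array itself
    String text = ((DataNode) script.childNode(0)).toString().substring(22).trim();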
    System.out.println(text);
    List<Champion> champions = new ArrayList<>();
    try {
        JsonReader reader = new JsonReader(new StringReader(text));
        reader.setLenient(true);
        reader.beginArray();
        while(reader.hasNext()) {
            reader.beginObject();
            String id = "", key = "", role = "", title = "";
            ChampionStats stats = new ChampionStats(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0);
            while(reader.hasNext()) {
                String name = reader.nextName();
                if(name.equalsIgnoreCase("_id")) {
                    id = reader.nextString();
                } else if(name.equalsIgnoreCase("key")) {
                    key = reader.nextString();
                } else if(name.equalsIgnoreCase("role")) {
                    role = reader.nextString();
                } else if(name.equalsIgnoreCase("title")) {
                    title = reader.nextString();
                } else if(name.equalsIgnoreCase("general")) {
                    double winPercent = 0, playPercent = 0, banRate = 0, experience = 0, kills = 0, deaths = 0, assists = 0, totalDamageDealtToChampions = 0, totalDamageTaken = 0, totalHeal = 0, largestKillingSpree = 0, minionsKilled = 0, neutralMinionsKilledTeamJungle = 0, neutralMinionsKilledEnemyJungle = 0, goldEarned = 0; 
                    int overallPosition = 0, overallPositionChange = 0;
                    reader.beginObject();
                    while(reader.hasNext()) {
                        String gName = reader.nextName();
                        if(gName.equalsIgnoreCase("winPercent")) {
                            winPercent = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("playPercent")) {
                            playPercent = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("banRate")) {
                            banRate = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("experience")) {
                            experience = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("kills")) {
                            kills = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("deaths")) {
                            deaths = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("assists")) {
                            assists = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalDamageDealtToChampions")) {
                            totalDamageDealtToChampions = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalDamageTaken")) {
                            totalDamageTaken = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("totalHeal")) {
                            totalHeal = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("largestKillingSpree")) {
                            largestKillingSpree = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("minionsKilled")) {
                            minionsKilled = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("neutralMinionsKilledTeamJungle")) {
                            neutralMinionsKilledTeamJungle = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("neutralMinionsKilledEnemyJungle")) {
                            neutralMinionsKilledEnemyJungle = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("goldEarned")) {
                            goldEarned = reader.nextDouble();
                        } else if(gName.equalsIgnoreCase("overallPosition")) {
                            overallPosition = reader.nextInt();
                        } else if(gName.equalsIgnoreCase("overallPositionChange")) {
                            overallPositionChange = reader.nextInt();
                        } else {
                            reader.skipValue();
                        }
                    }
                    reader.endObject();
                    stats = new ChampionStats(winPercent, playPercent, banRate, experience, kills, deaths, assists, totalDamageDealtToChampions, totalDamageTaken, totalHeal, largestKillingSpree, minionsKilled, neutralMinionsKilledTeamJungle, neutralMinionsKilledEnemyJungle, goldEarned, overallPosition, overallPositionChange);
                } else {
                    reader.skipValue();
                }
            }
            reader.endObject();
            champions.add(new Champion(id, key, role, title, stats));
        }
        reader.endArray();
        reader.close();
    } catch (Exception e) {
        Program.alert("Error reading JSON data!");
        e.printStackTrace();
    }
    champions.forEach(champion -> {
        System.out.println(champion.toString());
    });
}
}

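This is my complete WebParser class, in case anyone is interested. I'm sure there are better or more efficient ways to write it, but this is what works for me right now!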
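
One small improvement might be to stop depending on a fixed character offset when cutting the JSON out of the script. The sketch below takes everything from the first '[' to the last ']' instead. It is only an illustration: the class name JsonPayload and the "championData = " prefix in the sample are assumptions, and it presumes the script contains no other square brackets before the array or after its end.

```java
final class JsonPayload {

    /**
     * Returns the text from the first '[' to the last ']' (inclusive),
     * or null when the script does not contain a bracketed array.
     */
    static String extractArray(String scriptText) {
        int start = scriptText.indexOf('[');
        int end = scriptText.lastIndexOf(']');
        if (start < 0 || end < start) {
            return null;
        }
        return scriptText.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Hypothetical script body; the variable name is an assumption.
        String script = "  championData = [{\"key\":\"Kled\"}];  ";
        System.out.println(extractArray(script)); // prints [{"key":"Kled"}]
    }
}
```

The returned string could then be handed to the JsonReader in place of the result of substring(22), which would keep working if the length of the prefix in front of the array changed.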
