Web scraping a dynamic website that uses JavaScript, with Beautiful Soup and regular expressions

Question

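I'm trying to build an app, as a personal project, that provides fantasy football scores for the XFL. I was able to get the page source with Beautiful Soup and separate all of the players' stats with String.split(), but when I try to get the roster, I get something like this: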

>**1**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:89px">**Jazz**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:100px">**Ferguson**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:61px">**WR**

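So I need to extract the values 1, Jazz, Ferguson, and WR. String.split() doesn't work for something this complex. I was thinking of using a regular expression, but I'm not sure how. Can anyone suggest a regex for this, or point me toward a simpler approach? Thank you.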

Edit: this is the part of my code that gets the HTML data above. It prints the whole thing; the fragment above is just one piece of it.

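from bs4 import BeautifulSoup
from requests_html import HTMLSession

PARSER = 'html.parser'  # assumption: PARSER was not defined in the original post

session = HTMLSession()
page = session.get('https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster')

soup2 = BeautifulSoup(page.content, PARSER)
script = soup2.find_all('script')

for tags in script:
    # Only the script tag carrying this title holds the roster table
    if '"title":"Dallas Renegades roster"' in tags.text:
        # Keep everything from the 'College' column header onward
        rosterData = tags.text[tags.text.find('College'):]
        # Drop closing cell tags and the escaping backslashes
        rosterData = rosterData.replace('</td>', '').replace('\\', '')
        print(rosterData)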

Answer 1

Hi, the code below fetches the full table as a DataFrame, from which you can filter the data you need:

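from io import StringIO

import pandas as pd
import requests

url = 'https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster'
html = requests.get(url).text
# read_html returns one DataFrame per <table> it finds; newer pandas expects a file-like object
df_list = pd.read_html(StringIO(html))
# Take the last table on the page
df = df_list[-1]
print(df)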
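
If you would rather keep working with the string you already pulled out of the script tag, here is a minimal sketch of the regex route asked about in the question. It assumes each table row ends with `</tr>` and each cell holds plain text with no nested tags. The sample fragment (including its made-up second row) and the names `CELL_RE` and `parse_rows` are hypothetical, for illustration only.

```python
import re

# Hypothetical sample modelled on the fragment in the question (styles shortened).
# The second row is invented purely to show multi-row handling.
fragment = (
    '<tr><td style="text-align:center; width:40px">1</td>'
    '<td style="text-align:center; width:89px">Jazz</td>'
    '<td style="text-align:center; width:100px">Ferguson</td>'
    '<td style="text-align:center; width:61px">WR</td></tr>'
    '<tr><td style="text-align:center; width:40px">2</td>'
    '<td style="text-align:center; width:89px">Example</td>'
    '<td style="text-align:center; width:100px">Player</td>'
    '<td style="text-align:center; width:61px">QB</td></tr>'
)

# Match an opening <td ...> tag and capture the text up to the next tag.
# This works whether or not the closing </td> tags have been stripped.
CELL_RE = re.compile(r'<td[^>]*>([^<]*)')


def parse_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    rows = []
    # Assumption: rows are delimited by </tr> in the embedded markup.
    for row_html in html.split('</tr>'):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # skip chunks that contain no cells
            rows.append(cells)
    return rows


for number, first, last, position in parse_rows(fragment):
    print(number, first, last, position)
```

A regex like this is brittle if the markup changes or cells contain nested tags, so an HTML parser is usually the safer choice when one is available.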


Solution 1
2 2020-02-28 04:48:33
