Web scraping a dynamic website that uses JavaScript, with Beautiful Soup and regular expressions

Question

I am trying to make an app that gives fantasy football scores for the XFL as a personal project. I was able to use Beautiful Soup to get the page source and String.split() to separate all the players' stats. But when I try to get the rosters, I get something like this:

>**1**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:89px">**Jazz**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:100px">**Ferguson**</td><td style="background-color:white; border-bottom:1px solid black; border-left:none; border-right:1px solid black; border-top:none; text-align:center; vertical-align:bottom; white-space:nowrap; width:61px">**WR**

Out of this I need to get the information 1, Jazz, Ferguson, and WR. String.split() will not work for something this complex. I was thinking about using regular expressions, but I am not sure how. Can anyone come up with a regex for this or, if there is a much easier way, point me in the right direction? Thank you.
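For a fragment of this shape, one regex-based approach is to treat every tag as a separator and keep whatever text is left between them. The snippet below is only a sketch: `fragment` is a shortened stand-in for the markup above (style attributes trimmed, bold markers removed), and `cell_texts` is a hypothetical helper name. Regexes on HTML are fragile, so this assumes the cell text itself never contains `<` or `>`.

```python
import re

# Shortened stand-in for the fragment in the question (an assumption:
# the long style attributes are trimmed and the bold markers dropped).
fragment = (
    '>1</td><td style="text-align:center; width:89px">Jazz</td>'
    '<td style="text-align:center; width:100px">Ferguson</td>'
    '<td style="text-align:center; width:61px">WR'
)

def cell_texts(html_fragment):
    """Return the non-empty text pieces found between tags."""
    # Replace every <...> tag with a newline, then split on newlines.
    pieces = re.sub(r'<[^>]*>', '\n', html_fragment).split('\n')
    # Strip whitespace and the stray '>' left by a tag that was cut in half.
    cleaned = (piece.strip(' >') for piece in pieces)
    return [piece for piece in cleaned if piece]

fields = cell_texts(fragment)
print(fields)
```

The result is a flat list of cell values, so grouping into players still requires knowing how many columns each row has.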

EDIT: This is the portion of the code I use to get the HTML data above. It prints out the whole thing; the part above is only a section.

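from bs4 import BeautifulSoup
from requests_html import HTMLSession

# PARSER is not defined in the original snippet; 'html.parser' is an assumed value.
PARSER = 'html.parser'

session = HTMLSession()
page = session.get('https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster')

soup2 = BeautifulSoup(page.content, PARSER)
script = soup2.find_all('script')

for tags in script:
    # The roster markup is embedded in the script tag that carries this title.
    if '"title":"Dallas Renegades roster"' in tags.text:
        # Keep everything from the 'College' column header onward.
        rosterData = tags.text[tags.text.find('College'):]
        # Drop the closing cell tags and the backslash escapes.
        rosterData = rosterData.replace('</td>', '').replace('\\', '')

        print(rosterData)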
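Since the extracted string is still table markup, an alternative to splitting or regexes is to let an HTML parser group the cells into rows. The sketch below uses only the standard library's `html.parser`. `RowCollector` is a hypothetical class name, and `sample` is a made-up string in the shape of the roster markup, not real output from the site (the second row is a placeholder). It assumes the markup contains `<tr>` and `<td>` tags; because it keys on opening tags, it should also cope with input whose `</td>` tags have been removed.

```python
from html.parser import HTMLParser

class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self.in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])        # start a new row
        elif tag == 'td' and self.rows:
            self.rows[-1].append('')    # start a new, empty cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ('td', 'tr'):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()

# Hypothetical sample; the second row is placeholder data.
sample = (
    '<table>'
    '<tr><td style="width:40px">1</td><td>Jazz</td><td>Ferguson</td><td>WR</td></tr>'
    '<tr><td style="width:40px">2</td><td>First</td><td>Last</td><td>QB</td></tr>'
    '</table>'
)

parser = RowCollector()
parser.feed(sample)
parser.close()
rows = parser.rows
print(rows)
```

Each inner list then corresponds to one player, which avoids having to count columns by hand.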

Answer 1

Hi, the code below gets the full table as a DataFrame; you can filter the required data from it:

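from io import StringIO

import pandas as pd
import requests

url = 'https://www.xfl.com/en-US/teams/dallas/renegades-articles/dallas-renegades-roster'
html = requests.get(url).text
# read_html returns one DataFrame per <table> found in the page;
# wrapping the markup in StringIO avoids the deprecated literal-string input.
df_list = pd.read_html(StringIO(html))
# The roster is taken to be the last table on the page.
df = df_list[-1]
print(df)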
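The filtering step mentioned above might look like the following. This is a sketch on a hand-built DataFrame rather than the live page: the column names `No`, `First`, `Last`, and `Pos` are assumptions (the real table's headers may differ), and the second row is placeholder data.

```python
import pandas as pd

# Hypothetical stand-in for df_list[-1]; column names are assumed,
# and the second row is placeholder data.
df = pd.DataFrame(
    [[1, 'Jazz', 'Ferguson', 'WR'], [2, 'First', 'Last', 'QB']],
    columns=['No', 'First', 'Last', 'Pos'],
)

# Keep only the columns of interest, and only the wide receivers.
receivers = df.loc[df['Pos'] == 'WR', ['No', 'First', 'Last', 'Pos']]
print(receivers.to_string(index=False))
```

With the real table, printing `df.columns` first would show which header names to use in place of the assumed ones.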

Answer 1: 2 2020-02-28 04:48:33
