How to extract a table from a website using BeautifulSoup?
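I want to use Beautiful Soup to extract the FIPS code of every county in Louisiana from this website and build a pandas DataFrame: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697
The columns would be FIPS, Name, and State. While inspecting the page I tried searching by tr, td, and table, but I can't work out how to single out only the main data and then load it into a pandas DataFrame. Once I find the specific table, it should be easy to do something like this: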
if state == 'LA':
    # put data into a dataframe
import requests
from bs4 import BeautifulSoup
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
# print(soup)
for county in soup.find_all('table'):
    print(county.text)
You can select the <table> with class="data" and then use pd.read_html. For example:
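from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Select the first element with class "data" and let pandas parse it.
# Wrapping the markup in StringIO avoids the literal-HTML deprecation in recent pandas.
df = pd.read_html(StringIO(str(soup.select_one(".data"))))[0]
# filter State == 'LA'
print(df[df.State == "LA"].head())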
Prints:
       FIPS        Name State
1109  22001      Acadia    LA
1110  22003       Allen    LA
1111  22005   Ascension    LA
1112  22007  Assumption    LA
1113  22009   Avoyelles    LA
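As a variation on the pd.read_html approach, the sketch below lets pandas pick the table by its class attribute (the attrs argument) and keeps FIPS codes as strings (the converters argument), so codes with a leading zero, such as 01001, are not turned into integers. SAMPLE_HTML is a made-up stand-in for the real page and read_fips_table is a hypothetical helper name; the real markup may differ, and pd.read_html needs an HTML parser backend such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page (not the real markup).
SAMPLE_HTML = """
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>01001</td><td>Autauga</td><td>AL</td></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
  <tr><td>22003</td><td>Allen</td><td>LA</td></tr>
</table>
"""


def read_fips_table(html):
    """Parse the table with class "data" and keep FIPS as text."""
    tables = pd.read_html(
        StringIO(html),            # file-like input instead of a literal string
        attrs={"class": "data"},   # only match <table class="data">
        converters={"FIPS": str},  # preserve leading zeros in the codes
    )
    return tables[0]


df = read_fips_table(SAMPLE_HTML)
la = df[df["State"] == "LA"]
print(la)
```

With the real page, the text returned by requests.get(url).text would take the place of SAMPLE_HTML, assuming the table still carries class="data".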
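There is a single table, so you can iterate over the <tr> elements in that table.
If you want a DataFrame that contains only one specific state, you can either filter the rows before adding them to the DataFrame, or build a DataFrame of all the data and filter it into a sub-DataFrame.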
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')

data = []
for tr in soup.find('table', class_='data').find_all('tr'):
    row = [td.text for td in tr.find_all('td')]
    # To keep only the LA rows, filter here before appending
    if len(row) == 3 and row[2] == 'LA':
        data.append(row)

df = pd.DataFrame(data, columns=['FIPS', 'Name', 'State'])
print(df)
Output:
     FIPS        Name State
0   22001      Acadia    LA
1   22003       Allen    LA
2   22005   Ascension    LA
3   22007  Assumption    LA
4   22009   Avoyelles    LA
..    ...         ...   ...
63  22127        Winn    LA
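The loop above filters rows before they reach the DataFrame. The other option, building a DataFrame of every row first and filtering afterwards, might look like the sketch below. SAMPLE_HTML is a made-up stand-in for the real page, and the names all_counties and la_counties are only illustrative.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real markup).
SAMPLE_HTML = """
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>01001</td><td>Autauga</td><td>AL</td></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
  <tr><td>22003</td><td>Allen</td><td>LA</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for tr in soup.find("table", class_="data").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # The header row has <th> cells only, so it yields no <td> and is skipped.
    if len(cells) == 3:
        rows.append(cells)

# Keep every state, then derive the Louisiana subset with a boolean mask.
all_counties = pd.DataFrame(rows, columns=["FIPS", "Name", "State"])
la_counties = all_counties[all_counties["State"] == "LA"].reset_index(drop=True)
print(la_counties)
```

Keeping all_counties around makes it cheap to pull out another state later without requesting the page again.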