Naive Problem: Receiving the same data from all four different links
import pandas as pd
df31_12_r1 = pd.read_html('https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html#r1')[0]
df31_12_r2 = pd.read_html('https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html#r2')[0]
df31_12_r3 = pd.read_html('https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html#r3')[0]
df31_12_r4 = pd.read_html('https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html#r4')[0]
The displayed result is the same for all four DataFrames, but it should be different. One result is shown below (the other three are identical):
Unit Power Down Reason or Comment Change in report (*) Number of Scrams (#)
0 Beaver Valley 1 100 NaN NaN NaN NaN
1 Beaver Valley 2 100 NaN NaN NaN NaN
2 Calvert Cliffs 1 100 NaN NaN NaN NaN
3 Calvert Cliffs 2 100 NaN NaN NaN NaN
4 FitzPatrick 100 NaN NaN NaN NaN
5 Ginna 100 NaN NaN NaN NaN
6 Hope Creek 1 0 10/10/2004 REFUELING OUTAGE NaN NaN
7 Indian Point 2 100 NaN NaN NaN NaN
8 Indian Point 3 100 NaN NaN NaN NaN
9 Limerick 1 99 NaN REDUCED POWER DUE TO FEEDWATER FLOW CONCERNS NaN NaN
How can we get the correct data for each link, instead of each link pulling only the first table from the webpage? Thank you in advance for your help!
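For context: the `#r1` through `#r4` suffixes are URL fragments. Fragments are resolved client-side and are never sent in the HTTP request, so all four calls fetch the identical page. A minimal standard-library check (the URL is the one from the question):

```python
from urllib.parse import urlparse

# The four URLs differ only in their fragment ("#r1" ... "#r4").
# Fragments are not sent to the server, so every request fetches
# the same document.
base = 'https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html'
parts = [urlparse(base + frag) for frag in ('#r1', '#r2', '#r3', '#r4')]

# All four parsed URLs share a single path; only the fragment differs.
print({p.path for p in parts})
print([p.fragment for p in parts])  # ['r1', 'r2', 'r3', 'r4']
```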
All tables are in one HTML page, so it is possible to create a list of DataFrames, dfs, and then select each table by index:
url = 'https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html'
dfs = pd.read_html(url)
df1 = dfs[0]
df2 = dfs[1]
df3 = dfs[2]
df4 = dfs[3]
print(df1.head())
print(df2.head())
print(df3.head())
print(df4.head())
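As an alternative to positional indexing, `pandas.read_html` also accepts a `match` argument (a string or regex): only tables containing matching text are returned. A self-contained sketch using an inline HTML document (the table contents here are made up for illustration):

```python
from io import StringIO
import pandas as pd

# Two tiny tables; only the second one contains the word "Region".
html = """
<table><tr><th>Unit</th><th>Power</th></tr>
       <tr><td>Plant A</td><td>100</td></tr></table>
<table><tr><th>Region</th><th>Count</th></tr>
       <tr><td>R2</td><td>5</td></tr></table>
"""

all_tables = pd.read_html(StringIO(html))                    # both tables
region_only = pd.read_html(StringIO(html), match='Region')   # just the one

print(len(all_tables))                   # 2
print(len(region_only))                  # 1
print(region_only[0].columns.tolist())   # ['Region', 'Count']
```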
If you need a single DataFrame, join the list of DataFrames dfs with concat:
url = 'https://www.nrc.gov/reading-rm/doc-collections/event-status/reactor-status/2004/20041231ps.html'
dfs = pd.read_html(url)
df = pd.concat(dfs, ignore_index=True)
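`concat` stacks the frames vertically; `ignore_index=True` discards each frame's own index and builds a fresh 0..n-1 range, which matters here because every per-table index restarts at 0. A small illustration with made-up data:

```python
import pandas as pd

# Two frames standing in for two of the page's tables (contents invented).
a = pd.DataFrame({'Unit': ['Plant A', 'Plant B'], 'Power': [100, 98]})
b = pd.DataFrame({'Unit': ['Plant C', 'Plant D'], 'Power': [0, 75]})

# Without ignore_index, the result keeps the duplicated 0,1,0,1 index.
dup = pd.concat([a, b])
print(dup.index.tolist())    # [0, 1, 0, 1]

# With ignore_index=True, the index is rebuilt as 0..3.
flat = pd.concat([a, b], ignore_index=True)
print(flat.index.tolist())   # [0, 1, 2, 3]
print(len(flat))             # 4
```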