Python: add data to an empty pd.DataFrame
I'm quite new to Python. What I'm trying to do is get data from a website and add part of the web page to a pandas DataFrame.
This is the code I have so far, but I get an error when adding data to the DataFrame.
The code I have:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame()
for data in price_data:
    a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
    df.append(a)
print(df)
The error I'm getting:
ValueError Traceback (most recent call last)
<ipython-input-33-963d51917cf2> in <module>()
10
11 for data in price_data:
---> 12 a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
13 df.append(a)
14
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
It seems that the data structure you get from data.text.split(":")[0],data.text.split(":")[1] does not match what pd.DataFrame() expects. As the signature in the traceback shows, the first two positional arguments are data and index, so the first string is passed as the data and the second as the index. First take a look at the documentation of the function to understand what it expects and how to pass data to it properly. You can either pass a dictionary with the column names and the values (arrays must be of equal length, or an index should be specified), or lists/arrays as YOBEN_S proposed, for example:
a = pd.DataFrame({'Column_1': [data.text.split(":")[0]], 'Column_2': [data.text.split(":")[1]]})
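To make the point about the constructor concrete, here is a minimal sketch. The sample string is hypothetical and only stands in for one element's data.text; the real page content may look different. It shows that a dict of bare scalars is rejected unless an index is given, and that the dict-of-lists form and the list-of-rows form build the same one-row frame.

```python
import pandas as pd

# Hypothetical sample standing in for one element's data.text
text = "1586736000:1523"
timestamp, price = text.split(":")

# A dict whose values are all bare scalars has no length, so pandas
# cannot infer the number of rows and raises ValueError.
try:
    pd.DataFrame({"Column_1": timestamp, "Column_2": price})
except ValueError as exc:
    print("all-scalar dict fails:", exc)

# Dict form: each value is a one-element list, so the frame has one row.
from_dict = pd.DataFrame({"Column_1": [timestamp], "Column_2": [price]})

# List-of-rows form: the outer list holds rows, the inner list holds cells.
from_rows = pd.DataFrame([[timestamp, price]], columns=["Column_1", "Column_2"])

print(from_dict)
print(from_dict.equals(from_rows))
```

Passing index=[0] together with the scalar dict would also work; wrapping the values in lists is just the shortest change.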
Since you are dealing with HTML data, you could also try a different approach using pandas.read_html(), which parses HTML tables into DataFrames; see its documentation for more information.
Fix your code by passing a list of rows:
pd.DataFrame([[data.text.split(":")[0],data.text.split(":")[1]]])
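As a rough sketch of how that one-line fix fits into the loop: the sample strings and the column names below are assumptions used for illustration, not values from the wiki page. Each iteration builds a one-row frame from a list of rows, and the frames are joined once at the end with pd.concat, which returns a new DataFrame instead of modifying anything in place.

```python
import pandas as pd

# Hypothetical "timestamp:price" strings standing in for data.text
samples = ["1586736000:1523", "1586822400:1540"]

frames = []
for text in samples:
    parts = text.split(":")
    # The outer list is the list of rows, the inner list is the row's cells.
    frames.append(pd.DataFrame([[parts[0], parts[1]]], columns=["timestamp", "price"]))

# concat returns a new frame, so the result must be assigned to a name.
df = pd.concat(frames, ignore_index=True)
print(df)
```

The same applies to the original df.append(a) call: it returned a new frame rather than changing df, so its result had to be assigned back.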
I did some more research; the best way for me to do it was:
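# get price data from the Old School RuneScape wiki
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame(columns=['timestamp', 'price'])
for data in price_data:
    # DataFrame.append returns a new frame, so assign it back.
    # Note: DataFrame.append was removed in pandas 2.0; this works on older versions.
    df = df.append({'timestamp': data.text.split(":")[0], 'price': data.text.split(":")[1]}, ignore_index=True)
print(df)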
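On current pandas releases, where DataFrame.append is no longer available, one possible variant is to collect the rows in a plain Python list and build the DataFrame once at the end. This is only a sketch: the sample strings are hypothetical stand-ins for data.text, and the numeric conversion assumes both parts are plain integers.

```python
import pandas as pd

# Hypothetical "timestamp:price" strings standing in for data.text
samples = ["1586736000:1523", "1586822400:1540"]

rows = []
for text in samples:
    timestamp, price = text.split(":")
    # Appending to a Python list is cheap; no DataFrame is copied per row.
    rows.append({"timestamp": timestamp, "price": price})

# Build the frame once from the list of dicts.
df = pd.DataFrame(rows, columns=["timestamp", "price"])

# split() yields strings, so convert the columns to numbers explicitly.
df["timestamp"] = pd.to_numeric(df["timestamp"])
df["price"] = pd.to_numeric(df["price"])
print(df)
```

Building the frame once also avoids the repeated copying that appending row by row causes on large inputs.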