Python add data to an empty pd.DataFrame

I'm quite new to Python. What I'm trying to do is get data from a website and add part of the webpage to a pandas DataFrame.

This is the code I have so far, but I get an error when adding data to the DataFrame.

The code I have:

url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame()

for data in price_data:
  a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
  df.append(a)

print(df)

The error I'm getting:

ValueError                                Traceback (most recent call last)
<ipython-input-33-963d51917cf2> in <module>()
 10 
 11 for data in price_data:
---> 12   a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
 13   df.append(a)
 14 

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
507                 )
508             else:
--> 509                 raise ValueError("DataFrame constructor not properly called!")
510 
511         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

It seems that the data structure you get from data.text.split(":")[0],data.text.split(":")[1] does not match what pd.DataFrame() expects. The first positional argument is data and the second is index, so the call passes a bare string as the data and another string as the index. First take a look at the function's documentation to fully understand what it expects and how to pass data to it properly. You can either pass a dictionary mapping column names to values (arrays must be of equal length, or an index should be specified), or lists/arrays as YOBEN_S proposed, for example:

a = pd.DataFrame({'Column_1': [data.text.split(":")[0]], 'Column_2': [data.text.split(":")[1]]})
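
To make the difference concrete, here is a minimal sketch that uses a made-up sample string in place of data.text (the value "1577836800:1500" is an assumption, not taken from the page). It shows the original call failing and the two constructor forms that work:

```python
import pandas as pd

# Hypothetical sample standing in for data.text; the real page content may differ.
text = "1577836800:1500"
timestamp, price = text.split(":")

# The original call passes a plain string as `data` and another as `index`,
# which pandas rejects with the ValueError shown in the traceback.
try:
    pd.DataFrame(timestamp, price)
    failed = False
except ValueError:
    failed = True

# Dict form: each column name maps to a list of values.
from_dict = pd.DataFrame({"Column_1": [timestamp], "Column_2": [price]})

# List form: one inner list per row.
from_rows = pd.DataFrame([[timestamp, price]], columns=["Column_1", "Column_2"])

print(failed)
print(from_dict)
print(from_rows)
```

Wrapping each value in a list matters in the dict form: with bare scalars pandas cannot infer the number of rows unless an index is given.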

Since you are dealing with HTML data, you could also try a different approach using pandas.read_html(), which parses HTML tables into DataFrames; see the pandas documentation for more information.

Fix your code with:

pd.DataFrame([[data.text.split(":")[0],data.text.split(":")[1]]])
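
Note that this fixes only the constructor call. DataFrame.append returns a new DataFrame rather than modifying df in place, so the loop in the question would still print an empty frame, and the method was removed in pandas 2.0. One possible sketch of the full loop, assuming a hypothetical list texts in place of the scraped data.text values, collects the one-row frames and joins them with pd.concat:

```python
import pandas as pd

# Hypothetical stand-ins for the data.text values scraped from the page.
texts = ["1577836800:1500", "1577923200:1525"]

frames = []
for text in texts:
    parts = text.split(":")
    # One row per scraped item, built with the corrected constructor call.
    frames.append(pd.DataFrame([[parts[0], parts[1]]]))

# Join all one-row frames; ignore_index renumbers the rows from 0.
df = pd.concat(frames, ignore_index=True)
print(df)
```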

I did some more research; the best way for me to do it was:

import requests
import pandas as pd
from bs4 import BeautifulSoup

# get the price data from the wiki page
url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame(columns=['timestamp', 'price'])

for data in price_data:
  # append returns a new DataFrame, so reassign df (DataFrame.append was removed in pandas 2.0)
  df = df.append({'timestamp': data.text.split(":")[0], 'price': data.text.split(":")[1]}, ignore_index=True)

print(df)
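
On pandas 2.0 and later DataFrame.append no longer exists, and even on older versions calling it in a loop copies the whole frame on every iteration. A minimal alternative sketch, assuming the same timestamp:price text format, gathers plain dicts and builds the frame once. The helper name parse_prices and the sample strings are illustrative and not part of the original answer:

```python
import pandas as pd


def parse_prices(texts):
    """Build a timestamp/price DataFrame from 'timestamp:price' strings."""
    rows = []
    for text in texts:
        # partition splits on the first ':' only and never raises.
        timestamp, sep, price = text.partition(":")
        if not sep:
            continue  # skip entries without a ':' separator
        rows.append({"timestamp": timestamp, "price": price})
    return pd.DataFrame(rows, columns=["timestamp", "price"])


# Hypothetical sample; in the scraper, pass [d.text for d in price_data].
df = parse_prices(["1577836800:1500", "no separator", "1577923200:1525"])
print(df)
```

Both columns stay as strings here; converting them to numeric or datetime types is left out because the exact format of the scraped values is not shown in the thread.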
