How to fill a pandas dataframe through a for loop?
The question relates to this function in particular:
```python
import pandas as pd
import requests


def jsontodataframe():  # collect OHLC data from scstrade
    companies = {'Habib Bank Limited': 'HBL', 'Engro Chemical': 'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'
    payload = {"date1": "01/01/2019", "date2": "06/01/2019", "rows": 20, "page": 1,
               "sidx": "trading_Date", "sord": "desc"}
    for company in companies:
        payload["par"] = companies[company]
        # print(payload)
        json_data = requests.post(url, json=payload).json()  # download the json POST request from scstrade
        df = pd.json_normalize(json_data['d'], errors='ignore')  # convert the json to a pandas dataframe
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change']  # rename the columns to better names
        df['Date'] = df['Date'].str.strip('/Date()')
        df['Date'] = pd.to_datetime(df['Date'], origin='unix', unit='ms')  # convert unix timestamp to pandas datetime
        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True)  # set the index
        print(df.head())
        df.to_csv("OHLC_values.csv")  # save .csv file for later usage
```
Currently the `df` variable gets overwritten on each iteration, so only the last company's data survives.
I thought about using `append`, but that would be a massive performance hit, and I want the code to be as efficient as possible so that I can easily scale it later on. Right now the `df.columns` line is redundant, so should I just define my DataFrame outside the for loop? But `json_normalize` brings in column names of its own, so the rename is kind of necessary.
Ideally I just want one big DataFrame that I can later write out to a single .csv file.
> Ideally I just want one big dataframe and then later convert that to one .csv file
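This can be achieved with `pandas.concat`: append each company's DataFrame to a plain Python list inside the loop, then call `pd.concat` once after the loop. Appending to a list is cheap (unlike `DataFrame.append`, which returned a full copy every time), so this avoids the performance hit.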
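To see why collecting into a list avoids the slowdown: growing a DataFrame inside the loop (with `pd.concat`, or the old `DataFrame.append`, which was removed in pandas 2.0) copies every row accumulated so far on each iteration, so total copying grows quadratically with the number of companies, while a single `concat` at the end copies each row once. A minimal sketch on synthetic data; `make_frame` is a hypothetical stand-in for one company's download, not part of the original code:

```python
import pandas as pd


def make_frame(ticker: str, n_rows: int = 3) -> pd.DataFrame:
    """Hypothetical stand-in for one company's downloaded OHLC rows."""
    return pd.DataFrame({
        "ID": [ticker] * n_rows,
        "Close": [float(i) for i in range(n_rows)],
    })


def grow_in_loop(tickers):
    """Anti-pattern: re-copies every accumulated row on each iteration."""
    result = None
    for ticker in tickers:
        frame = make_frame(ticker)
        result = frame if result is None else pd.concat([result, frame], ignore_index=True)
    return result


def collect_then_concat(tickers):
    """Preferred: keep the frames in a list, then concatenate once at the end."""
    frames = [make_frame(ticker) for ticker in tickers]
    return pd.concat(frames, ignore_index=True)


tickers = ["HBL", "ENGRO"]
print(collect_then_concat(tickers))
```

Both functions produce the same frame; only the amount of copying differs, which starts to matter as the company list grows.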
```python
import calendar

import pandas as pd
import requests


def jsontodataframe():
    companies = {'Habib Bank Limited': 'HBL', 'Engro Chemical': 'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'
    payload = {"date1": "01/01/2019", "date2": "06/01/2019", "rows": 20, "page": 1,
               "sidx": "trading_Date", "sord": "desc"}
    data = []  # one DataFrame per company, concatenated once after the loop
    for company in companies:
        payload["par"] = companies[company]
        json_data = requests.post(url, json=payload).json()
        # pd.json_normalize replaces the removed pandas.io.json.json_normalize
        df = pd.json_normalize(json_data['d'], errors='ignore')
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change']
        # strip the '/Date(...)/' wrapper and parse the millisecond timestamp as a number
        df['Date'] = pd.to_numeric(df['Date'].str.strip('/Date()'))
        df['Date'] = pd.to_datetime(df['Date'], origin='unix', unit='ms')
        df['Date'] = df['Date'].dt.floor('D')  # return only dates
        ### update
        df['Month_as_number'] = df['Date'].dt.month  # month as a number: 5, 11, etc.
        df['Month_as_name'] = df['Month_as_number'].apply(lambda x: calendar.month_abbr[x])  # 5 -> 'May', etc.
        ###
        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True)
        data.append(df)
    # save to csv instead of returning the dataframe;
    # keep the index so the ID column is written too
    pd.concat(data).to_csv('OHLC_values.csv')
```
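Because the code above needs the live scstrade endpoint, here is a minimal offline sketch of the same collect-then-concat pipeline. It assumes, as the original stripping code implies, that the response is a dict whose `'d'` key holds a list of row dicts with the date wrapped as `/Date(<milliseconds>)/`. The `fake_response` and `to_frame` helpers, the field names and the timestamps are hypothetical; check the real field order before relying on the positional column rename.

```python
import calendar

import pandas as pd

COLUMNS = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change']


def fake_response(base_ms: int) -> dict:
    """Hypothetical stand-in for requests.post(url, json=payload).json()."""
    day_ms = 24 * 60 * 60 * 1000
    rows = []
    for i in range(2):
        rows.append({
            "trading_date": f"/Date({base_ms + i * day_ms})/",  # hypothetical field names
            "open": 100.0, "high": 101.0, "low": 99.0,
            "close": 100.5, "volume": 1000, "change": 0.5,
        })
    return {"d": rows}


def to_frame(json_data: dict, ticker: str) -> pd.DataFrame:
    """Hypothetical helper: one company's JSON -> cleaned DataFrame indexed by ID."""
    df = pd.json_normalize(json_data['d'])
    df.columns = COLUMNS  # positional rename: relies on the key order above
    # '/Date(1546300800000)/' -> 1546300800000 (milliseconds since the epoch)
    df['Date'] = pd.to_numeric(df['Date'].str.strip('/Date()'))
    df['Date'] = pd.to_datetime(df['Date'], unit='ms').dt.floor('D')
    df['Month_as_name'] = df['Date'].dt.month.apply(lambda m: calendar.month_abbr[m])
    df['ID'] = ticker
    return df.set_index('ID')


companies = {'Habib Bank Limited': 'HBL', 'Engro Chemical': 'ENGRO'}
data = [to_frame(fake_response(1546300800000), t) for t in companies.values()]
combined = pd.concat(data)  # one big DataFrame, ready for combined.to_csv(...)
print(combined)
```

Splitting the per-company cleaning into a function keeps the loop body to a single list append, which is also the piece you would swap back to the real `requests.post` call.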
Edit: I updated my initial answer. The function now saves the concatenated DataFrame to a .csv file instead of returning it. I also stripped the `Date` column down to dates only.
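Stripping the time component works because `dt.floor('D')` rounds each timestamp down to midnight; for timezone-naive timestamps like these, `dt.normalize()` gives the same result. The per-element `calendar.month_abbr` lookup can also be written with the vectorized `dt.month_name()` accessor. A small sketch on made-up timestamps:

```python
import calendar

import pandas as pd

# Made-up timestamps that still carry a time-of-day component
dates = pd.Series(pd.to_datetime(["2019-05-03 14:30", "2019-11-20 09:15"]))

floored = dates.dt.floor("D")       # round down to midnight
normalized = dates.dt.normalize()   # same result for tz-naive timestamps

# Month abbreviation two ways: per-element lookup vs vectorized accessor
via_calendar = dates.dt.month.apply(lambda m: calendar.month_abbr[m])
via_accessor = dates.dt.month_name().str[:3]

print(floored.tolist())
print(via_accessor.tolist())
```

Note that `calendar.month_abbr` follows the process locale, while `dt.month_name()` returns English names unless a `locale` argument is passed.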