How to fill a pandas dataframe through a for loop?
The question concerns this function in particular:
def jsontodataframe(): #collect OHLC data from scstrade
    companies = {'Habib Bank Limited':'HBL','Engro Chemical':'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'
    payload = {"date1":"01/01/2019","date2":"06/01/2019","rows":20,"page":1,"sidx":"trading_Date",
               "sord":"desc"}
    for company in companies:
        payload["par"] = companies[company]
        #print(payload)
        json_data = requests.post(url, json=payload).json() #download the json POST request from scstrade
        json_normalize(json_data)
        df = pd.DataFrame(json_data) #convert the json to pandas dataframe
        df = pd.io.json.json_normalize(json_data['d'], errors='ignore')
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change'] #rename the columns to better names
        df['Date'] = df['Date'].str.strip('/Date()')
        df['Date'] = pd.to_datetime(df['Date'], origin='unix', unit='ms') #convert unix timestamp to pandas datetime and set the index
        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True)
        print(df.head())
    df.to_csv("OHLC_values.csv") #save .csv file for later usage
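I considered using append, but that would seriously hurt performance, and I want the code to be as efficient as possible (so that it can be scaled up easily later). Right now the df.columns line runs again on every iteration, which is redundant, so should I just define the dataframe outside the for loop? On the other hand, json_normalize brings in its own column names, so the rename is still necessary.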
Ideally, I just want one big dataframe that I can then write out to a single .csv file.
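This can be done with pandas.concat: collect one dataframe per company in a list and concatenate them once after the loop.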
import calendar

import pandas as pd
import requests

def jsontodataframe():
    # map display names to the ticker symbols sent as the "par" field
    companies = {'Habib Bank Limited':'HBL','Engro Chemical':'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'
    payload = {"date1":"01/01/2019","date2":"06/01/2019","rows":20,"page":1,"sidx":"trading_Date",
               "sord":"desc"}
    data = []  # one dataframe per company, concatenated once after the loop
    for company in companies:
        payload["par"] = companies[company]
        json_data = requests.post(url, json=payload).json()
        # pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json.json_normalize
        df = pd.json_normalize(json_data['d'], errors='ignore')
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change']
        df['Date'] = df['Date'].str.strip('/Date()')  # '/Date(1546560000000)/' -> '1546560000000'
        # cast to integers first so unit='ms' is applied to numbers rather than strings
        df['Date'] = pd.to_datetime(df['Date'].astype('int64'), origin='unix', unit='ms')
        df['Date'] = df['Date'].dt.floor('D')  # keep only the date part
        ### update
        df['Month_as_number'] = df['Date'].dt.month  # month as a number: 5, 11, etc.
        df['Month_as_name'] = df['Month_as_number'].apply(lambda x: calendar.month_abbr[x])  # 5 -> 'May', etc.
        ###
        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True)
        data.append(df)
    # save to csv instead of returning the dataframe;
    # the index is written out so that the ID column is not lost
    pd.concat(data).to_csv('OHLC_values.csv')
So I edited the original answer. The function now saves the dataframe to a .csv file. I also stripped the Date column down to the date only.
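For anyone who wants to try the pattern without calling the site, here is a minimal offline sketch of the same idea: build one dataframe per company, keep them in a list, and call pandas.concat once at the end. The sample records, the `trading_close` field name, and the helper names `parse_ms_date` and `build_frame` are made up for illustration; only the `/Date(ms)/` string shape is taken from the code above.

```python
import pandas as pd

# Hypothetical sample payloads; field names and values are illustrative only.
SAMPLE = {
    "HBL": [
        {"trading_Date": "/Date(1546596000000)/", "trading_close": 120.5},
        {"trading_Date": "/Date(1546473600000)/", "trading_close": 119.0},
    ],
    "ENGRO": [
        {"trading_Date": "/Date(1546560000000)/", "trading_close": 291.2},
    ],
}


def parse_ms_date(series):
    """Convert '/Date(<milliseconds>)/' strings to datetimes floored to the day."""
    millis = series.str.extract(r"(\d+)", expand=False).astype("int64")
    return pd.to_datetime(millis, unit="ms").dt.floor("D")


def build_frame(samples):
    """Build one dataframe per ticker, then concatenate them in a single call."""
    frames = []
    for ticker, records in samples.items():
        df = pd.json_normalize(records)
        df.columns = ["Date", "Close"]  # rename once per frame, as in the answer
        df["Date"] = parse_ms_date(df["Date"])
        df.insert(0, "ID", ticker)      # keep the ticker as an ordinary column
        frames.append(df)
    # A single concat avoids re-copying the accumulated data on every iteration
    return pd.concat(frames, ignore_index=True)


combined = build_frame(SAMPLE)
```

Keeping ID as an ordinary column here (instead of the index) is just one option; it means `combined.to_csv(..., index=False)` would still write the ticker to the file.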