How to fill a pandas dataframe through a for loop?

The question is about this function in particular:

import requests
import pandas as pd

def jsontodataframe(): # collect OHLC data from scstrade

    companies = {'Habib Bank Limited':'HBL','Engro Chemical':'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'

    payload = {"date1":"01/01/2019","date2":"06/01/2019","rows":20,"page":1,"sidx":"trading_Date",
    "sord":"desc"}

    for company in companies:
        payload["par"] = companies[company]
        #print(payload)
        json_data = requests.post(url, json=payload).json() # download the JSON response of the POST request from scstrade
        df = pd.json_normalize(json_data['d'], errors='ignore') # flatten the records under the 'd' key into a dataframe
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change'] # rename the columns to better names

        df['Date'] = df['Date'].str.strip('/Date()') # strip the '/Date(...)/' wrapper, leaving the millisecond timestamp
        df['Date'] = pd.to_datetime(df['Date'].astype('int64'), origin='unix', unit='ms') # convert the unix timestamp to pandas datetime
        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True) # use the ticker symbol as the index
        print(df.head())

    df.to_csv("OHLC_values.csv") # save .csv file for later usage; at this point df holds only the last company's data
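The two Date lines above assume the API returns dates as ASP.NET-style strings of the form /Date(<milliseconds since the epoch>)/. That format is inferred from the strip('/Date()') call and has not been checked against the live endpoint, so the sample strings below are made up. The sketch shows what the cleanup does: str.strip removes a set of characters from both ends rather than the literal prefix, which works here only because digits are not in that set, while str.extract with a digit regex states the intent more directly.

```python
import pandas as pd

# ASSUMED raw format: "/Date(<ms since the Unix epoch>)/" (made-up sample values).
raw = pd.Series(["/Date(1546300800000)/", "/Date(1546387200000)/"])

# str.strip removes any of the characters '/', 'D', 'a', 't', 'e', '(' and ')'
# from both ends; it does not remove the literal substring "/Date(".
stripped = raw.str.strip("/Date()")

# More explicit alternative: pull out the run of digits with a regex.
millis = raw.str.extract(r"(\d+)", expand=False)

# Cast to integers first: passing numeric strings together with unit=
# raises a deprecation warning in recent pandas releases.
dates = pd.to_datetime(millis.astype("int64"), unit="ms")
print(dates.tolist())
```

Both 1546300800000 and 1546387200000 are midnight UTC timestamps, so the result is 2019-01-01 and 2019-01-02.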

Currently the df variable gets overwritten on every iteration, and my output looks like this: [image of my output]

I thought about using append, but that would be a massive performance hit, and I want the code to be as efficient as possible (so that I can easily scale it later on). Right now the df.columns line is redundant, so should I just define my dataframe outside of the for loop? But json_normalize brings in column names of its own, so that line is kind of necessary.
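On the column-name question: pd.concat aligns frames by column label, so every per-company frame needs the same column names before the frames are combined, which makes renaming inside the loop the right place to do it. A small sketch with made-up values (the name close_price stands in for a hypothetical raw key from json_normalize) shows what happens when the names do not match:

```python
import pandas as pd

# Two made-up per-company frames; the values are illustrative only.
hbl = pd.DataFrame({"Date": ["2019-05-31"], "Close": [120.5]})
engro = pd.DataFrame({"Date": ["2019-05-31"], "Close": [300.0]})

# Matching column names: concat stacks the rows cleanly.
aligned = pd.concat([hbl, engro], ignore_index=True)
print(aligned.columns.tolist())  # ['Date', 'Close']

# HYPOTHETICAL mismatch: one frame keeps a different name for the same data.
engro_raw = engro.rename(columns={"Close": "close_price"})
misaligned = pd.concat([hbl, engro_raw], ignore_index=True)

# concat takes the union of the columns and fills the gaps with NaN.
print(misaligned)
```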

Ideally I just want one big dataframe, which I can later write to a single .csv file.

> Ideally I just want one big dataframe and then later convert that to one .csv file

This can be achieved with pandas.concat: append each company's dataframe to a list inside the loop, then concatenate the list once after the loop.
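The pattern in isolation, as a sketch that runs without network access: build_frame is a hypothetical stand-in for the download-and-clean steps and returns made-up rows. Appending to a Python list is cheap, and a single pd.concat copies each row once, whereas growing a DataFrame inside the loop re-copies everything accumulated so far on every iteration. (DataFrame.append itself was deprecated in pandas 1.4 and removed in pandas 2.0.)

```python
import pandas as pd

def build_frame(ticker: str) -> pd.DataFrame:
    """HYPOTHETICAL stand-in for the per-company download and cleanup.

    Returns two made-up rows so the sketch runs offline.
    """
    return pd.DataFrame({
        "Date": pd.to_datetime(["2019-05-30", "2019-05-31"]),
        "Close": [100.0, 101.0],
    })

companies = {"Habib Bank Limited": "HBL", "Engro Chemical": "ENGRO"}

frames = []  # collect one DataFrame per company instead of growing a DataFrame
for ticker in companies.values():
    df = build_frame(ticker)
    df["ID"] = ticker   # label the rows so the companies stay distinguishable
    frames.append(df)   # list.append does not copy any DataFrame data

# One concat after the loop produces the single combined frame.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```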

import calendar
import requests
import pandas as pd

def jsontodataframe():

    companies = {'Habib Bank Limited':'HBL','Engro Chemical':'ENGRO'}
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotHP.aspx/chart'

    payload = {"date1":"01/01/2019","date2":"06/01/2019","rows":20,"page":1,"sidx":"trading_Date",
    "sord":"desc"}
    data = [] # one dataframe per company, concatenated once after the loop
    for company in companies:
        payload["par"] = companies[company]

        json_data = requests.post(url, json=payload).json() # download the JSON response
        df = pd.json_normalize(json_data['d'], errors='ignore') # flatten the records under the 'd' key
        df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Change']

        df['Date'] = df['Date'].str.strip('/Date()') # '/Date(<ms>)/' -> '<ms>'
        df['Date'] = pd.to_datetime(df['Date'].astype('int64'), origin='unix', unit='ms')

        df['Date'] = df['Date'].dt.floor('D') # keep only the date part

        ### update
        df['Month_as_number'] = df['Date'].dt.month # month as a number: 5, 11, etc.
        df['Month_as_name'] = df['Month_as_number'].apply(lambda x: calendar.month_abbr[x]) # 5 -> 'May', etc.
        ###

        df['ID'] = companies[company]
        df.set_index(['ID'], inplace=True)
        data.append(df)
    # save to csv instead of returning the dataframe; keep the index so the ID column is written
    pd.concat(data).to_csv('OHLC_values.csv')
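Because the answer sets ID as the index, the company label only reaches the CSV when the index is written (to_csv's default, index=True); with index=False the ID labels would be dropped. A sketch with made-up rows in the same layout, using an in-memory buffer in place of OHLC_values.csv, shows the round trip back into pandas and how to select one company:

```python
import io
import pandas as pd

# Made-up rows in the same layout as the answer's output (ID as the index).
df = pd.DataFrame({
    "ID": ["HBL", "HBL", "ENGRO"],
    "Date": pd.to_datetime(["2019-05-30", "2019-05-31", "2019-05-31"]),
    "Close": [100.0, 101.0, 300.0],
}).set_index("ID")

buf = io.StringIO()  # stands in for 'OHLC_values.csv'
df.to_csv(buf)       # default index=True writes ID as the first column
buf.seek(0)

# Restore ID as the index and parse Date back into a datetime column.
loaded = pd.read_csv(buf, index_col="ID", parse_dates=["Date"])
hbl_rows = loaded.loc["HBL"]  # all rows for one company
print(hbl_rows)

# With index=False the ID labels are not written at all.
header_without_index = df.to_csv(index=False).splitlines()[0]
print(header_without_index)  # Date,Close
```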

So I edited my initial answer: the function now saves the combined dataframe to a .csv file instead of returning it. I also truncated the Date column to dates only.
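On truncating to dates: dt.floor('D') rounds each timestamp down to midnight and keeps the column as datetime64, so the .dt accessor used for the month columns still works afterwards. dt.date would instead produce Python date objects in an object column. A small sketch with made-up timestamps:

```python
import pandas as pd

# Made-up intraday timestamps.
ts = pd.Series(pd.to_datetime(["2019-05-31 09:30:00", "2019-05-31 15:45:10"]))

floored = ts.dt.floor("D")  # midnight of the same day, still datetime64
normalized = ts.dt.normalize()  # same result for timezone-naive data
as_dates = ts.dt.date       # Python datetime.date objects, object dtype

print(floored.tolist())
print(floored.dt.month.tolist())  # [5, 5]; the .dt accessor still works
print(as_dates.dtype)             # object
```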

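Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM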