Python growing dictionary or growing dataframe - appending in a loop

I'm trying to write code that collects data from an online source in a loop and manipulates it with pandas inside each iteration. Initially I thought I should initialise a dict outside the loop, grab the data, convert the dict to a dataframe inside the loop, and perform my operations on that. But it feels strange to build a dictionary instead of just creating a dataframe and appending to it in the loop. As I understand it, though, pandas is not really "designed" for cell-by-cell updating (rather for vectorised operations). What would be the most efficient approach?

    import numpy as np
    import pandas as pd

    # Column lists initialised outside the loop
    d = {'a': [], 'b': [], 'c': [], 'x': [], 'z': []}
    for i in range(100):
        d['a'].append(f'some info {i}')
        d['b'].append(f'more info {i}')
        d['c'].append(i)
        d['x'].append(i * 2)
        d['z'].append(np.nan)  # ???

        # Rebuild the dataframe from the dict on every iteration
        df = pd.DataFrame(d)
        # Some function that does calculations on df cols and returns df with new cols
        df['z'] = 1

Pandas is normally used for data manipulation and data modelling, so adding data to the dataframe on every iteration of the loop can be inefficient. How much this matters depends heavily on the number of iterations: if there are very few compared to the final length of the dataframe, you can of course do that. Otherwise, it seems best to collect all the data in the dictionary inside the loop and, once you are done collecting, convert it into a dataframe for analysis and then delete the dictionary.
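A minimal sketch of the collect-first approach, reusing the column names from the question. The `collect` function is a hypothetical stand-in for the online source, and the `z` calculation is an assumed placeholder for whatever the real per-column computation is:

```python
import pandas as pd


def collect(n):
    """Hypothetical stand-in for the data source: fill plain lists in a loop."""
    d = {'a': [], 'b': [], 'c': [], 'x': []}
    for i in range(n):
        d['a'].append(f'some info {i}')
        d['b'].append(f'more info {i}')
        d['c'].append(i)
        d['x'].append(i * 2)
    return d


d = collect(100)

# Build the dataframe once, after all the data has been collected
df = pd.DataFrame(d)

# Assumed placeholder calculation: computed on whole columns, not row by row
df['z'] = df['c'] + df['x']

# The dictionary is no longer needed once the dataframe exists
del d
```

Appending to Python lists is cheap, so the loop stays fast, and the dataframe is constructed a single time instead of once per iteration. If the calculation really does have to run inside the loop, this pattern does not apply directly.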
