Python growing dictionary or growing dataframe - appending in a loop

Question

I'm trying to write code that collects data from an online source in a loop and manipulates that data with pandas inside each iteration. Initially I thought I should initialise a dict outside the loop, grab the data, convert the dict to a dataframe inside the loop, and perform my operations on that. But it feels strange to build a dictionary instead of just creating a dataframe and appending to it in the loop. On the other hand, as I understand it, pandas is not really "designed" for cell-by-cell updating, but rather for vectorised operations. What would be the most efficient approach?

    import numpy as np
    import pandas as pd

    d = {'a': [], 'b': [], 'c': [], 'x': [], 'z': []}
    for i in range(100):
        d['a'].append(f'some info {i}')
        d['b'].append(f'more info {i}')
        d['c'].append(i)
        d['x'].append(i * 2)
        d['z'].append(np.nan)  # placeholder for the computed column (is this needed?)

        df = pd.DataFrame(d)  # rebuilt from the dict on every iteration
        # Some function that does calculations on df cols and returns df with new cols
        df['z'] = 1

Answer 1

Pandas is normally used for data manipulation and data modelling, so adding data to the dataframe on every iteration of the loop can be inefficient. How much this matters depends heavily on the number of iterations: if there are very few compared to the final length of the dataframe, you can of course append in the loop. Otherwise, it seems best to collect all the data in the dictionary inside the loop and, once you are done collecting, convert it into a dataframe for analysis and then delete the dictionary.
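As a rough sketch of the two cases described above: the first function accumulates plain Python rows in the loop and builds the dataframe once at the end; the second covers the case where each iteration already yields a sizeable dataframe, so there are few iterations relative to the final length. The names `fetch_record`, `collect_then_build` and `concat_chunks` are hypothetical, `fetch_record` merely stands in for the real download step, and the `z` calculation is only illustrative.

```python
import pandas as pd


def fetch_record(i):
    """Hypothetical stand-in for one download step; returns one row as a dict."""
    return {"a": f"some info {i}", "b": f"more info {i}", "c": i, "x": i * 2}


def collect_then_build(n):
    """Accumulate plain Python rows in the loop, then build the DataFrame once."""
    rows = []
    for i in range(n):
        rows.append(fetch_record(i))  # appending to a list does not copy a DataFrame
    df = pd.DataFrame(rows)           # single conversion after the loop
    df["z"] = df["c"] + df["x"]       # illustrative vectorised column calculation
    return df


def concat_chunks(chunks):
    """If each iteration already yields a DataFrame, concatenate them in one call."""
    return pd.concat(list(chunks), ignore_index=True)


df = collect_then_build(100)
```

If the per-iteration calculation really does need the data collected so far, this pattern does not apply directly; it assumes the calculations can wait until the loop has finished.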

solution1
0 2019-08-08 22:21:48
