
Pandas DataFrame: iterating over rows

Here is the dataset:

import pandas as pd
d = {'Key': ['A', 'A', 'A', 'A'], 'Rank': [1, 2, 3, 4],
     'col1': [15000, 12000, 6000, 7000], 'col2': [15000, 10000, 0, 0],
     'col4': [10000, 10000, 10000, 10000], 'col5': [0, 0, 0, 0]}
df = pd.DataFrame(data=d)
print(df)

[Image omitted: the DataFrame above]

  • col1: the maximum value the row can hold
  • col2: the value the row currently holds
  • col4: the remaining amount that should be fitted into these records

I am trying to fill 'col5' with the maximum value each row can still take. 'col1' defines the row's maximum limit and 'col2' shows its current value. If a row is already at its maximum, move on to the next row. The total amount that can be distributed is given by 'col4'. See the example below.

Example:

  • First record (Rank 1): col1=15000 and col2=15000, so it is already full; move to the next row.
  • Second record (Rank 2): col1=12000 and col2=10000. Its maximum is 12000, so it can take 2000 more. I also need to make sure the remaining amount in col4 is at least 2000; it is (10000), so col5=2000, and col4 for the next record becomes 10000-2000=8000.
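The rule in the example can be written as a single pass over the rows in rank order. This is a minimal sketch in plain Python, not the asker's or the answerer's code; fill_col5 is a hypothetical helper name. Each row gets the smaller of its free capacity (col1 - col2) and whatever is still left, and the leftover after each row plays the role of col4 for the next record:

```python
def fill_col5(col1, col2, remaining):
    """Hypothetical helper: spread `remaining` over the rows in the given order."""
    col5 = []
    for max_value, current in zip(col1, col2):
        free = max(max_value - current, 0)  # how much more this row can hold
        used = min(free, remaining)         # never hand out more than is left
        col5.append(used)
        remaining -= used                   # this is "col4 for the next record"
    return col5


print(fill_col5([15000, 12000, 6000, 7000], [15000, 10000, 0, 0], 10000))
# [0, 2000, 6000, 2000]
```

With the sample data, rank 1 is full, rank 2 takes 2000 (8000 left), rank 3 takes 6000 (2000 left) and rank 4 gets the last 2000.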

Here is what the final dataset should look like:

[Image omitted: the expected result]

Below is the code I have tried:

for index, row in df.iterrows():
    # only rows that are not yet at their maximum
    if row['col1'] > row['col2']:
        # compare the free capacity (col1 - col2) with the current value in col2
        if (row['col1'] - row['col2']) < row['col2']:
            row['col5'] = row['col1'] - row['col2']
        else:
            row['col5'] = row['col2']
    print(row['col1'], row['col2'], row['col5'])
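One reason a loop like this leaves df unchanged: the pandas documentation for DataFrame.iterrows warns that, depending on the data types, each row is a copy rather than a view, so assigning to it has no effect on the DataFrame. With a string column such as 'Key' the rows are always copies. A small sketch with a two-row frame made up for illustration, comparing assignment to row with writing through df.at:

```python
import pandas as pd

# Toy frame for illustration only (not the question's full dataset).
df = pd.DataFrame({'Key': ['A', 'A'], 'col1': [12000, 6000],
                   'col2': [10000, 0], 'col5': [0, 0]})

for index, row in df.iterrows():
    row['col5'] = row['col1'] - row['col2']  # changes only the copy held in `row`
unchanged = df['col5'].tolist()
print(unchanged)  # [0, 0]

for index, row in df.iterrows():
    df.at[index, 'col5'] = row['col1'] - row['col2']  # writes into df itself
print(df['col5'].tolist())  # [2000, 6000]
```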

This should do what you need (updated to handle multiple keys):

import pandas as pd

d = {'Key': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Rank': [1, 2, 3, 4, 1, 2, 3, 4],
     'col1': [15000, 12000, 6000, 7000, 15000, 12000, 6000, 7000],
     'col2': [15000, 10000, 0, 0, 15000, 10000, 0, 0],
     'col4': [10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000],
     'col5': [0, 0, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data=d)
print(df)

results = []  # one processed frame per Key

for _, group in df.groupby('Key'):
    df_tmp = group.copy()  # work on an explicit copy of the group before writing to it
    tmp_value = 0  # amount from col4 still left to distribute in this group
    for index, row in df_tmp.iterrows():
        if tmp_value == 0:
            tmp_value = row['col4']  # first row of the group: take the amount from col4
        if row['col1'] > row['col2']:
            diff_value = row['col1'] - row['col2']  # free capacity of this row
            if diff_value < tmp_value:
                # the row can be filled completely; keep the rest for later rows
                df_tmp.at[index, 'col5'] = diff_value
                tmp_value = tmp_value - diff_value
            else:
                # only what is left fits into this row; nothing remains afterwards
                df_tmp.at[index, 'col5'] = tmp_value
                break
    results.append(df_tmp)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df_result = pd.concat(results)
print(df_result)
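As an alternative to the row loop (this is not what the answer above uses), the same allocation can be computed without iterrows by using a cumulative sum per key. A minimal sketch, assuming the rows should be filled in Rank order and that each key's budget is its first col4 value, as in the loop; on this data it produces the same col5 as the loop:

```python
import numpy as np
import pandas as pd

d = {'Key': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Rank': [1, 2, 3, 4, 1, 2, 3, 4],
     'col1': [15000, 12000, 6000, 7000, 15000, 12000, 6000, 7000],
     'col2': [15000, 10000, 0, 0, 15000, 10000, 0, 0],
     'col4': [10000] * 8, 'col5': [0] * 8}
df = pd.DataFrame(data=d).sort_values(['Key', 'Rank'])

# Free capacity of each row (rows that are already full get 0).
capacity = (df['col1'] - df['col2']).clip(lower=0)
# Budget per key: the first col4 value of each group, broadcast to every row.
total = df.groupby('Key')['col4'].transform('first')
# Capacity already used up by the earlier rows of the same key.
before = capacity.groupby(df['Key']).cumsum() - capacity
# Each row gets what is still left, but never more than its own capacity.
df['col5'] = np.minimum((total - before).clip(lower=0), capacity)
print(df)
```

One difference from the loop: rows after the budget runs out are explicitly set to 0 here, whereas the loop leaves their existing col5 values untouched (which are also 0 in this data).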

A few hints:
tmp_value holds the amount taken from col4 and is decreased as it is handed out to the rows.
In my opinion you should leave the loop with break, not with exit.
For more on editing pandas rows while iterating over them, see: Update a dataframe in pandas while iterating row by row.
Edit: you could also look up the data for each key first, save the 'col4' values in an array, and change the original DataFrame directly, but that's up to you.
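The "save col4 per key first and change the original DataFrame directly" idea from the edit could look roughly like this. It is a sketch under assumptions, not the answerer's code: it keeps the remaining amounts in a dict keyed by 'Key' (instead of an array) and takes each key's first col4 value as its budget:

```python
import pandas as pd

d = {'Key': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Rank': [1, 2, 3, 4, 1, 2, 3, 4],
     'col1': [15000, 12000, 6000, 7000, 15000, 12000, 6000, 7000],
     'col2': [15000, 10000, 0, 0, 15000, 10000, 0, 0],
     'col4': [10000] * 8, 'col5': [0] * 8}
df = pd.DataFrame(data=d)

# Save the col4 amount per key up front (a dict, so it can be looked up by key).
remaining = df.groupby('Key')['col4'].first().to_dict()

# Walk a sorted copy in rank order, but write the results into the original df.
for index, row in df.sort_values(['Key', 'Rank']).iterrows():
    free = row['col1'] - row['col2']  # how much more this row can hold
    left = remaining[row['Key']]      # what is still left for this key
    if free > 0 and left > 0:
        used = min(free, left)
        df.at[index, 'col5'] = used   # modify the original DataFrame
        remaining[row['Key']] = left - used

print(df)
```

Because sort_values keeps the original index labels, df.at[index, ...] still addresses the right row of df.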

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM