How to iterate faster over rows in a DataFrame?

I have a DataFrame from Pandas:

import pandas as pd
data = [{'c1':'aaa', 'c2':100, 'c3': 99, 'c4': 0}, {'c1':'bbb','c2':110, 'c3': 89, 'c4': 0}, 
{'c1':'aaa','c2':100,'c3': 93, 'c4': 0},{'c1':'ccc', 'c2':130,'c3': 77, 'c4': 0},
{'c1':'ddd','c2':140,'c3': 54, 'c4': 0}, {'c1':'bbb','c2':110,'c3': 76, 'c4': 0},
{'c1':'ddd', 'c2':140,'c3': 75, 'c4': 0}]
df = pd.DataFrame(data)
print(df)

Output:

   c1    c2   c3  c4
0 'aaa'  100  99  0
1 'bbb'  110  89  0
2 'aaa'  100  93  0
3 'ccc'  130  77  0
4 'ddd'  140  54  0
5 'bbb'  110  76  0
6 'ddd'  140  75  0

Now, for every row whose c1 value also appears in an earlier row, I want to set c4 to the c2 value of that other matching row. The expected result:

   c1    c2   c3  c4
0 'aaa'  100  99  0
1 'bbb'  110  89  0
2 'aaa'  100  93  100
3 'ccc'  130  77  0
4 'ddd'  140  54  0
5 'bbb'  110  76  110
6 'ddd'  140  75  140

This DataFrame is only an example; the real one has more columns and many more rows (around 4 million). My initial idea was this:

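for index, row in df.iterrows():
    matches = df.index[df.c1 == row.c1]  # labels of all rows sharing this c1
    if len(matches) > 1:  # skip values such as 'ccc' that occur only once
        df.loc[matches[1], 'c4'] = row.c2  # .loc writes to df itself, not to a copy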

There can be at most one other matching row. As expected, iterating with iterrows is extremely slow.

Based on your latest edit, you can use df.groupby followed by shift, which moves the c2 values down one row within each c1 group, and then fillna to keep the existing c4 value wherever a row has no earlier match:

df['c4'] = df.groupby("c1")['c2'].shift().fillna(df['c4'])

      c1   c2  c3     c4
0  'aaa'  100  99    0.0
1  'bbb'  110  89    0.0
2  'aaa'  100  93  100.0
3  'ccc'  130  77    0.0
4  'ddd'  140  54    0.0
5  'bbb'  110  76  110.0
6  'ddd'  140  75  140.0
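
As the output above shows, c4 comes back as float, because shift leaves NaN in the first row of every group before fillna runs. The following is a minimal, self-contained sketch of the same approach that casts the result back to integers. It assumes that c4 contains no missing values to begin with; the intermediate name previous_c2 is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Same sample data as in the question, written column by column.
df = pd.DataFrame({
    "c1": ["aaa", "bbb", "aaa", "ccc", "ddd", "bbb", "ddd"],
    "c2": [100, 110, 100, 130, 140, 110, 140],
    "c3": [99, 89, 93, 77, 54, 76, 75],
    "c4": [0, 0, 0, 0, 0, 0, 0],
})

# Within each c1 group, shift() moves c2 down one row, so every row sees the
# c2 of the previous row with the same c1 (NaN for the first row of a group).
previous_c2 = df.groupby("c1")["c2"].shift()

# Keep the existing c4 where there is no previous row, then restore the
# integer dtype that the intermediate NaN values turned into float.
# Assumption: c4 itself has no missing values, so the cast cannot fail.
df["c4"] = previous_c2.fillna(df["c4"]).astype("int64")

print(df)
```

Because groupby and shift operate on whole columns, no Python-level loop over the rows is involved, which is what makes this approach suitable for a frame with millions of rows.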
