I have a DataFrame from Pandas:
import pandas as pd
# c2 values match the output shown below
data = [{'c1':'aaa', 'c2':100, 'c3': 99, 'c4': 0}, {'c1':'bbb','c2':110, 'c3': 89, 'c4': 0},
        {'c1':'aaa','c2':100,'c3': 93, 'c4': 0},{'c1':'ccc', 'c2':130,'c3': 77, 'c4': 0},
        {'c1':'ddd','c2':140,'c3': 54, 'c4': 0}, {'c1':'bbb','c2':110,'c3': 76, 'c4': 0},
        {'c1':'ddd', 'c2':140,'c3': 75, 'c4': 0}]
df = pd.DataFrame(data)
print(df)
Output:
c1 c2 c3 c4
0 'aaa' 100 99 0
1 'bbb' 110 89 0
2 'aaa' 100 93 0
3 'ccc' 130 77 0
4 'ddd' 140 54 0
5 'bbb' 110 76 0
6 'ddd' 140 75 0
Now, for every pair of rows that share the same value in column c1, I want to set column c4 of the second row to the value of column c2 from the first matching row. The expected result:
c1 c2 c3 c4
0 'aaa' 100 99 0
1 'bbb' 110 89 0
2 'aaa' 100 93 100
3 'ccc' 130 77 0
4 'ddd' 140 54 0
5 'bbb' 110 76 110
6 'ddd' 140 75 140
This DataFrame is just an example; the real one has more columns and many more rows (around 4 million). My initial idea was this:
for index, row in df.iterrows():
    df[df.c1==row.c1].iloc[1].c4= row.c2
There can be at most one other matching row. Obviously, with iterrows the process is extremely slow.
Based on your latest edit, you can use df.groupby followed by shift, which shifts the values one row down within each group, and then fillna to keep the existing c4 value wherever a row has no earlier match:
df['c4'] = df.groupby("c1")['c2'].shift().fillna(df['c4'])
c1 c2 c3 c4
0 'aaa' 100 99 0.0
1 'bbb' 110 89 0.0
2 'aaa' 100 93 100.0
3 'ccc' 130 77 0.0
4 'ddd' 140 54 0.0
5 'bbb' 110 76 110.0
6 'ddd' 140 75 140.0
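For reference, here is a self-contained sketch of the same approach, split into two steps so the intermediate result is visible. It assumes the c2 values shown in the tables above and that c2 has no missing values; the name `shifted` and the final cast back to integers are additions for illustration, not part of the original one-liner.

```python
import pandas as pd

# Sample data; c2 values are taken from the tables in the question.
data = [
    {'c1': 'aaa', 'c2': 100, 'c3': 99, 'c4': 0},
    {'c1': 'bbb', 'c2': 110, 'c3': 89, 'c4': 0},
    {'c1': 'aaa', 'c2': 100, 'c3': 93, 'c4': 0},
    {'c1': 'ccc', 'c2': 130, 'c3': 77, 'c4': 0},
    {'c1': 'ddd', 'c2': 140, 'c3': 54, 'c4': 0},
    {'c1': 'bbb', 'c2': 110, 'c3': 76, 'c4': 0},
    {'c1': 'ddd', 'c2': 140, 'c3': 75, 'c4': 0},
]
df = pd.DataFrame(data)

# Within each c1 group, move c2 down by one row. The first row of every
# group gets NaN; a second row receives the first row's c2.
shifted = df.groupby('c1')['c2'].shift()

# Keep the existing c4 where there is no earlier matching row, then cast
# back to integers (the NaN from shift turns the column into floats).
# Assumption: c2 itself contains no missing values.
df['c4'] = shifted.fillna(df['c4']).astype('int64')

print(df)
```

Because the whole operation is vectorized, it avoids the per-row Python loop of iterrows, which is what makes the loop version so slow on millions of rows.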