I have a DataFrame with more than 50,000 rows.
I am trying to iterate over the rows until the first empty cell and then copy the value of the id, but only if par exists. Unfortunately, this does not work.
In other words, I want to replace each par in Column3 with the corresponding id value from Column1 (id_1, id_2, id_3). The output would look something like this:
You can use .where() and ffill() to fill every row of a group with the id value from Column1 that starts that group. Then use .mask() to replace the par values in Column3 with the id value in the same row, as follows:
Col1_id = df['Column1'].where(df['Column1'].str.startswith('id_')).ffill()
df['Column3'] = df['Column3'].mask(df['Column3'] == 'par', Col1_id)
Result:
print(df)
Column1 Column2 Column3
0 id_1 NaN NaN
1 n="1" whose id_1
2 n="2" theirs id_1
3 n="3" am id_1
4 NaN NaN NaN
5 id_2 NaN NaN
6 n="4" in id_2
7 n="5" out id_2
8 NaN NaN NaN
9 id_3 NaN NaN
10 n="6" in id_3
11 n="7" out id_3
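For anyone who wants to try the approach above end to end, here is a minimal self-contained sketch. The sample data is an assumption reconstructed from the result shown above (the original input was not posted), and `na=False` is added only so the all-empty separator rows are treated as non-matches explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the layout of the result above.
df = pd.DataFrame({
    "Column1": ["id_1", 'n="1"', 'n="2"', 'n="3"', np.nan,
                "id_2", 'n="4"', 'n="5"', np.nan,
                "id_3", 'n="6"', 'n="7"'],
    "Column2": [np.nan, "whose", "theirs", "am", np.nan,
                np.nan, "in", "out", np.nan,
                np.nan, "in", "out"],
    "Column3": [np.nan, "par", "par", "par", np.nan,
                np.nan, "par", "par", np.nan,
                np.nan, "par", "par"],
})

# Keep only the id markers, then carry each one down over its group.
is_id = df["Column1"].str.startswith("id_", na=False)
group_id = df["Column1"].where(is_id).ffill()

# Replace 'par' with the id of the group the row belongs to;
# every other value in Column3 (including NaN) is left untouched.
df["Column3"] = df["Column3"].mask(df["Column3"] == "par", group_id)

print(df)
```

Because everything here is vectorized, there is no Python-level loop over the rows, which should matter with 50,000+ rows.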
Alternatively, you can use numpy.split and .str.replace:
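import numpy as np
import pandas as pd

chunks = []
# Split at the all-null separator rows so that each chunk holds one id group.
for chunk in np.split(df, df[df.isnull().all(axis=1)].index):
    chunk = chunk.copy()
    if "par" in chunk["Column3"].tolist():
        chunk["Column3"] = chunk["Column3"].str.replace("par", chunk["Column1"].dropna().iat[0])
    chunks.append(chunk)
# DataFrame.append was removed in pandas 2.0, so concatenate the chunks instead.
output = pd.concat(chunks)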
>>> output
Column1 Column2 Column3
0 id_1 None None
1 n="1" whose id_1
2 n="2" theirs id_1
3 n="3" am id_1
4 None None None
5 id_2 None None
6 n="4" in id_2
7 n="5" out id_2
8 None None None
9 id_3 None None
10 n="6" in id_3
11 n="7" out id_3
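The same chunking idea can also be sketched without an explicit loop. This is a variation and not the code from the answer above: it labels each block by counting the all-null separator rows with cumsum(), then takes the first non-null Column1 value per block with groupby(...).transform("first"). The sample data and the names `block` and `first_id` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two id groups separated by an all-null row.
df = pd.DataFrame({
    "Column1": ["id_1", 'n="1"', 'n="2"', np.nan, "id_2", 'n="3"'],
    "Column2": [np.nan, "whose", "theirs", np.nan, np.nan, "in"],
    "Column3": [np.nan, "par", "par", np.nan, np.nan, "par"],
})

# Each all-null row opens a new block; cumsum turns that into a block label.
block = df.isnull().all(axis=1).cumsum()

# First non-null Column1 value of each block, broadcast back to every row.
first_id = df.groupby(block)["Column1"].transform("first")

# Replace 'par' with the id of the block the row belongs to.
df["Column3"] = df["Column3"].mask(df["Column3"] == "par", first_id)

print(df)
```

Like the loop version, this relies on the id being the first non-null Column1 value in each block, so it only fits data laid out that way.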
You could do it like this:
import numpy as np
df['Column3'] = np.where(df['Column3'].eq('par'),
                         df['Column1'].where(df['Column1'].str.contains('id')).ffill(),
                         np.nan)
Output:
Column1 Column2 Column3
0 id_1 NaN NaN
1 n="1" whose id_1
2 n="2" theirs id_1
3 n="3" am id_1
4 id_2 NaN NaN
5 n="4" in id_2
6 n="5" out id_2
7 id_3 NaN NaN
8 n="6" in id_3
9 n="7" out id_3
Edit:
If Column3 also contains values other than 'par', you can pass df['Column3'] instead of np.nan as the last argument of np.where. It is up to you whether to put NaN in that column for values other than 'par' or to keep the values that are already there.
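As a small sketch of that edit, the example below keeps the existing Column3 values wherever the cell is not 'par'. The sample data (including the 'other' value) is made up for illustration, and `na=False` is an addition so that missing Column1 cells would count as non-matches.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one Column3 value that is not 'par'.
df = pd.DataFrame({
    "Column1": ["id_1", 'n="1"', 'n="2"', "id_2", 'n="3"'],
    "Column2": [np.nan, "whose", "theirs", np.nan, "in"],
    "Column3": [np.nan, "par", "other", np.nan, "par"],
})

# Forward-fill the id markers so every row knows its group id.
group_id = df["Column1"].where(df["Column1"].str.contains("id", na=False)).ffill()

# Use the existing Column3 value as the fallback instead of np.nan.
df["Column3"] = np.where(df["Column3"].eq("par"), group_id, df["Column3"])

print(df)
```

With np.nan as the fallback, the 'other' cell would have been overwritten with NaN; with df['Column3'] it is preserved.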