
Pandas - iterate over rows in specific column and find empty cells

I have a dataframe with more than 50,000 rows.

input: [image of the input dataframe, not reproduced here]

I am trying to iterate over the rows until the first empty cell and then copy the value of the id, but only if 'par' exists. Unfortunately, this does not work.

In other words, I want to replace 'par' in Column3 with the corresponding value from Column1 (id_1, id_2, id_3). The output would be something like:

[image of the desired output, not reproduced here]
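
Since the screenshots are not reproduced here, the following is a rough sketch of the input, reconstructed from the outputs shown in the answers below (the exact values in the original images may differ):

import numpy as np
import pandas as pd

# Approximate input: each block starts with an id row, the rows below it
# carry the placeholder 'par' in Column3, and blocks are separated by empty rows.
df = pd.DataFrame({
    'Column1': ['id_1', 'n="1"', 'n="2"', 'n="3"', np.nan,
                'id_2', 'n="4"', 'n="5"', np.nan,
                'id_3', 'n="6"', 'n="7"'],
    'Column2': [np.nan, 'whose', 'theirs', 'am', np.nan,
                np.nan, 'in', 'out', np.nan,
                np.nan, 'in', 'out'],
    'Column3': [np.nan, 'par', 'par', 'par', np.nan,
                np.nan, 'par', 'par', np.nan,
                np.nan, 'par', 'par'],
})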

You can use .where() and .ffill() to fill each group of rows with its id value from Column1. Then replace the 'par' values in Column3 with the id value in the same row by using .mask(), as follows:

# Forward-fill the id values from Column1 so every row in a group carries its id
Col1_id = df['Column1'].where(df['Column1'].str.startswith('id_')).ffill()

# Replace 'par' in Column3 with the id belonging to the same group
df['Column3'] = df['Column3'].mask(df['Column3'] == 'par', Col1_id)

Result:

print(df)

   Column1 Column2 Column3
0     id_1     NaN     NaN
1    n="1"   whose    id_1
2    n="2"  theirs    id_1
3    n="3"      am    id_1
4      NaN     NaN     NaN
5     id_2     NaN     NaN
6    n="4"      in    id_2
7    n="5"     out    id_2
8      NaN     NaN     NaN
9     id_3     NaN     NaN
10   n="6"      in    id_3
11   n="7"     out    id_3
  1. Split the DataFrame on empty rows using numpy.split.
  2. If "par" is in Column3, replace every "par" in Column3 with the first non-null value from Column1 using str.replace.
  3. Collect the modified "chunk" into the output.

Solution:

import numpy as np
import pandas as pd

# Split on the all-NaN separator rows, then fix up each chunk
chunks = []
for chunk in np.split(df, df[df.isnull().all(1)].index):
    chunk = chunk.copy()
    if "par" in chunk["Column3"].tolist():
        # Replace "par" with the first non-null value of Column1 in this chunk
        chunk["Column3"] = chunk["Column3"].str.replace("par", chunk["Column1"].dropna().iat[0])
    chunks.append(chunk)
output = pd.concat(chunks)

>>> output
   Column1 Column2 Column3
0     id_1    None    None
1    n="1"   whose    id_1
2    n="2"  theirs    id_1
3    n="3"      am    id_1
4     None    None    None
5     id_2    None    None
6    n="4"      in    id_2
7    n="5"     out    id_2
8     None    None    None
9     id_3    None    None
10   n="6"      in    id_3
11   n="7"     out    id_3

You could do it like this:

import numpy as np

# Where Column3 equals 'par', take the forward-filled id from Column1; otherwise set NaN
df['Column3'] = np.where(df['Column3'].eq('par'),
                         df['Column1'].where(df['Column1'].str.contains('id')).ffill(),
                         np.nan)

Output:

    Column1  Column2  Column3
0      id_1      NaN      NaN
1     n="1"    whose     id_1
2     n="2"   theirs     id_1
3     n="3"       am     id_1
4      id_2      NaN      NaN
5     n="4"       in     id_2
6     n="5"      out     id_2
7      id_3      NaN      NaN
8     n="6"       in     id_3
9     n="7"      out     id_3

Edit:

If Column3 also contains values other than 'par', you can pass df['Column3'] instead of np.nan as the last argument of np.where. It is up to you whether to set those other values to NaN or keep whatever already exists there.
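
A minimal sketch of that variant (same condition as above, only the fallback argument changes):

# Rows where Column3 is not 'par' keep their existing Column3 value
df['Column3'] = np.where(df['Column3'].eq('par'),
                         df['Column1'].where(df['Column1'].str.contains('id')).ffill(),
                         df['Column3'])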
