
Pandas - iterate over rows in specific column and find empty cells

I have a dataframe with more than 50,000 rows.

input: [image of the input dataframe, not reproduced here]

I am trying to iterate over the rows until the first empty cell and then copy the value of the id, but only if 'par' exists. Unfortunately, this does not work.

In other words, I want to replace 'par' in Column3 with the corresponding value from Column1 (id_1, id_2, id_3). The output would be something like:

[image of the desired output, not reproduced here]
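
Since the screenshots are not reproduced here, the following is a rough sketch of the input, reconstructed from the outputs shown in the answers below (the exact values in the original images may differ):

import numpy as np
import pandas as pd

# Approximate input: each block starts with an id row, the rows below it
# carry the placeholder 'par' in Column3, and blocks are separated by empty rows.
df = pd.DataFrame({
    'Column1': ['id_1', 'n="1"', 'n="2"', 'n="3"', np.nan,
                'id_2', 'n="4"', 'n="5"', np.nan,
                'id_3', 'n="6"', 'n="7"'],
    'Column2': [np.nan, 'whose', 'theirs', 'am', np.nan,
                np.nan, 'in', 'out', np.nan,
                np.nan, 'in', 'out'],
    'Column3': [np.nan, 'par', 'par', 'par', np.nan,
                np.nan, 'par', 'par', np.nan,
                np.nan, 'par', 'par'],
})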

You can use .where() and .ffill() to fill each group of rows with its id value from Column1. Then replace the 'par' values in Column3 with the id value in the same row by using .mask(), as follows:

# Forward-fill the id values from Column1 so every row in a group carries its id
Col1_id = df['Column1'].where(df['Column1'].str.startswith('id_')).ffill()

# Replace 'par' in Column3 with the id belonging to the same group
df['Column3'] = df['Column3'].mask(df['Column3'] == 'par', Col1_id)

Result:

print(df)

   Column1 Column2 Column3
0     id_1     NaN     NaN
1    n="1"   whose    id_1
2    n="2"  theirs    id_1
3    n="3"      am    id_1
4      NaN     NaN     NaN
5     id_2     NaN     NaN
6    n="4"      in    id_2
7    n="5"     out    id_2
8      NaN     NaN     NaN
9     id_3     NaN     NaN
10   n="6"      in    id_3
11   n="7"     out    id_3
  1. Split the DataFrame on empty rows using numpy.split.
  2. If "par" is in Column3, replace every "par" in Column3 with the first non-null value from Column1 using str.replace.
  3. Collect the modified "chunk" into the output.

Solution:

import numpy as np
import pandas as pd

# Split on the all-NaN separator rows, then fix up each chunk
chunks = []
for chunk in np.split(df, df[df.isnull().all(1)].index):
    chunk = chunk.copy()
    if "par" in chunk["Column3"].tolist():
        # Replace "par" with the first non-null value of Column1 in this chunk
        chunk["Column3"] = chunk["Column3"].str.replace("par", chunk["Column1"].dropna().iat[0])
    chunks.append(chunk)
output = pd.concat(chunks)

>>> output
   Column1 Column2 Column3
0     id_1    None    None
1    n="1"   whose    id_1
2    n="2"  theirs    id_1
3    n="3"      am    id_1
4     None    None    None
5     id_2    None    None
6    n="4"      in    id_2
7    n="5"     out    id_2
8     None    None    None
9     id_3    None    None
10   n="6"      in    id_3
11   n="7"     out    id_3

You could do it like this:

import numpy as np

# Where Column3 equals 'par', take the forward-filled id from Column1; otherwise set NaN
df['Column3'] = np.where(df['Column3'].eq('par'),
                         df['Column1'].where(df['Column1'].str.contains('id')).ffill(),
                         np.nan)

Output:

    Column1  Column2  Column3
0      id_1      NaN      NaN
1     n="1"    whose     id_1
2     n="2"   theirs     id_1
3     n="3"       am     id_1
4      id_2      NaN      NaN
5     n="4"       in     id_2
6     n="5"      out     id_2
7      id_3      NaN      NaN
8     n="6"       in     id_3
9     n="7"      out     id_3

Edit:

If Column3 also contains values other than 'par', you can pass df['Column3'] instead of np.nan as the last argument of np.where. It is up to you whether to set those other values to NaN or keep whatever already exists there.
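
A minimal sketch of that variant (same condition as above, only the fallback argument changes):

# Rows where Column3 is not 'par' keep their existing Column3 value
df['Column3'] = np.where(df['Column3'].eq('par'),
                         df['Column1'].where(df['Column1'].str.contains('id')).ffill(),
                         df['Column3'])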
