简体   繁体   中英

How to fill Non-Null values from some columns in Pandas Dataframe into a new column? How to use np.where() for multiple conditions?

I have a question regards about np.where()

Currently, I have 2 columns, each column contains Null values and categorical values. Values from each column are distinct and will not overlap.

For now, I want to apply all the Non-Null values from these 2 columns into the new column and fill the NaN value in the new column as a categorical value.

My idea is using np.where()

df['C']=np.where(df['A']=='user1', 'user1',(df['B']=='user2','user2','user3'))

Basic idea is if df['A']=='A', fill the value A into new column fist, elif df['B']=='B', fill the value B into new column as well, Else fill the value 'C' for all the NaN values.

However, a syntax error returned.

ValueError: operands could not be broadcast together with shapes (544,) () (3,) 

Thanks for the help always!

Sample data:

A   B   C   Desired col C
user1   Null    Null    user1
user1   Null    Null    user1
user1   Null    Null    user1
user1   Null    Null    user1
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    Null    Null    user3
Null    Null    Null    user3
Null    Null    Null    user3
Null    Null    Null    user3

Assuming your initial df is only cols A, B, and C:

# convert value you don't want to NaNs
df = df.where(df != 'Null')

# temporary list
lst = []

# iterate row-wise
for r in df.iterrows():
    # test if all values in row are the same (1 = no)
    if r[1].nunique() == 1:
        # if different, find the one that is the string and append to list
        a,b,c = r[1] # *this is specific to your example with three cols*
        for i in [a,b,c]:
            if isinstance(i,str):
                lst.append(i)
    else:
        # if same append specified value to list
        lst.append('user3')

df['D'] = lst

It's verbose and will be bit slow for very large dfs, but it produces your expected result. And it's readable.

It would be cleaner if you didn't have the rows with all nulls. Then a cleaner, one-liner would be more feasible df.where(), .apply(lambda), or masked array approach easier.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM