
How to fill non-null values from some columns in a Pandas DataFrame into a new column? How to use np.where() for multiple conditions?

I have a question about np.where().

Currently, I have 2 columns; each contains null values and categorical values. The values in each column are distinct and do not overlap.

Now I want to copy all the non-null values from these 2 columns into a new column, and fill the remaining NaN values in the new column with a categorical value.

My idea is to use np.where():

df['C']=np.where(df['A']=='user1', 'user1',(df['B']=='user2','user2','user3'))

The basic idea is: if df['A']=='A', fill the value A into the new column first; elif df['B']=='B', fill the value B into the new column as well; else fill the value 'C' for all the NaN values.

However, this returned an error (a ValueError, not a syntax error):

ValueError: operands could not be broadcast together with shapes (544,) () (3,) 

Thanks for the help, as always!

Sample data:

A   B   C   Desired col C
user1   Null    Null    user1
user1   Null    Null    user1
user1   Null    Null    user1
user1   Null    Null    user1
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    user2   Null    user2
Null    Null    Null    user3
Null    Null    Null    user3
Null    Null    Null    user3
Null    Null    Null    user3

Assuming your initial df contains only columns A, B, and C:

# convert the placeholder string 'Null' to real NaN values
df = df.where(df != 'Null')

# temporary list that collects the values for the new column
lst = []

# iterate row-wise
for _, row in df.iterrows():
    # nunique() ignores NaN, so 1 means the row holds exactly one distinct non-null value
    if row.nunique() == 1:
        # append the first non-null value in the row (works for any number of columns)
        lst.append(row.dropna().iloc[0])
    else:
        # otherwise (here, a row that is entirely NaN) append the fallback value
        lst.append('user3')

df['D'] = lst

It's verbose and will be a bit slow for very large DataFrames, but it produces your expected result. And it's readable.
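
For larger frames, the np.where() idea from the question can also be made to work by nesting a second np.where() call as the "else" branch. In the original attempt the third argument was a plain tuple of three items, which is consistent with the (3,) shape in the error message. The following is only a minimal sketch: the small sample frame is hypothetical, and it assumes the 'Null' strings have already been converted to NaN as in the first step above. np.select() is shown as an alternative that may read better once there are more than two conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mirroring the question's data (NaN instead of 'Null').
df = pd.DataFrame({
    "A": ["user1", "user1", np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, "user2", "user2", np.nan],
})

# The inner np.where() is the "else" branch of the outer one, so each argument
# is either a scalar or has the same length as the column and broadcasts cleanly.
df["C"] = np.where(
    df["A"] == "user1", "user1",
    np.where(df["B"] == "user2", "user2", "user3"),
)

# Same result with np.select(): conditions are checked in order,
# and rows matching none of them get the default value.
conditions = [df["A"] == "user1", df["B"] == "user2"]
choices = ["user1", "user2"]
df["C_select"] = np.select(conditions, choices, default="user3")
```

Both columns hold the value from A where it matches, then the value from B, and 'user3' for rows where neither condition is true.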

It would be cleaner if you didn't have the rows with all nulls. Then a one-liner using df.where(), .apply(lambda), or a masked array would be more feasible.
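
As a rough illustration of such a one-liner, and assuming (as the question states) that the values in A and B never overlap, chained fillna() calls can also cover the all-null rows. This is a sketch on a hypothetical sample frame, not necessarily the approach the answer had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mirroring the question's data (NaN instead of 'Null').
df = pd.DataFrame({
    "A": ["user1", "user1", np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, "user2", "user2", np.nan],
})

# Take A where present, otherwise B, otherwise the fallback label.
df["C"] = df["A"].fillna(df["B"]).fillna("user3")
```

Unlike the np.where() version, this does not hard-code the category names 'user1' and 'user2'; it simply takes whatever non-null value is present.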
