
Pandas if statement in vectorized operation

import pandas as pd

df = pd.DataFrame([["a", "d"], ["", ""], ["", "3"]],
                  columns=["a", "b"])
df
    a   b
0   a   d
1       
2       3

I'm looking to do a vectorized string concatenation with an if statement like this:

df["c"] = df["a"] + "()" + df["b"] if df["a"].item != "" else ""

But it doesn't work, because the if statement is evaluated once for the whole Series rather than once per row. Is it possible to do this without apply or a lambda that goes through each row? A vectorized operation lets pandas process the whole column at once, which should be faster.

Desired output:

df
    a   b   c
0   a   d   a()d
1           
2       3

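Try using np.where():

import numpy as np
import pandas as pd

df = pd.DataFrame([["a", "d"], ["", ""], ["", "3"]],
                  columns=["a", "b"])

# Concatenate where column a is non-empty; otherwise use an empty string
df['c'] = np.where(df['a'] != '', df['a'] + '()' + df['b'], '')
print(df)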

output:

   a  b     c
0  a  d  a()d
1            
2     3      
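
To illustrate why the plain if/else from the question fails while np.where works, here is a minimal sketch. The names cond, joined and ambiguous are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["a", "d"], ["", ""], ["", "3"]], columns=["a", "b"])

# The comparison yields one boolean per row, not a single True/False.
cond = df["a"] != ""

# A plain `if` needs a single truth value, so pandas refuses to guess.
try:
    bool(cond)
    ambiguous = False
except ValueError:
    ambiguous = True

# np.where chooses element-wise: the concatenation where cond is True, "" elsewhere.
joined = df["a"] + "()" + df["b"]
df["c"] = np.where(cond, joined, "")
print(df["c"].tolist())
```

Note that the concatenation is still computed for every row; np.where only decides which of the two results to keep for each one.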

If I understand correctly, you could use mask to replace the values of a wherever a condition holds, with str.cat concatenating both columns around a separator string:

df['c'] = df.a.mask(df.a.ne(''), df.a.str.cat(df.b, sep='()'))

print(df)

   a  b    c
0  a  d  a()d
1            
2     3 
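
As a rough sketch of how this works: mask replaces the rows where the condition is True, while its counterpart Series.where keeps those rows and fills in the rest. The variable names below (joined, has_a, via_mask, via_where) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([["a", "d"], ["", ""], ["", "3"]], columns=["a", "b"])

# Concatenation for every row, including the ones we do not want.
joined = df["a"].str.cat(df["b"], sep="()")
has_a = df["a"].ne("")

# mask: replace rows where the condition is True, keep the original "" elsewhere.
via_mask = df["a"].mask(has_a, joined)

# where: keep rows where the condition is True, fill the rest with "".
via_where = joined.where(has_a, "")

print(via_mask.tolist())
print(via_where.tolist())
```

Both variants should give the same column here. The mask form relies on column a already being empty in the rows that are left alone.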

Since nobody has mentioned it yet, you can also use the apply method, although it goes through each row:

df['c'] = df.apply(lambda r: r['a']+'()'+r['b'] if r['a']!='' else '', axis=1)

If anyone checks the performance, please comment below :)
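
One rough way to compare the three approaches is timeit. This is only a sketch: the benchmark frame (the sample rows repeated 2,000 times) and the helper names with_where, with_mask and with_apply are assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: the three sample rows repeated many times.
base = pd.DataFrame([["a", "d"], ["", ""], ["", "3"]], columns=["a", "b"])
big = pd.concat([base] * 2_000, ignore_index=True)


def with_where(frame):
    return np.where(frame["a"] != "", frame["a"] + "()" + frame["b"], "")


def with_mask(frame):
    return frame["a"].mask(frame["a"].ne(""), frame["a"].str.cat(frame["b"], sep="()"))


def with_apply(frame):
    return frame.apply(lambda r: r["a"] + "()" + r["b"] if r["a"] != "" else "", axis=1)


# Time three runs of each approach on the same frame.
timings = {}
for fn in (with_where, with_mask, with_apply):
    timings[fn.__name__] = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s for 3 runs")
```

Since apply with axis=1 calls a Python function once per row, I would expect it to be the slowest of the three on larger frames, but measure on your own data before relying on that.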
