
Create a new Pandas df column with boolean values that depend on another column

I need to add a new column to a Pandas dataframe.

If the column "INDUCING" contains text (i.e. the value is neither null nor an empty string ""), the new column should be 1; otherwise it should be 0.

I tried:

df['newColumn'] = np.where(df['INDUCING']!="", 1, 0)

This works for values that are empty strings (""), but not for nulls: None and NaN are not equal to "", so those rows get 1 instead of 0.
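A minimal reproduction of the problem, assuming a small hypothetical frame with one non-empty string, one empty string, one None and one NaN (dtype=object is set explicitly so the values are kept as-is). Missing values compare as "not equal" to "", so they end up flagged as 1:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: text, empty string, None and NaN
df = pd.DataFrame({'INDUCING': ['text', '', None, np.nan]}, dtype=object)

# The original attempt: nulls satisfy != "", so they are wrongly marked 1
df['newColumn'] = np.where(df['INDUCING'] != "", 1, 0)
print(df['newColumn'].tolist())  # [1, 0, 1, 1]
```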

Any idea on how to add this column correctly?

By De Morgan's laws, NOT(cond1 OR cond2) is equivalent to NOT(cond1) AND NOT(cond2).
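The identity can be checked exhaustively over all four truth assignments, and then on a Series. Here cond1 is read as "the value is an empty string" and cond2 as "the value is null"; the sample Series is hypothetical. In pandas, ~ plays the role of NOT, | of OR and & of AND:

```python
from itertools import product

import numpy as np
import pandas as pd


def de_morgan_holds() -> bool:
    # Plain-Python check: NOT(c1 OR c2) == NOT(c1) AND NOT(c2) for every combination
    return all(
        (not (c1 or c2)) == ((not c1) and (not c2))
        for c1, c2 in product([False, True], repeat=2)
    )


# The same identity applied element-wise to a hypothetical Series
s = pd.Series(['text', '', None, np.nan], dtype=object)
lhs = ~(s.eq('') | s.isnull())   # NOT(empty OR null)
rhs = s.ne('') & s.notnull()     # NOT(empty) AND NOT(null)

print(de_morgan_holds())         # True
print(lhs.equals(rhs))           # True
```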

You can combine conditions with the bitwise "and" (&) and "or" (|) operators as appropriate. This gives a Boolean Series, which you can then cast to int:

df['newColumn'] = (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
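A self-contained sketch of the combined-mask approach, assuming the same kind of hypothetical sample data. It also shows why the element-wise & is used: Python's own `and` needs a single truth value and raises ValueError on a Series, and because & binds more tightly than != in Python, operator-style comparisons need their own parentheses (the .ne('') method form sidesteps that):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'INDUCING': ['text', '', None, np.nan]}, dtype=object)
col = df['INDUCING']

# `and` asks for bool(Series), which pandas refuses as ambiguous
try:
    col.ne('') and col.notnull()
    and_failed = False
except ValueError:
    and_failed = True

# Element-wise AND; the parentheses around (col != '') are required
mask = (col != '') & col.notnull()
df['newColumn'] = mask.astype(int)  # True -> 1, False -> 0

print(and_failed)                   # True
print(df['newColumn'].tolist())     # [1, 0, 0, 0]
```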

The easiest way is to call .fillna('') first, so that null values become empty strings:

df['newColumn'] = np.where(df['INDUCING'].fillna('') != "", 1, 0)
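A runnable version of the fillna approach, assuming a hypothetical frame containing both None and NaN. Since fillna('') replaces both kinds of missing value, a single comparison against "" is enough (fillna returns a new Series and leaves df['INDUCING'] unchanged):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'INDUCING': ['text', '', None, np.nan]}, dtype=object)

# None and NaN both become "", so one != check covers every empty case
filled = df['INDUCING'].fillna('')
df['newColumn'] = np.where(filled != "", 1, 0)

print(filled.tolist())           # ['text', '', '', '']
print(df['newColumn'].tolist())  # [1, 0, 0, 0]
```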

Alternatively, call .astype(int) directly on the Boolean mask, which converts True to 1 and False to 0:

df['newColumn'] = (df['INDUCING'].fillna('') != '').astype(int)
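A quick check, on hypothetical data, that casting the mask with .astype(int) gives the same 0/1 values as np.where:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'INDUCING': ['text', '', None, np.nan]}, dtype=object)

mask = df['INDUCING'].fillna('') != ''   # Boolean Series
as_int = mask.astype(int)                # Series: True -> 1, False -> 0
via_where = np.where(mask, 1, 0)         # plain NumPy array with the same values

print(mask.dtype)                               # bool
print(as_int.tolist())                          # [1, 0, 0, 0]
print((as_int.to_numpy() == via_where).all())   # True
```

One practical difference: .astype(int) returns a Series that keeps the original index, whereas np.where returns a bare NumPy array, which pandas assigns to the column by position.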

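Since the built-in bool returns True for a string exactly when it is non-empty, and bool(None) is False, you can achieve this simply with the line below, as long as missing values are None rather than NaN (NaN is truthy, so call .fillna('') first if the column contains NaN, e.g. after pd.read_csv):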

df['newColumn'] = df['INDUCING'].astype(bool).astype(int)
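A small demonstration of the None versus NaN caveat, assuming object-dtype columns (the two Series are hypothetical). Casting an object column to bool applies Python truthiness element by element, so a NaN is counted as "has text" unless it is filled first:

```python
import numpy as np
import pandas as pd

none_col = pd.Series(['text', '', None], dtype=object)
nan_col = pd.Series(['text', '', np.nan], dtype=object)

# bool(None) is False, but bool(float('nan')) is True
print(none_col.astype(bool).astype(int).tolist())             # [1, 0, 0]
print(nan_col.astype(bool).astype(int).tolist())              # [1, 0, 1]
print(nan_col.fillna('').astype(bool).astype(int).tolist())   # [1, 0, 0]
```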

Some performance comparisons:

In [61]: df = pd.DataFrame({'INDUCING': ['test', None, '', 'more test']*10000})

In [63]: %timeit np.where(df['INDUCING'].fillna('') != "", 1, 0)
5.68 ms ± 500 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [62]: %timeit (df['INDUCING'].ne('') & df['INDUCING'].notnull()).astype(int)
5.1 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [64]: %timeit np.where(df['INDUCING'], 1, 0)
667 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [65]: %timeit df['INDUCING'].astype(bool).astype(int)
655 µs ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [99]: %timeit df['INDUCING'].values.astype(bool).astype(int)
553 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
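To rerun the comparison outside IPython, a sketch using the standard timeit module. It uses the same data as the session above, with dtype=object made explicit so None is kept as-is; the dictionary labels are illustrative, and absolute timings will vary with machine and pandas version. It also checks that all five expressions produce the same 0/1 values on this data (which contains None, not NaN):

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'INDUCING': ['test', None, '', 'more test'] * 10000}, dtype=object)
col = df['INDUCING']

# Illustrative labels for the five expressions timed above
approaches = {
    'fillna + np.where': lambda: np.where(col.fillna('') != "", 1, 0),
    'ne & notnull': lambda: (col.ne('') & col.notnull()).astype(int),
    'np.where on column': lambda: np.where(col, 1, 0),
    'astype(bool)': lambda: col.astype(bool).astype(int),
    '.values.astype(bool)': lambda: col.values.astype(bool).astype(int),
}

# Verify that every approach yields identical values before comparing speed
results = {name: np.asarray(fn()) for name, fn in approaches.items()}
reference = results['fillna + np.where']
all_agree = all(np.array_equal(r, reference) for r in results.values())

runs = 20
for name, fn in approaches.items():
    seconds = timeit.timeit(fn, number=runs) / runs
    print(f"{name:22s} {seconds * 1e3:.2f} ms per loop")
print("results agree:", all_agree)
```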

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address.