Set values in a new column based on a boolean condition

I have a data frame and two dictionaries as follows:

import numpy as np
import pandas as pd

a = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
x = {'a':'a'}
y = {'b':'b'}

Now I would like to add a new column C such that each cell in C stores x when A >= 2 and B >= 2, and stores y otherwise. The resulting data frame should be equivalent to:

a = pd.DataFrame({'A':[1,2,3],'B':[4,5,6], 'C':[{'b':'b'}, {'a':'a'}, {'a':'a'}]})

I have tried many different approaches, and nothing has worked so far. This is a toy example; the real data frame will have many rows and columns, and the conditions may be more complex. The end goal is to prepare a data frame for visualization with plotly by storing all the information needed for visualization (such as marker definitions) in additional columns.

Thanks in advance.

Using np.where

a['C'] = np.where((a.A >= 2) & (a.B >= 2), x, y)

   A  B           C
0  1  4  {'b': 'b'}
1  2  5  {'a': 'a'}
2  3  6  {'a': 'a'}
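
For reference, here is a self-contained sketch of the same idea, with the imports it needs and the mask pulled out into its own variable. It also checks one detail that may matter for marker definitions: np.where puts a reference to the same dict in every matching row, not a copy, so mutating one cell's dict in place would change all of them. The copy.deepcopy step is an optional addition of mine, not part of the one-liner above, and the names mask, result and shared are just illustrative.

```python
import copy

import numpy as np
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x = {'a': 'a'}
y = {'b': 'b'}

mask = (a.A >= 2) & (a.B >= 2)   # boolean Series: False, True, True
result = np.where(mask, x, y)    # object array holding the dicts

# Rows 1 and 2 point at the very same dict object as x.
shared = result[1] is x and result[2] is x

# Optional: give every row its own independent copy of the dict.
a['C'] = [copy.deepcopy(d) for d in result]
```

If the dicts are only ever read (for example, passed on to plotly unchanged), the shared references are harmless and the plain np.where assignment is enough.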

To explain this a bit, since you say your real data is more complex, np.where will:

Return elements, either from x or y, depending on condition

So create your condition, then decide what np.where's x and y arguments should be for the True and False cases. If you have more than two possible outcomes and multiple conditions, look at np.select instead.
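
Since building the condition is usually most of the work with real data, here is a rough sketch of composing one from named pieces using & (and), | (or) and ~ (not). The "B is even" rule is a made-up example, not something from the question, and a_ok, b_even and cond are hypothetical names.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x = {'a': 'a'}
y = {'b': 'b'}

# Name each sub-condition first. This also avoids the parentheses that
# inline comparisons need, since & and | bind more tightly than >= or ==.
a_ok = a['A'] >= 2
b_even = a['B'] % 2 == 0      # hypothetical extra rule, for illustration only

cond = a_ok & ~b_even         # A >= 2 and B is odd

a['C'] = np.where(cond, x, y)
```

Only the middle row (A=2, B=5) satisfies this condition, so it gets x and the other two rows get y.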

Here is the equivalent np.select for the sake of demonstration:

conds = [(a.A >= 2) & (a.B >= 2)]
choices = [x]

np.select(conds, choices, default=y)
# array([{'b': 'b'}, {'a': 'a'}, {'a': 'a'}], dtype=object)
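
To show where np.select starts to pay off, here is a sketch with two conditions and a default, assigned back to the frame. The low, mid and high dicts are hypothetical marker definitions I made up for illustration. Conditions are checked in order and the first one that matches wins, so put the most specific one first.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical marker definitions, not from the original question.
low = {'color': 'grey'}
mid = {'color': 'blue'}
high = {'color': 'red'}

conds = [
    (a.A >= 3) & (a.B >= 6),   # most specific condition first
    (a.A >= 2) & (a.B >= 2),
]
choices = [high, mid]          # one choice per condition, same order

# Rows matching no condition fall back to the default.
a['C'] = np.select(conds, choices, default=low)
```

Row 2 satisfies both conditions but gets high, because np.select takes the first match in the list.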
