How to create new column in a df based on multiple conditions?

I have a df with 3 columns, v1, v2, and v3, where

v1=[a,b,c,a] 
v2=[d,d,f,n] 
v3=[a,k,i,j] 

What I'd like to do is create new columns based on conditions in columns v1 to v3.

I can do a single condition:

df['v1_a']=np.where(df['v1']=='a',1,0)

It gives a new column named 'v1_a' containing 1/0.

However, if I want to create a new column based on multiple conditions, this does not work:

df['v2_flag']=np.where(df['v2']=='f' or df['v2']=='h',1,0)

How can I accomplish this?

If you combine the conditions with or, you'll get the following ValueError. np.where() takes a single condition array, and Python's or cannot combine two arrays element-wise into one:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So in your case I suggest using np.logical_or.

df['v2_flag']=np.where(np.logical_or(df['v2']=='f',df['v2']=='h'),1,0)

See the following example too:

>>> a=np.array([2,2,2,5,7,8,1,4,2,3,4,5,6])
>>> np.where(np.logical_or(a==5,a==2),a,0)
array([2, 2, 2, 5, 0, 0, 0, 0, 2, 0, 0, 5, 0])
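Applied to the data from the question, the same call might look like the sketch below. The DataFrame construction is an assumption based on the lists given in the question, and the np.logical_or.reduce line (with the extra value 'n' and the column name v2_flag3) is only a hypothetical illustration of how more than two conditions could be combined.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's DataFrame.
df = pd.DataFrame({
    "v1": ["a", "b", "c", "a"],
    "v2": ["d", "d", "f", "n"],
    "v3": ["a", "k", "i", "j"],
})

# Two conditions: element-wise OR of two boolean Series.
df["v2_flag"] = np.where(np.logical_or(df["v2"] == "f", df["v2"] == "h"), 1, 0)

# More than two conditions: reduce a list of boolean masks with logical_or.
# The third value "n" and the column name are made up for illustration.
masks = [df["v2"] == "f", df["v2"] == "h", df["v2"] == "n"]
df["v2_flag3"] = np.where(np.logical_or.reduce(masks), 1, 0)

print(df)
```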

In Python, and and or can only give a single result, and modules can't override them for other purposes such as the giant row-by-row comparison you're trying to do.

You need to use the symbolic & (and) and | (or), which are normally used for bit-wise comparisons. These have been re-purposed by pandas to be a row-by-row comparison, which actually makes sense as being analogous to bit-wise comparisons. That is more of a happy coincidence though, as these were mainly chosen because they can be overridden by modules.

Because | has higher precedence than ==, you'll need parentheses around each comparison, or else Python would evaluate the | before the ==, which isn't what you want. You can use something like this:

df['v2_flag']=np.where((df['v2']=='f')|(df['v2']=='h'),1,0)
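To see the precedence issue in isolation, here is a small sketch using plain integers instead of Series. It is not part of the original answer; it only shows how Python groups the operators when the parentheses are left out.

```python
# Without parentheses, | is applied before ==, so this parses as the
# chained comparison 1 == (1 | 2) == 2, i.e. 1 == 3 == 2.
unparenthesized = 1 == 1 | 2 == 2

# With parentheses, each comparison is evaluated first, then OR-ed.
parenthesized = (1 == 1) | (2 == 2)

print(unparenthesized)  # False
print(parenthesized)    # True
```

With Series in place of the integers, the unparenthesized form tries to compute 'f' | df['v2'] first, which is not the comparison you intended.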

df['v2']=='f' or df['v2']=='h' raises the ValueError before it gets to np.where. The or causes Python to evaluate df['v2']=='f' and df['v2']=='h' in a boolean context. But Pandas Series, like NumPy arrays, refuse to be reduced to a single boolean value; they raise a ValueError instead.

To fix your code, you could use
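A minimal sketch of that behaviour, assuming a Series holding the v2 values from the question. The helper truthiness_error is a hypothetical name introduced here only to capture the exception type.

```python
import pandas as pd

s = pd.Series(["d", "d", "f", "n"])
mask = s == "f"  # boolean Series with four elements


def truthiness_error(obj):
    """Return the name of the exception raised by bool(obj), or None."""
    try:
        bool(obj)
    except ValueError as exc:
        return type(exc).__name__
    return None


# Both the Series and its underlying NumPy array refuse a boolean context.
series_error = truthiness_error(mask)
array_error = truthiness_error(mask.to_numpy())
print(series_error, array_error)

# Explicit reductions state which single answer you want.
any_match = mask.any()  # at least one element is 'f'
all_match = mask.all()  # every element is 'f'
print(any_match, all_match)
```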

df['v2_flag'] = np.where( (df['v2']=='f') | (df['v2']=='h'), 1, 0)

The | operator performs an element-wise bitwise-or over the two boolean-valued Series.

Other ways to define df['v2_flag'] include

df['v2_flag'] = ((df['v2']=='f') | (df['v2']=='h')).astype(int)

or

df['v2_flag'] = df['v2'].isin(['f', 'h']).astype(int)
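As a quick check, the three variants can be run side by side on the v2 values from the question. The variable names below are illustrative only, and the one-column DataFrame is an assumed reconstruction.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the v2 column from the question.
df = pd.DataFrame({"v2": ["d", "d", "f", "n"]})

# 1) np.where with an element-wise OR of two masks.
via_where = np.where((df["v2"] == "f") | (df["v2"] == "h"), 1, 0)

# 2) Cast the combined boolean mask directly to integers.
via_astype = ((df["v2"] == "f") | (df["v2"] == "h")).astype(int)

# 3) Membership test against a list of accepted values.
via_isin = df["v2"].isin(["f", "h"]).astype(int)

print(via_where.tolist(), via_astype.tolist(), via_isin.tolist())
```

The isin form tends to scale most comfortably as the list of accepted values grows, since it avoids writing one comparison per value.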

