Create new column based on condition of values in two other data frame columns
I'm new to Python. I have a feeling there is a quick fix, but nothing has seemed quick to me yet.
I have a 150,000+ row dataframe, data. Within it are two series, gridcode and CH4_Flux, among others. I want to create a new categorical column, called category, that assigns a categorical identifier to each row based on the four conditions shown here:
cat1 = data[(data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)]
cat2 = data[(data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)]
cat3 = data[(data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)]
cat4 = data[(data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)]
where threshAV is a predesignated threshold for gridcode, and threshAM is a predesignated threshold for CH4_Flux. Essentially, either both exceed their thresholds, neither does, or only one of them does. Preferably the categorical labels would simply be the integers 1, 2, 3, and 4, following the logic of cat1, cat2, cat3, and cat4 above.
I have tried for loops and if and where statements but have struck out.
When experimenting with for loops, I commonly get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thank you in advance for any advice or direction!
Use numpy.select to build the new column from multiple boolean masks:
import numpy as np

# build boolean masks instead of filtering rows with data[...]
m1 = (data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)
m2 = (data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)
m3 = (data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)
m4 = (data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)
data['category'] = np.select([m1, m2, m3, m4], [1,2,3,4])
Or, for string labels:
data['category'] = np.select([m1, m2, m3, m4], ['cat1','cat2','cat3','cat4'])
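To see how numpy.select resolves the masks, here is a minimal self-contained sketch. The sample values and both thresholds are made up for illustration; only the column names and mask logic come from the question. Two details are worth checking on real data: because every condition uses <= or >=, a row sitting exactly on a threshold matches more than one mask, and numpy.select keeps the first match; rows that match no mask (for example, a NaN in gridcode) get the default value, which is 0 unless you pass something else.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented).
data = pd.DataFrame({
    "gridcode": [1, 5, 1, 5, 3, np.nan],
    "CH4_Flux": [0.9, 0.9, 0.1, 0.1, 0.5, 0.5],
})
threshAV = 3    # hypothetical threshold for gridcode
threshAM = 0.5  # hypothetical threshold for CH4_Flux

# Same four masks as in the answer above.
m1 = (data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)
m2 = (data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)
m3 = (data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)
m4 = (data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)

# The first matching mask wins; rows matching none fall back to default.
data["category"] = np.select([m1, m2, m3, m4], [1, 2, 3, 4], default=0)

print(data)
# The row with gridcode == 3 and CH4_Flux == 0.5 matches all four masks
# and is labelled 1; the NaN row matches none and is labelled 0.
```

If ties on a threshold should land in a specific category, making one side of each comparison strict (< or >) removes the overlap.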