
Create new column based on condition of values in two other data frame columns

I'm new to Python. I have a feeling there is a quick fix, but nothing has seemed quick to me yet.

I have a 150,000+ row dataframe, data. Within it are two series, among others: gridcode and CH4_Flux. I want to create a new categorical column, called category, that assigns a categorical identifier to each row based on the four conditions shown here:

cat1 = data[(data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)]
cat2 = data[(data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)]
cat3 = data[(data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)]
cat4 = data[(data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)]

where threshAV is a predesignated threshold for gridcode, and threshAM is a predesignated threshold for CH4_Flux. Essentially, either both exceed their thresholds, neither exceeds, or only one of the two exceeds. Preferably the categorical labels would simply be the integers 1, 2, 3, and 4, respectively, following the logic of cat1, cat2, cat3, and cat4 above.

I have tried for loops and if and where statements, but have struck out.

When experimenting with for loops, I commonly get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thank you in advance for any advice or direction!

Use numpy.select to build the new column from multiple boolean masks:

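import numpy as np

# Build boolean masks only (no filtering with data[...])
m1 = (data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)
m2 = (data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)
m3 = (data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)
m4 = (data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)

# np.select uses the first matching mask for each row, so a row sitting
# exactly on a threshold gets the earliest label whose mask is True
data['category'] = np.select([m1, m2, m3, m4], [1,2,3,4])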
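
The masks and the np.select call can be tried on a tiny frame. This is only a sketch: the five rows and both threshold values below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and thresholds, for illustration only
data = pd.DataFrame({
    "gridcode": [1, 9, 2, 8, 5],
    "CH4_Flux": [0.9, 0.8, 0.1, 0.2, 0.5],
})
threshAV = 5
threshAM = 0.5

# One boolean mask per category, same logic as cat1..cat4 in the question
m1 = (data.gridcode <= threshAV) & (data.CH4_Flux >= threshAM)
m2 = (data.gridcode >= threshAV) & (data.CH4_Flux >= threshAM)
m3 = (data.gridcode <= threshAV) & (data.CH4_Flux <= threshAM)
m4 = (data.gridcode >= threshAV) & (data.CH4_Flux <= threshAM)

# For each row, take the label of the first mask that is True
data["category"] = np.select([m1, m2, m3, m4], [1, 2, 3, 4])
print(data)
```

The last sample row sits on both thresholds, so all four masks are True for it and it ends up in category 1. If that is not what you want, make one side of each comparison strict (for example < instead of <=).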

Or, with string labels instead of integers:

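# Pass a string default so it matches the string labels (the implicit default is the integer 0)
data['category'] = np.select([m1, m2, m3, m4], ['cat1','cat2','cat3','cat4'], default='unmatched')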
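
As for the ValueError in the question: it usually appears when a comparison on a whole Series is used where Python expects a single True or False, such as in a plain if statement. A minimal sketch with made-up values (the Series s is hypothetical):

```python
import pandas as pd

s = pd.Series([1, 9, 2])  # hypothetical values

try:
    # s <= 5 is a Series of booleans, which has no single truth value
    if s <= 5:
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Element-wise logic combines masks with & / | (each comparison in parentheses)
mask = (s <= 5) & (s >= 2)
print(mask.tolist())  # [False, False, True]
```

This is why the masks above use & between parenthesized comparisons rather than the and keyword.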
