
New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame

I've got a DataFrame which contains stock values.

It looks like this:

>>> Data
             Open   High    Low  Close   Volume  Adj Close
Date
2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04

When I try to create a conditional new column with the following if statement:

Data['Test'] =Data['Close'] if Data['Close'] > Data['Open'] else Data['Open']

I get the following error:

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    Data[1]['Test'] =Data[1]['Close'] if Data[1]['Close'] > Data[1]['Open'] else Data[1]['Open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I then used all():

Data[1]['Test'] =Data[1]['Close'] if all(Data[1]['Close'] > Data[1]['Open']) else Data[1]['Open']

The result was that the entire ['Open'] column was selected. That isn't the condition I wanted, which is to select, for each row, the larger of the ['Open'] and ['Close'] values.

Any help is appreciated.

Thanks.

From a DataFrame like:

>>> df
         Date   Open   High    Low  Close   Volume  Adj Close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of would be:

>>> df["Test"] = df[["Open", "Close"]].max(axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.loc[:, ["Open", "Close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at. (Older pandas code wrote this with df.ix, which was removed in pandas 1.0.)

Alternatively, you could use .apply on the rows:

>>> df["Test"] = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:

>>> import numpy as np
>>> df["Test"] = np.maximum(df["Open"], df["Close"])
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23
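As a quick check on the three approaches above, here is a minimal sketch that rebuilds only the Open and Close columns from the sample frame and compares the results. The second frame, `gappy`, is a made-up example (not from the question) to show one place where the approaches can differ: `DataFrame.max` skips NaN by default, while `np.maximum` propagates it.

```python
import numpy as np
import pandas as pd

# Only the two columns that matter, copied from the sample frame above.
df = pd.DataFrame({
    "Open": [76.91, 77.04, 72.87],
    "Close": [77.04, 72.87, 93.23],
})

by_max = df[["Open", "Close"]].max(axis=1)                                # row-wise max
by_apply = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)  # Python max per row
by_numpy = np.maximum(df["Open"], df["Close"])                           # element-wise ufunc

# Hypothetical frame with a missing value, to illustrate NaN handling.
gappy = pd.DataFrame({"Open": [1.0, np.nan], "Close": [2.0, 3.0]})
skips_nan = gappy[["Open", "Close"]].max(axis=1)       # NaN is ignored (skipna=True)
keeps_nan = np.maximum(gappy["Open"], gappy["Close"])  # NaN is propagated
```

If the price columns can contain gaps, it is worth deciding which of these two NaN behaviours is wanted before picking an approach.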

The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.
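To make the coercion concrete, here is a small sketch using the Open and Close values from the sample frame. It shows that the comparison yields one boolean per row, that turning it into a single bool raises the ValueError from the question, and that `.all()` collapses it to one value, which is why the whole Open column was picked.

```python
import pandas as pd

df = pd.DataFrame({
    "Open": [76.91, 77.04, 72.87],
    "Close": [77.04, 72.87, 93.23],
})

# Element-wise comparison: a boolean Series with one entry per row.
cond = df["Close"] > df["Open"]

# `if cond:` asks for bool(cond), which pandas refuses for multi-element data.
try:
    bool(cond)
    raised = False
except ValueError:
    raised = True

# .all() gives a single bool, so an if/else built on it picks one whole column.
collapsed = cond.all()
chosen = df["Close"] if collapsed else df["Open"]
```

Here `collapsed` is False because the second row has Close below Open, so `chosen` is the entire Open column rather than a row-by-row maximum.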

In [7]: df = DataFrame(randn(10,2),columns=list('AB'))  # needs: from pandas import DataFrame; from numpy.random import randn

In [8]: df
Out[8]: 
          A         B
0 -0.954317 -0.485977
1  0.364845 -0.193453
2  0.020029 -1.839100
3  0.778569  0.706864
4  0.033878  0.437513
5  0.362016  0.171303
6  2.880953  0.856434
7 -0.109541  0.624493
8  1.015952  0.395829
9 -0.337494  1.843267

This is a where conditional: give me the value of A where A > B, otherwise give me B.

# this syntax is EQUIVALENT (on a copy of A; where does not modify df) to
# df.loc[~(df['A']>df['B']),'A'] = df['B']

In [9]: df['A'].where(df['A']>df['B'],df['B'])
Out[9]: 
0   -0.485977
1    0.364845
2    0.020029
3    0.778569
4    0.437513
5    0.362016
6    2.880953
7    0.624493
8    1.015952
9    1.843267
dtype: float64

In this case max is equivalent:

In [10]: df.max(1)
Out[10]: 
0   -0.485977
1    0.364845
2    0.020029
3    0.778569
4    0.437513
5    0.362016
6    2.880953
7    0.624493
8    1.015952
9    1.843267
dtype: float64
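For a reproducible version of the comparison above, the sketch below uses three fixed rows (rounded from the random frame, so the values are illustrative only) and checks that `Series.where`, `np.where`, and a row-wise `max` agree. It also confirms that `where` returns a new Series and leaves `df` untouched.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values instead of randn, so the result is repeatable.
df = pd.DataFrame({"A": [-0.95, 0.36, 0.02], "B": [-0.49, -0.19, -1.84]})

cond = df["A"] > df["B"]

# Keep A where the condition holds, otherwise take B.
via_where = df["A"].where(cond, df["B"])

# NumPy spelling of the same selection; returns a plain array, so re-wrap it.
via_np_where = pd.Series(np.where(cond, df["A"], df["B"]), index=df.index)

# For "take the larger of two columns", a row-wise max gives the same values.
via_max = df[["A", "B"]].max(axis=1)
```

`where` and `np.where` are the more general tools: they work for any condition and any pair of values, whereas `max` only covers the "pick the larger one" case.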

The issue is that you're asking Python to evaluate a condition (Data['Close'] > Data['Open']) that contains more than one boolean value. You do not want to use any or all either, since that collapses the condition to a single boolean and sets Data['Test'] to the whole of either Data['Open'] or Data['Close'].

There might be a cleaner method, but one approach is to use a mask (boolean array):

mask = Data['Close'] > Data['Open']
Data['Test'] = pandas.concat([Data['Close'][mask].dropna(), Data['Open'][~mask].dropna()]).reindex_like(Data)
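The same mask can also drive a `.loc` assignment, which avoids the concat and reindex steps. This is a sketch of that variant, not the answer's own code; the frame is rebuilt from the sample values, and the dates in the index are assumed for illustration.

```python
import pandas as pd

# Sample values from the thread; the index dates are hypothetical.
Data = pd.DataFrame(
    {"Open": [76.91, 77.04, 72.87], "Close": [77.04, 72.87, 93.23]},
    index=pd.to_datetime(["2013-07-08", "2013-07-09", "2013-07-10"]),
)

mask = Data["Close"] > Data["Open"]  # True on rows where Close is the larger value

Data["Test"] = Data["Open"]                       # start from Open everywhere
Data.loc[mask, "Test"] = Data.loc[mask, "Close"]  # overwrite only the masked rows
```

Starting from one column and overwriting the masked rows keeps the original row order without any reindexing.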

