
Pandas - Conditional Calculation Based on Shift Values from Two Other Columns

I'm sure this question is easy, but it has been stumping me for too long, so I would really appreciate some direction.

I'm looking to add a column to a dataframe based on the values of two other columns.

I want to identify whether the stock is equal to the stock in the prior row and the date is equal to the date in the prior row.

I am looking to get the running count. I tried something along the lines of the following:

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

and

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input:

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

And output:

Stock, Date,Time, Price, DayCount 
IBM, 2014-09-01, 12:30:01, 50.5,1
IBM, 2014-09-01, 12:30:02, 50.7,2
IBM, 2014-09-01, 12:30:03, 50.9,3
IBM, 2014-09-02, 09:57:02, 52.7,1
IBM, 2014-09-02, 09:57:03, 52.9,2
AAPL, 2014-11-02, 09:57:02, 520.31,1
AAPL, 2014-11-02, 09:57:03, 520.92,2

I got errors like:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

After that, I want to apply a cumulative count.

First, and this is most important to me, how do you write the initial statement so that it compares over multiple columns?

Second, how would you add in the cumulative count?

Thank you so much for helping.

Expanding on the original post, here is another question. Assume now that the data set is slightly different:

Stock, Date, Time, Price,BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

And we are looking to see how many times in a row a stock traded on the bid or on the offer, so the output would be:

Stock, Date, Time, Price,BidOffer,Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1 
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid,1
IBM, 2014-09-02, 09:57:02, 52.7, bid,1
IBM, 2014-09-02, 09:57:03, 52.9, bid,2
AAPL, 2014-11-02, 09:57:02, 520.31, offer,1
AAPL, 2014-11-02, 09:57:03, 520.92, offer,2

The groupings are effectively Stock and Date; Time is only used to determine the sequence. Any help on this expansion is much appreciated.

UPDATE3: "And we are looking to see how many times in a row stocks traded on the bid or offer"

In [112]: g = df.groupby(['Stock','Date'])

In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1

In [114]: df
Out[114]:
  Stock       Date      Time   Price BidOffer  Count
0   IBM 2014-09-01  12:30:01   50.50      bid      1
1   IBM 2014-09-01  12:30:02   50.70    offer      1
2   IBM 2014-09-01  12:30:03   50.90      bid      1
3   IBM 2014-09-02  09:57:02   52.70      bid      1
4   IBM 2014-09-02  09:57:03   52.90      bid      2
5  AAPL 2014-11-02  09:57:02  520.31    offer      1
6  AAPL 2014-11-02  09:57:03  520.92    offer      2
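Note that the cumulative sum in the expression above keeps growing across separate runs inside one Stock/Date group: a sequence of bid, bid, offer, offer would come out as 1, 2, 2, 3 rather than 1, 2, 1, 2. If the count should restart whenever the side changes, one possible approach is to label each run and count inside it. The following is a minimal sketch, not the answerer's code; the names `keys`, `new_run`, and `run_id` are made up for illustration, and it assumes the rows are already in time order within each Stock and Date.

```python
import pandas as pd

# Sample data from the question (Date and Time kept as plain strings).
df = pd.DataFrame({
    "Stock": ["IBM"] * 5 + ["AAPL"] * 2,
    "Date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:02", "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
    "BidOffer": ["bid", "offer", "bid", "bid", "bid", "offer", "offer"],
})

# Hypothetical helper names; assumes rows are already in time order.
keys = ["Stock", "Date", "BidOffer"]

# True wherever any key differs from the previous row, i.e. a new run starts.
new_run = (df[keys] != df[keys].shift()).any(axis=1)

# The cumulative sum of run starts gives every run its own label.
run_id = new_run.cumsum()

# Position of each row inside its run, starting at 1.
df["Count"] = df.groupby(run_id).cumcount() + 1

print(df)
```

On the sample data this gives the same Count column as the output above; the two approaches only differ when a Stock/Date group contains more than one repeated run.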

UPDATE2:

In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1

In [516]: df
Out[516]:
  Stock       Date      Time   Price BidOffer  DayCount
0   IBM 2014-09-01  12:30:01   50.50      bid         1
1   IBM 2014-09-01  12:30:02   50.70    offer         1
2   IBM 2014-09-01  12:30:03   50.90      bid         2
3   IBM 2014-09-02  09:57:02   52.70      bid         1
4   IBM 2014-09-02  09:57:03   52.90      bid         2
5  AAPL 2014-11-02  09:57:02  520.31    offer         1
6  AAPL 2014-11-02  09:57:03  520.92    offer         2

UPDATE:

In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1

In [490]: df
Out[490]:
  Stock            Datetime   Price  DayCount
0   IBM 2014-09-01 12:30:01   50.50         1
1   IBM 2014-09-01 12:30:02   50.70         2
2   IBM 2014-09-01 12:30:03   50.90         3
3   IBM 2014-09-02 09:57:02   52.70         1
4   IBM 2014-09-02 09:57:03   52.90         2
5  AAPL 2014-11-02 09:57:02  520.31         1
6  AAPL 2014-11-02 09:57:03  520.92         2
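The session above assumes a single Datetime column, while the sample input in the question has separate Date and Time columns. A minimal sketch of one way to get from one to the other, assuming both columns are read in as strings:

```python
import pandas as pd

# Sample data from the question, with Date and Time as plain strings.
df = pd.DataFrame({
    "Stock": ["IBM"] * 5 + ["AAPL"] * 2,
    "Date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:02", "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
})

# Join the two text columns and parse them into one datetime column.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Group by stock and calendar day; cumcount() numbers rows from 0
# within each group, so add 1 to start the count at 1.
df["DayCount"] = df.groupby(["Stock", df["Datetime"].dt.date]).cumcount() + 1

print(df[["Stock", "Datetime", "Price", "DayCount"]])
```

If Date is already a separate column, grouping on `['Stock', 'Date']` directly gives the same count without building Datetime first.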

Answer for the original question:

df['DayCount']=np.where(
                  (df['ticker']==df['ticker'].shift())
                  &
                  (df['trade_date']==df['trade_date'].shift()),
                  1,
                  0
)

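The only thing that was missing in your second solution is parentheses around each comparison: np.where( (...) & (...), 1, 0). In Python, & binds more tightly than ==, so without the parentheses the expression is parsed as a == (b & c) == d, and the & is applied to a column of strings, which is where the TypeError comes from.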
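To tie the two parts of the question together, here is a small self-contained sketch on the sample data. It uses the `ticker` and `trade_date` column names from the question's attempts; `same_ticker`, `same_date`, and `same_as_prev` are names chosen here for illustration. Storing each comparison in its own variable is one way to avoid the precedence problem altogether. The flag only marks rows that repeat the previous row; the running count itself can be taken from a grouped `cumcount()`.

```python
import numpy as np
import pandas as pd

# Sample data from the question, using the column names from the attempts.
df = pd.DataFrame({
    "ticker": ["IBM"] * 5 + ["AAPL"] * 2,
    "trade_date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
})

# Each comparison is evaluated on its own, so & only ever sees booleans.
same_ticker = df["ticker"] == df["ticker"].shift()
same_date = df["trade_date"] == df["trade_date"].shift()

# 1 where both ticker and date match the previous row, else 0.
df["same_as_prev"] = np.where(same_ticker & same_date, 1, 0)

# Running count within each ticker/date group, starting at 1.
df["DayCount"] = df.groupby(["ticker", "trade_date"]).cumcount() + 1

print(df)
```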

