
Pandas - Conditional Calculation Based on Shift Values from Two Other Columns

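I'm sure this question is easy, but it has been bothering me for far too long, so I would really appreciate some direction.

I want to add a column to a DataFrame based on the values of two other columns.

I want to determine whether the stock equals the stock in the previous row and whether the date equals the date in the previous row.

From that I want a running count. I tried the following lines: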

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input:

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

And the desired output:

Stock, Date,Time, Price, DayCount 
IBM, 2014-09-01, 12:30:01, 50.5,1
IBM, 2014-09-01, 12:30:02, 50.7,2
IBM, 2014-09-01, 12:30:03, 50.9,3
IBM, 2014-09-02, 09:57:02, 52.7,1
IBM, 2014-09-02, 09:57:03, 52.9,2
AAPL, 2014-11-02, 09:57:02, 520.31,1
AAPL, 2014-11-02, 09:57:03, 520.92,2

I get the error:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

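After that, I would apply a cumulative count.

First, and most important to me: how do I write the initial statement so that it compares multiple columns?

Second, how do I add the cumulative count?

Thanks very much for your help.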

Extending the original post, here is a follow-up question. Now suppose the dataset is slightly different:

Stock, Date, Time, Price,BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

We want to see how many consecutive times the stock traded on the bid or on the offer, so the output would be:

Stock, Date, Time, Price,BidOffer,Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1 
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid,1
IBM, 2014-09-02, 09:57:02, 52.7, bid,1
IBM, 2014-09-02, 09:57:03, 52.9, bid,2
AAPL, 2014-11-02, 09:57:02, 520.31, offer,1
AAPL, 2014-11-02, 09:57:03, 520.92, offer,2

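The grouping is really by stock and date; the time is only used to determine the sequence. Any help with this extension would be appreciated.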

UPDATE3: "We want to see how many consecutive times the stock traded on the bid or on the offer"

In [112]: g = df.groupby(['Stock','Date'])

In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1

In [114]: df
Out[114]:
  Stock       Date      Time   Price BidOffer  Count
0   IBM 2014-09-01  12:30:01   50.50      bid      1
1   IBM 2014-09-01  12:30:02   50.70    offer      1
2   IBM 2014-09-01  12:30:03   50.90      bid      1
3   IBM 2014-09-02  09:57:02   52.70      bid      1
4   IBM 2014-09-02  09:57:03   52.90      bid      2
5  AAPL 2014-11-02  09:57:02  520.31    offer      1
6  AAPL 2014-11-02  09:57:03  520.92    offer      2
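One caveat worth sketching: the `cumsum` expression above counts every repeat within a group cumulatively, so it matches the sample but would not reset when a second run starts on the same day (for example bid, bid, offer, offer would give 1, 2, 2, 3). A possible alternative, assuming the rows are already ordered so that each (Stock, Date) group is contiguous, is to label each run and then use `cumcount` within the run. The data and the `run_id` name below are made up for illustration.

```python
import pandas as pd

# Hypothetical data with several runs inside one (Stock, Date) group.
df = pd.DataFrame({
    "Stock": ["IBM"] * 6,
    "Date": ["2014-09-01"] * 6,
    "BidOffer": ["bid", "bid", "offer", "offer", "offer", "bid"],
})

keys = ["Stock", "Date"]

# Previous BidOffer within the same (Stock, Date) group; NaN on each group's first row.
prev = df.groupby(keys)["BidOffer"].shift()

# A new run starts whenever the value differs from the previous row of the group.
run_id = (df["BidOffer"] != prev).cumsum().rename("run_id")

# Position within each run, starting at 1.
df["Count"] = df.groupby(keys + [run_id]).cumcount() + 1
```

Here `Count` comes out as 1, 2, 1, 2, 3, 1, restarting at every change of side. Because this variant avoids `groupby(...).apply`, the result keeps the original row index and can be assigned straight back to the frame.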

UPDATE2:

In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1

In [516]: df
Out[516]:
  Stock       Date      Time   Price BidOffer  DayCount
0   IBM 2014-09-01  12:30:01   50.50      bid         1
1   IBM 2014-09-01  12:30:02   50.70    offer         1
2   IBM 2014-09-01  12:30:03   50.90      bid         2
3   IBM 2014-09-02  09:57:02   52.70      bid         1
4   IBM 2014-09-02  09:57:03   52.90      bid         2
5  AAPL 2014-11-02  09:57:02  520.31    offer         1
6  AAPL 2014-11-02  09:57:03  520.92    offer         2

UPDATE:

In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1

In [490]: df
Out[490]:
  Stock            Datetime   Price  DayCount
0   IBM 2014-09-01 12:30:01   50.50         1
1   IBM 2014-09-01 12:30:02   50.70         2
2   IBM 2014-09-01 12:30:03   50.90         3
3   IBM 2014-09-02 09:57:02   52.70         1
4   IBM 2014-09-02 09:57:03   52.90         2
5  AAPL 2014-11-02 09:57:02  520.31         1
6  AAPL 2014-11-02 09:57:03  520.92         2
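For the layout in the question, where the date and time sit in separate `Date` and `Time` columns rather than one `Datetime` column, the same idea can be sketched by grouping on the two columns directly. The frame below simply retypes the sample input by hand; it is an illustration, not the asker's actual data.

```python
import pandas as pd

# Sample input from the question, with separate Date and Time columns.
df = pd.DataFrame({
    "Stock": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "Date": ["2014-09-01", "2014-09-01", "2014-09-01",
             "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:02", "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
})

# cumcount numbers the rows of each (Stock, Date) group from 0, so add 1.
df["DayCount"] = df.groupby(["Stock", "Date"]).cumcount() + 1
day_count = df["DayCount"].tolist()
```

`cumcount` numbers rows in the order they appear in the frame, so if the trades are not already in time order, sorting by `Stock`, `Date` and `Time` first would be needed to get a meaningful sequence.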

Answer to the original question:

df['DayCount'] = np.where(
    (df['ticker'] == df['ticker'].shift())
    & (df['trade_date'] == df['trade_date'].shift()),
    1,
    0,
)

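The only thing missing in your second solution was the parentheses: np.where( (...) & (...), 1, 0). In Python, & binds more tightly than ==, so without parentheses the expression is evaluated as df['ticker'].shift() & df['trade_date'] first, which is what raises the TypeError.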
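As a small illustration of the precedence point, here is a sketch using the question's column names (`ticker`, `trade_date`) on hand-typed sample values. Naming each comparison first is one way to avoid the parentheses problem altogether; the column name `SameAsPrev` is made up for this example.

```python
import numpy as np
import pandas as pd

# Hand-typed sample using the column names from the question.
df = pd.DataFrame({
    "ticker": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "trade_date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
})

# Each comparison is evaluated on its own, yielding a boolean Series.
same_ticker = df["ticker"] == df["ticker"].shift()
same_date = df["trade_date"] == df["trade_date"].shift()

# Combining two boolean Series with & is safe; 1 marks a row that
# matches the previous row on both columns, 0 marks the start of a new group.
df["SameAsPrev"] = np.where(same_ticker & same_date, 1, 0)
same_as_prev = df["SameAsPrev"].tolist()
```

Note that this flag is only 0 or 1: it marks where a new (ticker, date) block begins, and turning it into a running count still needs a cumulative step such as the `groupby(...).cumcount()` approach.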

