Pandas - Conditional Calculation Based on Shift Values from Two Other Columns
I'm sure this question is easy, but it has been stumping me for too long, so I would REALLY appreciate some direction.
I'm looking to add a column to a DataFrame based on the values of two other columns.
I want to identify whether the stock is equal to the stock in the prior row and the date is equal to the date in the prior row.
I am looking to get the running count. I tried something along the lines of the following:
df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() , 1, 0)
and
df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() & df['trade_date']==df['trade_date'].shift(),1,0)
Sample input:
Stock, Date, Time, Price
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92
And the desired output:
Stock, Date, Time, Price, DayCount
IBM, 2014-09-01, 12:30:01, 50.5, 1
IBM, 2014-09-01, 12:30:02, 50.7, 2
IBM, 2014-09-01, 12:30:03, 50.9, 3
IBM, 2014-09-02, 09:57:02, 52.7, 1
IBM, 2014-09-02, 09:57:03, 52.9, 2
AAPL, 2014-11-02, 09:57:02, 520.31, 1
AAPL, 2014-11-02, 09:57:03, 520.92, 2
I got errors like:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
After that, I would apply a cumulative count.
First, and this is most important to me: how do you write the initial statement so that the comparison works across multiple columns?
Second, how would you add the cumulative count?
Thank you so much for your help!
Expanding on the original post, here is another question. Assume now that the data set is slightly different:
Stock, Date, Time, Price, BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer
And we are looking to see how many times in a row stocks traded on the bid or offer, so the output would be:
Stock, Date, Time, Price, BidOffer, Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid, 1
IBM, 2014-09-02, 09:57:02, 52.7, bid, 1
IBM, 2014-09-02, 09:57:03, 52.9, bid, 2
AAPL, 2014-11-02, 09:57:02, 520.31, offer, 1
AAPL, 2014-11-02, 09:57:03, 520.92, offer, 2
The groupings are effectively Stock and Date; Time is only used to determine the sequence. Any help with this expansion is much appreciated.
UPDATE3: "And we are looking to see how many times in a row stocks traded on the bid or offer"
In [112]: g = df.groupby(['Stock','Date'])
In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1
In [114]: df
Out[114]:
Stock Date Time Price BidOffer Count
0 IBM 2014-09-01 12:30:01 50.50 bid 1
1 IBM 2014-09-01 12:30:02 50.70 offer 1
2 IBM 2014-09-01 12:30:03 50.90 bid 1
3 IBM 2014-09-02 09:57:02 52.70 bid 1
4 IBM 2014-09-02 09:57:03 52.90 bid 2
5 AAPL 2014-11-02 09:57:02 520.31 offer 1
6 AAPL 2014-11-02 09:57:03 520.92 offer 2
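A caveat on UPDATE3: `(x == x.shift()).cumsum()` never resets within a Stock/Date group, so a sequence such as bid, bid, offer, offer would be numbered 1, 2, 2, 3 rather than 1, 2, 1, 2. The sample happens to avoid that case. Below is a minimal sketch of a counter that restarts whenever the side changes, assuming the question's column names and rows already in Time order; the data and the helper name `streak_count` are hypothetical. Each run of identical values gets an id from `(side != side.shift()).cumsum()`, and `cumcount()` numbers the rows inside each run. Using `transform` keeps the original row index, so the result assigns straight back to `df`.

```python
import pandas as pd


def streak_count(side: pd.Series) -> pd.Series:
    """Hypothetical helper: number rows 1, 2, ... within each run of equal consecutive values."""
    # A new run id starts wherever the value differs from the row above
    # (the first row compares against NaN, so it always starts a run).
    run_id = (side != side.shift()).cumsum()
    # cumcount() is 0-based within each run, so add 1.
    return side.groupby(run_id).cumcount() + 1


# Hypothetical data in the question's layout (Price omitted). The IBM rows on
# 2014-09-02 go bid, bid, offer, offer to show the count restarting.
df = pd.DataFrame({
    "Stock": ["IBM"] * 7 + ["AAPL"] * 2,
    "Date": ["2014-09-01"] * 3 + ["2014-09-02"] * 4 + ["2014-11-02"] * 2,
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:01", "09:57:02", "09:57:03", "09:57:04",
             "09:57:02", "09:57:03"],
    "BidOffer": ["bid", "offer", "bid",
                 "bid", "bid", "offer", "offer",
                 "offer", "offer"],
})

# Assumes rows are already in Time order within each Stock/Date group.
df["Count"] = df.groupby(["Stock", "Date"])["BidOffer"].transform(streak_count)
print(df)
```

On the question's own sample this gives the same Count column as UPDATE3; the two differ only once a group switches sides after a repeat.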
UPDATE2:
In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1
In [516]: df
Out[516]:
Stock Date Time Price BidOffer DayCount
0 IBM 2014-09-01 12:30:01 50.50 bid 1
1 IBM 2014-09-01 12:30:02 50.70 offer 1
2 IBM 2014-09-01 12:30:03 50.90 bid 2
3 IBM 2014-09-02 09:57:02 52.70 bid 1
4 IBM 2014-09-02 09:57:03 52.90 bid 2
5 AAPL 2014-11-02 09:57:02 520.31 offer 1
6 AAPL 2014-11-02 09:57:03 520.92 offer 2
UPDATE:
In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1
In [490]: df
Out[490]:
Stock Datetime Price DayCount
0 IBM 2014-09-01 12:30:01 50.50 1
1 IBM 2014-09-01 12:30:02 50.70 2
2 IBM 2014-09-01 12:30:03 50.90 3
3 IBM 2014-09-02 09:57:02 52.70 1
4 IBM 2014-09-02 09:57:03 52.90 2
5 AAPL 2014-11-02 09:57:02 520.31 1
6 AAPL 2014-11-02 09:57:03 520.92 2
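The UPDATE session works on a single `Datetime` column, which the question's data does not have. A minimal sketch of building it from the separate `Date` and `Time` strings with `pd.to_datetime` and then applying the same grouping, using the sample input typed in by hand:

```python
import pandas as pd

# The question's sample input, typed in directly.
df = pd.DataFrame({
    "Stock": ["IBM"] * 5 + ["AAPL"] * 2,
    "Date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
    "Time": ["12:30:01", "12:30:02", "12:30:03", "09:57:02",
             "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
})

# Combine the two text columns into one datetime64 column.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Number rows 1, 2, 3, ... within each (Stock, calendar day) group, in row order.
df["DayCount"] = df.groupby(["Stock", df["Datetime"].dt.date]).cumcount() + 1

# With the question's separate Date column, grouping on it directly is equivalent.
same = df.groupby(["Stock", "Date"]).cumcount() + 1

print(df[["Stock", "Datetime", "Price", "DayCount"]])
```

Because the question's `Date` column already holds the calendar day, the conversion is only needed when the data arrive as a single timestamp.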
Answer for the original question:
df['DayCount']=np.where(
(df['ticker']==df['ticker'].shift())
&
(df['trade_date']==df['trade_date'].shift()),
1,
0
)
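The only thing that was missing in your second solution is the parentheses: np.where( (...) & (...), 1, 0). In Python, & binds more tightly than ==, so without them df['ticker'] == df['ticker'].shift() & df['trade_date'] == df['trade_date'].shift() is parsed as the chained comparison df['ticker'] == (df['ticker'].shift() & df['trade_date']) == df['trade_date'].shift(), which tries to combine the text columns with & and raises a TypeError. Also use np.where rather than df.where here: df.where(cond, other) keeps the original values where cond is True instead of returning a 1/0 column.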
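For the first question (comparing several columns with the previous row at once) and the second (turning that into a running count), a minimal sketch: compare a subset of columns with its shifted copy, require every column to match with `.all(axis=1)`, start a new block wherever they don't, and number the rows in each block with `cumcount()`. It uses the `Stock` and `Date` names from the sample input rather than `ticker` and `trade_date` from the attempts, and only those two columns of the sample.

```python
import numpy as np
import pandas as pd

# The Stock and Date columns of the question's sample input.
df = pd.DataFrame({
    "Stock": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "Date": ["2014-09-01", "2014-09-01", "2014-09-01", "2014-09-02",
             "2014-09-02", "2014-11-02", "2014-11-02"],
})

keys = ["Stock", "Date"]  # any number of columns can go here

# True where every key column equals its value in the previous row.
# shift() puts NaN in the first row, so that row compares as False.
same_as_prev = (df[keys] == df[keys].shift()).all(axis=1)

# The 1/0 flag from the answer above, computed over all key columns at once.
df["SameAsPrev"] = np.where(same_as_prev, 1, 0)

# Each False starts a new block; cumsum labels the blocks 1, 2, 3, ...
block = (~same_as_prev).astype(int).cumsum()

# Running count within each block of consecutive rows with identical keys.
df["DayCount"] = df.groupby(block).cumcount() + 1
print(df)
```

Unlike `groupby(['Stock', 'Date']).cumcount()`, this restarts whenever the key changes, so it assumes rows with the same stock and date are contiguous, as they are in the sample.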