Extracting purchase and sale dates from a single transaction date column in R
I'm trying to split a transaction date column into two separate columns: one for the buy date and one for the sell date. Likewise, I'd like to split a single transaction price column into a buy price and a sell price. There is a similar post, but here I'd like to track every transaction date instead of just imputing the earliest date as the buy and the latest date as the sell. For example, below is the current data frame:
property = c('A','A','A','A','B','B','B')
transaction_dates = c("2011-03-09", "2013-06-06", "2015-08-28", "2016-07-18", "2016-12-13", "2018-10-29", "2019-11-30")
prices = c(750000, 830000, 820000,800000,825000,900000,600000)
proptx = data.frame(property,transaction_dates,prices)
property transaction_dates prices
1 A 2011-03-09 750000
2 A 2013-06-06 830000
3 A 2015-08-28 820000
4 A 2016-07-18 800000
5 B 2016-12-13 825000
6 B 2018-10-29 900000
7 B 2019-11-30 600000
I am trying to add columns (or perhaps generate a new data frame) that break the transaction date and price columns into separate "buy" and "sell" columns, like so:
property buy_date buy_price sell_date sell_price
1 A 2011-03-09 750000 2013-06-06 830000
2 A 2013-06-06 830000 2015-08-28 820000
3 A 2015-08-28 820000 2016-07-18 800000
4 A 2016-07-18 800000 NA NA
5 B 2016-12-13 825000 2018-10-29 900000
6 B 2018-10-29 900000 2019-11-30 600000
7 B 2019-11-30 600000 NA NA
Ultimately, what I would like to do is track the length of time that elapses between the buy and sell dates, and then calculate the return to the seller. Rows 4 and 7 indicate that the property has not been sold again.
The actual data frame has hundreds of thousands of distinct properties, and I was hoping to do this sort of operation on each property.
Can this be done relatively easily?
Using data.table:
library(data.table)
dt <- as.data.table(proptx)
setnames(dt, old="transaction_dates", new="buy_date")
dt[, sell_date:=shift(buy_date, 1, type='lead'), by=property]
dt[, sell_price:=shift(prices, 1, type='lead'), by=property]
dt
property buy_date prices sell_date sell_price
1: A 2011-03-09 750000 2013-06-06 830000
2: A 2013-06-06 830000 2015-08-28 820000
3: A 2015-08-28 820000 2016-07-18 800000
4: A 2016-07-18 800000 <NA> NA
5: B 2016-12-13 825000 2018-10-29 900000
6: B 2018-10-29 900000 2019-11-30 600000
7: B 2019-11-30 600000 <NA> NA
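To get from this result to the holding period and return mentioned in the question, something along these lines may work. It is only a sketch under a few assumptions: the dates are ISO-formatted strings that as.Date() can parse, the "return" is the simple ratio sell_price / buy_price - 1, and the column names holding_days and simple_return are made up for illustration. It also sorts by date within each property first, since shift() depends on row order, and renames prices to buy_price to match the layout asked for.

```r
library(data.table)

# Same sample data as in the question
proptx <- data.frame(
  property = c("A", "A", "A", "A", "B", "B", "B"),
  transaction_dates = c("2011-03-09", "2013-06-06", "2015-08-28", "2016-07-18",
                        "2016-12-13", "2018-10-29", "2019-11-30"),
  prices = c(750000, 830000, 820000, 800000, 825000, 900000, 600000)
)

dt <- as.data.table(proptx)

# Assumption: dates are ISO-formatted strings; convert so date arithmetic works
dt[, transaction_dates := as.Date(transaction_dates)]

# shift() relies on row order, so sort within each property first
setorder(dt, property, transaction_dates)
setnames(dt, c("transaction_dates", "prices"), c("buy_date", "buy_price"))

# Lead both columns in one call; the last row per property gets NA
dt[, c("sell_date", "sell_price") := shift(.SD, 1, type = "lead"),
   by = property, .SDcols = c("buy_date", "buy_price")]

# holding_days and simple_return are illustrative names, not from the question
dt[, holding_days := as.numeric(sell_date - buy_date)]
dt[, simple_return := sell_price / buy_price - 1]

dt
```

Rows where sell_date is NA (the most recent purchase of each property) end up with NA in both derived columns, so they can be dropped with dt[!is.na(sell_date)] if only completed buy/sell pairs are of interest.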