
Extracting purchase and sale dates from a single transaction date column in R

I'm trying to split a transaction date column into two separate columns: one for the buy date and one for the sell date. Likewise, I'd like to split a single transaction price column into a buy price and a sell price. There is a similar post, but here I'd like to track every transaction date instead of just imputing the earliest date as the buy and the latest date as the sell. For example, below is the current data frame:

property = c('A','A','A','A','B','B','B')
transaction_dates = c("2011-03-09", "2013-06-06", "2015-08-28", "2016-07-18", "2016-12-13", "2018-10-29", "2019-11-30")
prices = c(750000, 830000, 820000,800000,825000,900000,600000) 

proptx = data.frame(property,transaction_dates,prices)

  property transaction_dates  prices
1        A        2011-03-09  750000
2        A        2013-06-06  830000
3        A        2015-08-28  820000
4        A        2016-07-18  800000
5        B        2016-12-13  825000
6        B        2018-10-29  900000
7        B        2019-11-30  600000

I am trying to add columns (or perhaps generate a new data frame) that break the transaction date column and the price column into separate "buy" and "sell" columns, like so:

  property    buy_date    buy_price  sell_date   sell_price
1        A    2011-03-09  750000     2013-06-06  830000
2        A    2013-06-06  830000     2015-08-28  820000
3        A    2015-08-28  820000     2016-07-18  800000
4        A    2016-07-18  800000     NA          NA
5        B    2016-12-13  825000     2018-10-29  900000
6        B    2018-10-29  900000     2019-11-30  600000
7        B    2019-11-30  600000     NA          NA

Ultimately, I would like to track the length of time that elapses between the buy and sell dates, and then calculate the return to the seller. The rows with NA in the sell columns (the last row for each property) indicate that the property has not been sold since that purchase. The actual data frame has hundreds of thousands of distinct properties, and I was hoping to do this sort of operation on each property.

Can this be done relatively easily?

Using data.table:

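library(data.table)
dt <- as.data.table(proptx)
# Each transaction is a purchase, so the existing date column becomes buy_date
# (the prices column keeps its name and acts as the buy price)
setnames(dt, old="transaction_dates", new="buy_date")
# The next transaction on the same property is the sale; shift() with
# type='lead' pulls it up one row and leaves NA on each property's last row.
# This relies on rows being in date order within each property.
dt[, sell_date:=shift(buy_date, 1, type='lead'), by=property]
dt[, sell_price:=shift(prices, 1, type='lead'), by=property]
dt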

   property   buy_date prices  sell_date sell_price
1:        A 2011-03-09 750000 2013-06-06     830000
2:        A 2013-06-06 830000 2015-08-28     820000
3:        A 2015-08-28 820000 2016-07-18     800000
4:        A 2016-07-18 800000       <NA>         NA
5:        B 2016-12-13 825000 2018-10-29     900000
6:        B 2018-10-29 900000 2019-11-30     600000
7:        B 2019-11-30 600000       <NA>         NA
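
To get from this result to the elapsed time and seller return the question asks about, the same approach can be extended as in the sketch below. It is a minimal sketch, not part of the original answer: it assumes the dates are ISO-formatted strings that as.Date() can parse, sorts the rows explicitly rather than relying on input order, renames prices to buy_price to match the desired layout, and defines the return as a simple price ratio minus one. The column names holding_days and seller_return are made up for illustration.

```r
library(data.table)

# Rebuild the example data from the question
proptx <- data.frame(
  property = c("A", "A", "A", "A", "B", "B", "B"),
  transaction_dates = c("2011-03-09", "2013-06-06", "2015-08-28", "2016-07-18",
                        "2016-12-13", "2018-10-29", "2019-11-30"),
  prices = c(750000, 830000, 820000, 800000, 825000, 900000, 600000)
)

dt <- as.data.table(proptx)

# Assumption: dates are "YYYY-MM-DD" strings; convert so date arithmetic works
dt[, transaction_dates := as.Date(transaction_dates)]

# Sort explicitly so the "next row" really is the next transaction
setorder(dt, property, transaction_dates)

# Every transaction is treated as a purchase by the incoming owner
setnames(dt, c("transaction_dates", "prices"), c("buy_date", "buy_price"))

# The following transaction on the same property is the sale
dt[, `:=`(
  sell_date  = shift(buy_date,  type = "lead"),
  sell_price = shift(buy_price, type = "lead")
), by = property]

# Hypothetical derived columns: days held and simple (non-annualized) return
dt[, holding_days  := as.numeric(difftime(sell_date, buy_date, units = "days"))]
dt[, seller_return := sell_price / buy_price - 1]

dt
```

Rows where the property has not been resold keep NA in sell_date, sell_price, holding_days, and seller_return, so they can be filtered out with dt[!is.na(sell_date)] before summarizing.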
