How to create a sequential numeric column based on two columns in R?

My data frame "fsp" has 1,702,551 observations and 3 variables. It looks like this:

tibble [1,702,551 x 3] 
 $ date       : Date[1:1702551], format: "2011-04-12" "2011-04-12" "2011-04-12" ...
 $ wavelength : num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
 $ ID         : chr [1:1702551] "c01" "c01" "c01" "c01" ...

A quick explanation of the data: for each "date" and "ID" I have spectral data (not shown) across the wavelength interval (350 to 2300 nm). I want to create a new column "target_ID" containing a repeating number that increases to the next consecutive integer each time the date or ID changes. For example, for the first ID, "c01", on date "2011-04-12", the column holds the number 1 for wavelengths 350 to 2300. The next ID gets the number 2, and so on ("date" also changes along the data frame).

Example of what I want to achieve (see "target_ID"):

|date      |wavelength|ID  |target_ID|
|:---------|:---------|:---|:--------|
|2011-04-12|350       |c01 |1        |
|2011-04-12|351       |c01 |1        |
|2011-04-12|352       |c01 |1        |
|2011-04-12|353       |c01 |1        |
|...       |...       |... |...      |
|2011-04-12|350       |c03 |2        |
|2011-04-12|351       |c03 |2        |
|...       |...       |... |...      |
|2011-04-13|350       |c01 |3        |
|2011-04-13|351       |c01 |3        |

This is the code I have already tried, without success:

fsp<-fsp %>%
group_by(date, ID) %>%
mutate(target_ID, count=n())

Any help will be much appreciated.

Thank you in advance.

This is a perfect use case for the rleid function from the data.table package:

# example data
xx <- rep(Sys.Date(), 5)
xx <- c(xx, xx + lubridate::days(1))
id <- rep(c(1:4), c(2,3,3,2))
dat <- data.frame(date = xx, id = id)

#          date id
# 1  2021-03-29  1
# 2  2021-03-29  1
# 3  2021-03-29  2
# 4  2021-03-29  2
# 5  2021-03-29  2
# 6  2021-03-30  3
# 7  2021-03-30  3
# 8  2021-03-30  3
# 9  2021-03-30  4
# 10 2021-03-30  4

library(data.table)
dat_dt <- as.data.table(dat)
dat_dt[,target_id := rleid(date, id)]

 #          date id target_id
 # 1: 2021-03-29  1         1
 # 2: 2021-03-29  1         1
 # 3: 2021-03-29  2         2
 # 4: 2021-03-29  2         2
 # 5: 2021-03-29  2         2
 # 6: 2021-03-30  3         3
 # 7: 2021-03-30  3         3
 # 8: 2021-03-30  3         3
 # 9: 2021-03-30  4         4
 #10: 2021-03-30  4         4
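
One caveat worth keeping in mind: rleid numbers runs of consecutive identical rows, so it only gives one number per (date, ID) pair when the rows of each pair are contiguous, as they appear to be in the question. If a pair could reappear later in the table, grouping with `.GRP` is an alternative. The sketch below uses a small made-up table (the values and the `run_id` / `group_id` column names are illustrative assumptions, not the asker's data):

```r
library(data.table)

# Hypothetical toy data: the pair (2011-04-12, "c01") appears in two separate blocks
dt <- data.table(
  date = as.Date(c("2011-04-12", "2011-04-12", "2011-04-13", "2011-04-12")),
  ID   = c("c01", "c03", "c01", "c01")
)

# rleid() starts a new number whenever date or ID differs from the previous row,
# so the repeated pair in row 4 gets its own number
dt[, run_id := rleid(date, ID)]
# run_id: 1 2 3 4

# .GRP numbers each distinct (date, ID) pair in order of first appearance,
# so row 4 reuses the number of row 1
dt[, group_id := .GRP, by = .(date, ID)]
# group_id: 1 2 3 1
```

If the data are sorted by date and ID beforehand, for example with `setorder(dt, date, ID)`, both approaches should give the same numbering.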

And here's how you could use %>% and mutate to solve it:

library(tidyverse)
dat %>%
    mutate(target_id = data.table::rleid(date, id))
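
For completeness, the same result can probably be reached with dplyr alone. The attempt in the question likely fails because `target_ID` is passed to `mutate()` before it exists, and `n()` counts the rows in each group rather than numbering the groups. The sketch below assumes dplyr 1.1.0 or later (for `consecutive_id()`) and uses a tiny stand-in for `fsp` with made-up values; `target_ID2` and `fsp2` are illustrative names only:

```r
library(dplyr)

# Toy stand-in for the asker's `fsp` (values are illustrative only)
fsp <- tibble(
  date       = as.Date(c("2011-04-12", "2011-04-12", "2011-04-12",
                         "2011-04-12", "2011-04-13", "2011-04-13")),
  wavelength = c(350, 351, 350, 351, 350, 351),
  ID         = c("c01", "c01", "c03", "c03", "c01", "c01")
)

# Option 1: one number per run of identical (date, ID) rows, like rleid()
fsp <- fsp %>%
  mutate(target_ID = consecutive_id(date, ID))
# target_ID: 1 1 2 2 3 3

# Option 2: one number per distinct (date, ID) group, numbered in sorted key order
fsp2 <- fsp %>%
  group_by(date, ID) %>%
  mutate(target_ID2 = cur_group_id()) %>%
  ungroup()
# target_ID2: 1 1 2 2 3 3
```

The two options agree here because the rows are already ordered by date and then ID. On unsorted data, `consecutive_id()` follows row order while `cur_group_id()` follows the sorted group keys.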
