R: Create multiple new columns based upon other columns
Suppose I have a data frame that looks like this:
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
1 12 22 17 4
1 15 32 18 5
1 24 12 29 6
2 45 11 19 8
2 23 33 20 10")
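Is there a straightforward way to create the columns week1_d, week2_d, and so on for each week, where each new column is the difference between that week's _a and _t columns (for example, week1_d = week1_a - week1_t)? Or do I have to build each "difference" column by hand?
The expected output is as follows: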
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a week1_d week2_d
1 12 22 17 4 10 -13
1 15 32 18 5 17 -13
1 24 12 29 6 -12 -23
2 45 11 19 8 -34 -11
2 23 33 20 10 10 -10 ")
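In reality there are about 30 weeks, so I am trying to avoid doing this by hand. I was thinking of a for loop that runs over each week and greps for the columns matching week + (loop index). Is there a better way?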
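For reference, the loop idea described above could be sketched roughly as follows in base R. This is only an illustration: the names `weeks` and `w` are made up here, and it assumes every week has both a `_t` and an `_a` column.

```r
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
1 12 22 17 4
1 15 32 18 5
1 24 12 29 6
2 45 11 19 8
2 23 33 20 10")

# Week prefixes found in the column names, e.g. "week1", "week2"
# (computed once, before any new columns are added)
weeks <- unique(sub("_.*", "", grep("^week", names(dd), value = TRUE)))

# For each week, look the two columns up by name and store a - t
for (w in weeks) {
  dd[[paste0(w, "_d")]] <- dd[[paste0(w, "_a")]] - dd[[paste0(w, "_t")]]
}
```

Because the columns are looked up by name, this sketch does not depend on the order in which the columns appear.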
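From a "tidy data" point of view, your problem is that you are encoding (multiple!) pieces of data in the column names: the week number and whatever the letter stands for. I would convert to long format, where week is a column, define d = a - t, and (if needed) convert back to wide format. But I would probably leave it in long format, because anything else you want to do (more manipulation, modeling, plotting...) is likely to be easier to implement on long data.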
library(tidyr)
library(dplyr)
long = dd %>%
  mutate(real_id = 1:n()) %>%
  gather(key = key, value = value, starts_with("week")) %>%
  separate(key, into = c("week", "letter")) %>%
  spread(key = letter, value = value) %>%
  mutate(d = a - t)
head(long)
# ID real_id week a t d
# 1 1 1 week1 22 12 10
# 2 1 1 week2 4 17 -13
# 3 1 2 week1 32 15 17
# 4 1 2 week2 5 18 -13
# 5 1 3 week1 12 24 -12
# 6 1 3 week2 6 29 -23
wide = gather(long, key = letter, value = value, a, t, d) %>%
  mutate(key = paste(week, letter, sep = "_")) %>%
  select(-week, -letter) %>%
  spread(key = key, value = value)
wide
# ID real_id week1_a week1_d week1_t week2_a week2_d week2_t
# 1 1 1 22 10 12 4 -13 17
# 2 1 2 32 17 15 5 -13 18
# 3 1 3 12 -12 24 6 -23 29
# 4 2 4 11 -34 45 8 -11 19
# 5 2 5 33 10 23 10 -10 20
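In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the same long/wide round trip might look like the sketch below. It assumes tidyr 1.0.0 or later, and the object names `long2` and `wide2` are chosen here only for illustration.

```r
library(dplyr)
library(tidyr)

dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
1 12 22 17 4
1 15 32 18 5
1 24 12 29 6
2 45 11 19 8
2 23 33 20 10")

long2 <- dd %>%
  mutate(real_id = row_number()) %>%
  # Split each name at "_": the first part goes into a `week` column,
  # the second part (".value") becomes the column names t and a
  pivot_longer(starts_with("week"),
               names_to = c("week", ".value"),
               names_sep = "_") %>%
  mutate(d = a - t)

# Back to wide format, rebuilding names such as week1_d
wide2 <- long2 %>%
  pivot_wider(names_from = week,
              values_from = c(t, a, d),
              names_glue = "{week}_{.value}")
```

The ".value" sentinel does the work of the separate() and spread() steps in one call, so the long table gets its `t` and `a` columns directly.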
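We split the 'week' columns (dd[-1]) into a list, grouping them by the names of the dataset with the suffix removed via sub, take the difference between the two columns in each group, and assign the list elements back to 'dd' as new columns.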
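# group the week columns by their "weekN" prefix, then compute a - t per group
lst <- lapply(split.default(dd[-1],
    sub("_.*", "", names(dd)[-1])), function(x) x[2] - x[1])
# name the new columns after the groups so they come out as week1_d, week2_d, ...
dd[paste0(names(lst), "_d")] <- lapply(lst, unlist, use.names = FALSE)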
dd
# ID week1_t week1_a week2_t week2_a week1_d week2_d
#1 1 12 22 17 4 10 -13
#2 1 15 32 18 5 17 -13
#3 1 24 12 29 6 -12 -23
#4 2 45 11 19 8 -34 -11
#5 2 23 33 20 10 10 -10
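To see what the splitting step produces before any differences are taken, it can help to inspect the intermediate list. The sketch below uses the illustrative names `grp` and `parts`, which are not part of the answer above.

```r
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
1 12 22 17 4
1 15 32 18 5
1 24 12 29 6
2 45 11 19 8
2 23 33 20 10")

# Strip everything from the underscore onward: one group label per column
grp <- sub("_.*", "", names(dd)[-1])

# split.default() treats the data frame as a list of columns and
# returns one small data frame per group label
parts <- split.default(dd[-1], grp)

names(parts)        # the group labels
names(parts$week1)  # the columns that ended up in the first group
```

The groups come back sorted as character strings, so with ten or more weeks a label such as week10 would be expected to sort before week2. Building the new column names from `names(parts)` rather than from a running index keeps each difference attached to the right week.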
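If the columns alternate, i.e. 'week1_t' is followed by 'week1_a', then 'week2_t', then 'week2_a', and so on, we can subtract them directly with a recycled logical index: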
Un1 <- unique(sub("_.*", "", names(dd)[-1]))
i1 <- c(TRUE, FALSE)  # recycled across the columns: TRUE picks the _t columns, !i1 the _a columns
dd[paste0(Un1, "_d")] <- dd[-1][!i1]- dd[-1][i1]
dd
# ID week1_t week1_a week2_t week2_a week1_d week2_d
#1 1 12 22 17 4 10 -13
#2 1 15 32 18 5 17 -13
#3 1 24 12 29 6 -12 -23
#4 2 45 11 19 8 -34 -11
#5 2 23 33 20 10 10 -10
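If the alternating order cannot be guaranteed, one possible variation is to pair the columns by name instead of by position. This is a sketch rather than part of the answer above; `t_cols` and `a_cols` are illustrative names, and it assumes every `_t` column has a matching `_a` column.

```r
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
1 12 22 17 4
1 15 32 18 5
1 24 12 29 6
2 45 11 19 8
2 23 33 20 10")

t_cols <- grep("_t$", names(dd), value = TRUE)  # all columns ending in _t
a_cols <- sub("_t$", "_a", t_cols)              # the matching _a column for each

# Fail early if some week is missing its _a column
stopifnot(all(a_cols %in% names(dd)))

# Column-wise subtraction of two data frames, assigned under the new _d names
dd[sub("_t$", "_d", t_cols)] <- dd[a_cols] - dd[t_cols]
```

Since both column sets are derived from the same vector of names, the pairs stay aligned regardless of how the columns are ordered in the data frame.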