How to combine columns based on column name in R?
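Similar questions have been asked here, here, and here. However, I could not get those solutions to work for my problem.
I am trying to combine columns based on their names and then create a matrix/data frame for each pair of variables. Hopefully my example explains this more clearly.
For example, say we have a data frame that looks like this: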
# create some data
set.seed(100)
dfOG <- data.frame(
  day = sample(c('1', '2'), 3, replace = TRUE),
  rain = sample(c('yes', 'no'), 3, replace = TRUE),
  val1 = runif(3)
)
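I am applying a process (which is outside my control) that splits categorical variables into dummy variables, one dummy per level. In the end I get a matrix whose columns are all pairs of variables. The output looks like this: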
# create matrix of all pairs
name2 <- c('day.1', 'day.2',
'rain.yes', 'rain.no', 'val1')
nam2 <- expand.grid(name2, name2)
# paste0() is vectorised, so no loop is needed to build the pair names
newName2 <- paste0(nam2$Var2, ":", nam2$Var1)
set.seed(100)
newMat2 <- matrix(rexp(75, rate=.1), nrow = 3, ncol = length(newName2))
colnames(newMat2) <- newName2
> newMat2
day.1:day.1 day.1:day.2 day.1:rain.yes day.1:rain.no day.1:val1 day.2:day.1
[1,] 9.242116 30.973623 0.9311719 1.943265 20.23192 3.8058106
[2,] 7.238372 6.248052 17.4839077 5.251022 11.23247 0.7162231
[3,] 1.046449 11.744293 2.4999295 3.380434 11.31048 4.2160769
day.2:day.2 day.2:rain.yes day.2:rain.no day.2:val1 rain.yes:day.1
[1,] 0.766974 17.576561 9.348420 0.9030936 2.066487
[2,] 4.979445 5.406032 3.905483 5.5516888 8.371235
[3,] 13.735530 1.925034 1.250488 6.5460690 9.214908
rain.yes:day.2 rain.yes:rain.yes rain.yes:rain.no rain.yes:val1
[1,] 10.15267 7.067098 3.527963 13.420953
[2,] 19.75727 22.259788 9.411371 10.040507
[3,] 15.76831 11.416835 8.630324 1.451295
rain.no:day.1 rain.no:day.2 rain.no:rain.yes rain.no:rain.no rain.no:val1
[1,] 4.330075 25.4360600 15.317283 0.349195 12.51062
[2,] 5.495578 11.0861832 11.256991 13.882071 30.86277
[3,] 6.680542 0.2620275 9.630859 36.926827 11.38734
val1:day.1 val1:day.2 val1:rain.yes val1:rain.no val1:val1
[1,] 1.41956168 6.731429 14.124068 12.797966 41.294648
[2,] 3.69760484 6.137335 1.391675 12.639562 1.033024
[3,] 0.08002734 23.743638 6.804015 9.374034 27.107049
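As we can see above, newMat2 contains every pair of variables after the categorical variables have been split into dummy variables.
What I want to do is recombine these dummy variables into a single variable by summing the rows of the corresponding columns. My final output would be a matrix/data frame with one column per pair of recombined variables.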
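For example, consider just the variable day. It has been split into day.1 and day.2. If I recombine this variable for each pair, we get the columns day.day, day.rain, and day.val1. Doing this manually would look like this: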
day.day = apply(newMat2[,c(1,2,6,7)], 1, sum)
day.rain = apply(newMat2[,c(3,4,8,9)], 1, sum)
day.val1 = apply(newMat2[,c(5,10)], 1, sum)
In the code above, I sum (row-wise) the columns that should be combined.
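To make the link between the hard-coded positions and the column names explicit, here is a minimal sketch that finds the day/rain columns by pattern instead of by index. The matrix `m` and the name `is_day_rain` are illustrative stand-ins, not part of the original data, and the pattern assumes the dummy columns are named `variable.level` as in the example.

```r
# Stand-in matrix with the same 25 column names as newMat2
# (the values are arbitrary and only serve the illustration).
vars  <- c("day.1", "day.2", "rain.yes", "rain.no", "val1")
pairs <- expand.grid(vars, vars)
m <- matrix(seq_len(3 * nrow(pairs)), nrow = 3,
            dimnames = list(NULL, paste0(pairs$Var2, ":", pairs$Var1)))

# Columns whose left side is a `day` dummy and whose right side is a `rain` dummy
is_day_rain <- grepl("^day\\.[^:]*:rain\\.", colnames(m))
which(is_day_rain)   # the positions that were typed by hand as c(3, 4, 8, 9)

# Row-wise sum over the matched columns
day.rain <- rowSums(m[, is_day_rain])
```

Writing one such pattern per pair still does not scale, which is why a way to derive the grouping from the names automatically is needed.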
Desired output:
To be more explicit, if I were to recombine all of newMat2 manually, it would look like this:
dfNew <- data.frame(
day.day = apply(newMat2[,c(1,2,6,7)], 1, sum),
day.rain = apply(newMat2[,c(3,4,8,9)], 1, sum),
day.val1 = apply(newMat2[,c(5,10)], 1, sum),
rain.day = apply(newMat2[,c(11,12,16,17)], 1, sum),
rain.rain = apply(newMat2[,c(13,14,18,19)], 1, sum),
rain.val1 = apply(newMat2[,c(15,20)], 1, sum),
val1.day = apply(newMat2[,c(21,22)], 1, sum),
val1.rain = apply(newMat2[,c(23,24)], 1, sum),
val1.val1 = newMat2[,c(25)]
)
> dfNew
day.day day.rain day.val1 rain.day rain.rain rain.val1 val1.day val1.rain
1 44.78852 29.799418 21.13501 41.98529 26.26154 25.93157 8.150991 26.92203
2 19.18209 32.046444 16.78415 44.71027 56.81022 40.90328 9.834940 14.03124
3 30.74235 9.055886 17.85655 31.92579 66.60485 12.83863 23.823666 16.17805
val1.val1
1 41.294648
2 1.033024
3 27.107049
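However, in my real data I have more than 1000 columns, some of them with many different factor levels, so combining them manually would take a very long time. Is there a way to automate this process?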
Using tidyverse functions:
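library(tidyverse)
newMat2 %>%
  as_tibble(rownames = "id") %>%
  # one row per (id, original column name) pair
  pivot_longer(-id) %>%
  # keep only the original variable names and the ":" separator
  mutate(name = map_chr(str_extract_all(name, paste(c(colnames(dfOG), ":"), collapse = "|")),
                        paste0, collapse = "")) %>%
  group_by(id, name) %>%
  # sum all dummy columns that collapse to the same name
  summarise(value = sum(value), .groups = "drop") %>%
  pivot_wider()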
id `day:day` `day:rain` `day:val1` `rain:day` `rain:rain` `rain:val1` `val1:day` `val1:rain` `val1:val1`
1 1 44.8 29.8 21.1 42.0 26.3 25.9 8.15 26.9 41.3
2 2 19.2 32.0 16.8 44.7 56.8 40.9 9.83 14.0 1.03
3 3 30.7 9.06 17.9 31.9 66.6 12.8 23.8 16.2 27.1
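To see what the mutate() step does to the column names, here is a small sketch that applies the same pattern to three names in isolation. It uses base R (`gregexpr`/`regmatches`) in place of `str_extract_all` and `map_chr` so that it runs without extra packages; `orig_vars`, `nms`, and `collapsed` are illustrative names.

```r
# The original variable names, i.e. what colnames(dfOG) returns in the example
orig_vars <- c("day", "rain", "val1")
pattern   <- paste(c(orig_vars, ":"), collapse = "|")   # "day|rain|val1|:"

nms <- c("day.1:rain.yes", "rain.no:val1", "val1:val1")

# Extract every match of the pattern, then glue the pieces back together
pieces    <- regmatches(nms, gregexpr(pattern, nms))
collapsed <- vapply(pieces, paste0, character(1), collapse = "")
collapsed
# "day:rain" "rain:val1" "val1:val1"
```

Note that the pattern matches anywhere in the string, so this approach assumes no factor level contains one of the original variable names (a level called "rain", for instance, would be extracted as well).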
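# Alternative: split the columns by their collapsed name, then sum each group
library(rlist)
df <- as.data.frame(newMat2)
# "day.1:rain.yes" -> "day.rain": keep the leading variable name on each side of ":"
L <- split.default(df, f = gsub("(^[a-z0-9]+).*:([a-z0-9]+).*$", "\\1.\\2", colnames(df)))
# row-wise sum within each group, then bind the results column-wise
rlist::list.cbind(lapply(L, rowSums))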
day.day day.rain day.val1 rain.day rain.rain rain.val1 val1.day val1.rain val1.val1
[1,] 44.78852 29.799418 21.13501 41.98529 26.26154 25.93157 8.150991 26.92203 41.294648
[2,] 19.18209 32.046444 16.78415 44.71027 56.81022 40.90328 9.834940 14.03124 1.033024
[3,] 30.74235 9.055886 17.85655 31.92579 66.60485 12.83863 23.823666 16.17805 27.107049
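If adding the rlist dependency is undesirable, the same grouping idea can be sketched with base R's `rowsum()`, which sums the rows of a matrix by group. This is only a sketch under stated assumptions: the stand-in matrix `m` is regenerated here so that the block is self-contained, and the key is built on the assumption that original variable names contain no "." and level names contain no ":".

```r
# Stand-in for newMat2 (same column names, different random values)
vars  <- c("day.1", "day.2", "rain.yes", "rain.no", "val1")
pairs <- expand.grid(vars, vars)
set.seed(1)
m <- matrix(rexp(3 * nrow(pairs), rate = 0.1), nrow = 3,
            dimnames = list(NULL, paste0(pairs$Var2, ":", pairs$Var1)))

# Drop each level suffix (from a "." up to the next ":" or the end of the name),
# then switch the separator to "." to match the desired column names.
key <- gsub("\\.[^:]*", "", colnames(m))
key <- sub(":", ".", key, fixed = TRUE)

# rowsum() groups rows, so transpose, sum by key, and transpose back
combined <- t(rowsum(t(m), group = key))
colnames(combined)
```

Because `rowsum()` works directly on the matrix, there is no conversion to a data frame or list, which may matter with more than 1000 columns. The groups come back in sorted order, so the columns are ordered alphabetically by their combined name.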