R: keep only columns whose name endings match a string in another column
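I have a data frame and want to keep, for each row, only the values from those columns whose name ending matches the ending of the string in a specific other column.
My current data frame: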
df <- structure(list(t1copeact_1_1_1 = structure(c(NA, 3, 4, NA, 4,
NA, 3, 4, 4, NA, NA, NA, 4, 4, 4, NA, 4, NA, NA, 3), display_width = 0L),
t1copeact_1_1_2 = structure(c(NA, NA, NA, NA, NA, 4, 3, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeact_1_1_3 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA), display_width = 0L),
t1copeact_1_1_4 = structure(c(NA, NA, NA, NA, 4, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeact_1_1_5 = structure(c(NA, NA, 4, NA, NA, 4, NA, NA,
NA, NA, 3, NA, NA, 3, NA, NA, NA, NA, NA, 4), display_width = 0L),
t1copeact_1_1_6 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 2, 3, NA, NA, NA, 2, 4, 3), display_width = 0L),
t1copeact_1_1_7 = structure(c(NA, NA, NA, NA, 4, NA, NA,
NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeact_1_1_8 = structure(c(NA, NA, NA, 3, 4, NA, NA, NA,
NA, 3, 4, 4, NA, NA, NA, NA, NA, NA, NA, 3), display_width = 0L),
t1copeplan_1_1_1 = structure(c(NA, 4, 4, NA, 4, NA, 3, 4,
4, NA, NA, NA, 4, 4, 3, NA, 4, NA, NA, 4), display_width = 0L),
t1copeplan_1_1_2 = structure(c(NA, NA, NA, NA, NA, 4, 3,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeplan_1_1_3 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, NA), display_width = 0L),
t1copeplan_1_1_4 = structure(c(NA, NA, NA, NA, 4, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeplan_1_1_5 = structure(c(NA, NA, 4, NA, NA, 4, NA,
NA, NA, NA, 2, NA, NA, 4, NA, NA, NA, NA, NA, 4), display_width = 0L),
t1copeplan_1_1_6 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 2, 4, NA, NA, NA, 3, 4, 4), display_width = 0L),
t1copeplan_1_1_7 = structure(c(NA, NA, NA, NA, 4, NA, NA,
NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L),
t1copeplan_1_1_8 = structure(c(NA, NA, NA, 4, 4, NA, NA,
NA, NA, 4, 3, 2, NA, NA, NA, NA, NA, NA, NA, 3), display_width = 0L),
max_sev = structure(c(2L, 2L, 1L, 4L, 4L, 2L, 1L, 2L, 2L,
2L, 4L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 4L), .Label = c("T1Severity_1",
"T1Severity_5", "T1Severity_6", "T1Severity_8"), class = "factor")), row.names = c(1L,
2L, 5L, 12L, 13L, 14L, 16L, 19L, 23L, 25L, 27L, 30L, 31L, 32L,
34L, 35L, 36L, 37L, 39L, 40L), class = "data.frame")
What I would like to get is:
output_df <- structure(list(copeact = c(NA, NA, 4, 3, 4, 4, 3, NA, NA, NA),
copeplan = c(NA, NA, 4, 4, 4, 4, 3, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-10L))
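As you can see, the "copeact" column of output_df holds the values from whichever df column "t1copeact_1_1_x" has an ending that matches the ending of the string in the df column "max_sev".
For example, the first row of the df column "max_sev" ends in "_5", so I want to keep the values of t1copeact_1_1_5 (= NA) and t1copeplan_1_1_5 (= NA) for that row, and nothing else. When "max_sev" ends in "_8", I want to keep the values of t1copeact_1_1_8 and t1copeplan_1_1_8, and so on.
Can someone give me a hint on how to solve this? I'm a bit lost here. If you need more information, please let me know. Thanks!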
Here is a tidyverse solution:
library(tidyverse)
df %>%
  pivot_longer(cols = -max_sev,
               names_to = c(".value", "index"),
               names_pattern = "t1(\\w+)_1_1_(\\d$)") %>%
  filter(str_extract(as.character(max_sev), "\\d$") == index)
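First, you use pivot_longer to reshape the data into long format. The advantage is that the column endings now sit in a separate column named index. After that, you only need filter to keep the rows where index equals the last digit of max_sev.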
   max_sev      index copeact copeplan
   <fct>        <chr>   <dbl>    <dbl>
 1 T1Severity_5 5          NA       NA
 2 T1Severity_5 5          NA       NA
 3 T1Severity_1 1           4        4
 4 T1Severity_8 8           3        4
 5 T1Severity_8 8           4        4
 6 T1Severity_5 5           4        4
 7 T1Severity_1 1           3        3
 8 T1Severity_5 5          NA       NA
 9 T1Severity_5 5          NA       NA
10 T1Severity_5 5          NA       NA
11 T1Severity_8 8           4        3
12 T1Severity_5 5          NA       NA
13 T1Severity_5 5          NA       NA
14 T1Severity_1 1           4        4
15 T1Severity_1 1           4        3
16 T1Severity_5 5          NA       NA
17 T1Severity_1 1           4        4
18 T1Severity_5 5          NA       NA
19 T1Severity_5 5          NA       NA
20 T1Severity_8 8           3        3
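To make the reshape-then-filter idea easier to follow, here is a minimal sketch on a made-up data frame. The name toy, its columns t1a_* and t1b_*, and all of its values are assumptions for illustration only and are not part of the question's data. It loads tidyr, dplyr and stringr individually rather than the whole tidyverse.

```r
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical toy data: two measures (a, b), each with suffixes 1 and 2
toy <- data.frame(
  t1a_1_1_1 = c(10, 11),
  t1a_1_1_2 = c(20, 21),
  t1b_1_1_1 = c(30, 31),
  t1b_1_1_2 = c(40, 41),
  max_sev   = c("T1Severity_2", "T1Severity_1")
)

# Step 1: long format. ".value" turns the first capture group (a / b)
# into column names; the second capture group goes into "index".
long <- toy %>%
  pivot_longer(
    cols = -max_sev,
    names_to = c(".value", "index"),
    names_pattern = "t1(\\w+)_1_1_(\\d)$"
  )

# Step 2: keep only the rows whose index equals the last digit of max_sev
result <- long %>%
  filter(str_extract(max_sev, "\\d$") == index)

print(long)
print(result)
```

After step 1 every original row appears once per suffix, so long has four rows. Step 2 leaves one row per original row, taken from the suffix that max_sev points to. Note that the pattern only captures a single trailing digit; suffixes of 10 or higher would need \\d+ in both places.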
Here is a base R option:
# Unique group values
cols <- c('copeact', 'copeplan')
# Get the number from max_sev
vals <- sub('.*_', '', df$max_sev)
# Create a row/column matrix to subset the data
# from the relevant column in the group
sapply(cols, function(x) {
  as.numeric(df[cbind(seq_len(nrow(df)),
                      match(sprintf('t1%s_1_1_%s', x, vals), names(df)))])
})
# copeact copeplan
# [1,] NA NA
# [2,] NA NA
# [3,] 4 4
# [4,] 3 4
# [5,] 4 4
# [6,] 4 4
# [7,] 3 3
# [8,] NA NA
# [9,] NA NA
#[10,] NA NA
#[11,] 4 3
#[12,] NA NA
#[13,] NA NA
#[14,] 4 4
#[15,] 4 3
#[16,] NA NA
#[17,] 4 4
#[18,] NA NA
#[19,] NA NA
#[20,] 3 3
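The base R answer relies on indexing with a two-column matrix of (row, column) pairs, which picks exactly one cell per row. The following sketch shows that mechanism in isolation. The data frame toy and its values are hypothetical and chosen only to keep the example small.

```r
# Hypothetical toy data: one measure with suffixes 1 and 2
toy <- data.frame(
  t1a_1_1_1 = c(10, 11, 12),
  t1a_1_1_2 = c(20, 21, 22),
  max_sev   = c("T1Severity_2", "T1Severity_1", "T1Severity_2")
)

# Suffix requested by each row: everything after the last underscore
vals <- sub(".*_", "", toy$max_sev)

# Position of the matching column for each row
col_idx <- match(sprintf("t1a_1_1_%s", vals), names(toy))

# One (row, column) pair per row of the data
idx <- cbind(seq_len(nrow(toy)), col_idx)

# Index a numeric matrix with the pair matrix: one value per row
picked <- as.matrix(toy[, 1:2])[idx]
print(picked)
```

Here the numeric columns are converted to a matrix first, so the result is already numeric. The answer above indexes the whole data frame instead, including the factor column max_sev, which yields character values; that is why it wraps the result in as.numeric().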