
R keep only columns with name endings that match string in another column

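I have a data frame, and I want to keep only the columns whose name endings match the endings of the string entries in a specific other column.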

My current data frame:

df <- structure(list(t1copeact_1_1_1 = structure(c(NA, 3, 4, NA, 4, 
NA, 3, 4, 4, NA, NA, NA, 4, 4, 4, NA, 4, NA, NA, 3), display_width = 0L), 
    t1copeact_1_1_2 = structure(c(NA, NA, NA, NA, NA, 4, 3, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeact_1_1_3 = structure(c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA), display_width = 0L), 
    t1copeact_1_1_4 = structure(c(NA, NA, NA, NA, 4, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeact_1_1_5 = structure(c(NA, NA, 4, NA, NA, 4, NA, NA, 
    NA, NA, 3, NA, NA, 3, NA, NA, NA, NA, NA, 4), display_width = 0L), 
    t1copeact_1_1_6 = structure(c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 2, 3, NA, NA, NA, 2, 4, 3), display_width = 0L), 
    t1copeact_1_1_7 = structure(c(NA, NA, NA, NA, 4, NA, NA, 
    NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeact_1_1_8 = structure(c(NA, NA, NA, 3, 4, NA, NA, NA, 
    NA, 3, 4, 4, NA, NA, NA, NA, NA, NA, NA, 3), display_width = 0L), 
    t1copeplan_1_1_1 = structure(c(NA, 4, 4, NA, 4, NA, 3, 4, 
    4, NA, NA, NA, 4, 4, 3, NA, 4, NA, NA, 4), display_width = 0L), 
    t1copeplan_1_1_2 = structure(c(NA, NA, NA, NA, NA, 4, 3, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeplan_1_1_3 = structure(c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, NA), display_width = 0L), 
    t1copeplan_1_1_4 = structure(c(NA, NA, NA, NA, 4, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeplan_1_1_5 = structure(c(NA, NA, 4, NA, NA, 4, NA, 
    NA, NA, NA, 2, NA, NA, 4, NA, NA, NA, NA, NA, 4), display_width = 0L), 
    t1copeplan_1_1_6 = structure(c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 2, 4, NA, NA, NA, 3, 4, 4), display_width = 0L), 
    t1copeplan_1_1_7 = structure(c(NA, NA, NA, NA, 4, NA, NA, 
    NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), display_width = 0L), 
    t1copeplan_1_1_8 = structure(c(NA, NA, NA, 4, 4, NA, NA, 
    NA, NA, 4, 3, 2, NA, NA, NA, NA, NA, NA, NA, 3), display_width = 0L), 
    max_sev = structure(c(2L, 2L, 1L, 4L, 4L, 2L, 1L, 2L, 2L, 
    2L, 4L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 4L), .Label = c("T1Severity_1", 
    "T1Severity_5", "T1Severity_6", "T1Severity_8"), class = "factor")), row.names = c(1L, 
2L, 5L, 12L, 13L, 14L, 16L, 19L, 23L, 25L, 27L, 30L, 31L, 32L, 
34L, 35L, 36L, 37L, 39L, 40L), class = "data.frame")

So what I want to get is (shown for the first 10 rows of df):

output_df <- structure(list(copeact = c(NA, NA, 4, 3, 4, 4, 3, NA, NA, NA), 
    copeplan = c(NA, NA, 4, 4, 4, 4, 3, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L))

As you can see, the "copeact" column in output_df holds the values from the df "t1copeact_1_1_x" columns whose ending matches the ending of the string in the df "max_sev" column.

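For example, the first row of the df "max_sev" column ends in "5", so I want to keep the values of columns t1copeact_1_1_5 (=NA) and t1copeplan_1_1_5 (=NA) and nothing else. When "max_sev" ends in "_8", I want to keep the values of t1copeact_1_1_8 and t1copeplan_1_1_8, and so on.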
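As an illustration of the matching rule for a single row, here is a minimal base R sketch. It assumes, as in the sample data, that the suffix is everything after the last underscore in max_sev; the variable names sev, suffix and target_cols are hypothetical.

```r
# One value of max_sev, taken from the first row of the sample data
sev <- "T1Severity_5"

# Everything after the last underscore is the suffix to match
suffix <- sub(".*_", "", sev)

# Names of the columns whose values should be kept for this row
target_cols <- paste0("t1", c("copeact", "copeplan"), "_1_1_", suffix)
print(target_cols)
```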

Could someone give me a hint on how to approach this? I'm really a bit lost here. Let me know if you need more information. Thanks!

Here is a tidyverse solution:

library(tidyverse)

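df %>%
  # Long format: ".value" turns copeact/copeplan into columns, and the final digit goes into "index"
  pivot_longer(cols = -max_sev, names_to = c(".value", "index"), names_pattern = "t1(\\w+)_1_1_(\\d$)") %>%
  # Keep only the rows whose index equals the last digit of max_sev
  filter(str_extract(as.character(max_sev), "\\d$") == index)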

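First, use pivot_longer to convert the data to long format: the advantage is that the column ending now sits in a separate column called index. Then you simply use filter to keep the rows where index equals the last digit of max_sev: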

   max_sev      index copeact copeplan
   <fct>        <chr>   <dbl>    <dbl>
 1 T1Severity_5 5          NA       NA
 2 T1Severity_5 5          NA       NA
 3 T1Severity_1 1           4        4
 4 T1Severity_8 8           3        4
 5 T1Severity_8 8           4        4
 6 T1Severity_5 5           4        4
 7 T1Severity_1 1           3        3
 8 T1Severity_5 5          NA       NA
 9 T1Severity_5 5          NA       NA
10 T1Severity_5 5          NA       NA
11 T1Severity_8 8           4        3
12 T1Severity_5 5          NA       NA
13 T1Severity_5 5          NA       NA
14 T1Severity_1 1           4        4
15 T1Severity_1 1           4        3
16 T1Severity_5 5          NA       NA
17 T1Severity_1 1           4        4
18 T1Severity_5 5          NA       NA
19 T1Severity_5 5          NA       NA
20 T1Severity_8 8           3        3
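If the tidyverse is not available, the same long-format-then-filter idea can be sketched in base R with stats::reshape(). This is a swapped-in technique, not the answer's code: it runs on a hypothetical three-row subset of df called toy (its first three rows, suffixes 1 and 5 only) and assumes the copeact and copeplan columns appear in the same suffix order.

```r
# Hypothetical toy subset of df: its first three rows, suffixes 1 and 5 only
toy <- data.frame(
  t1copeact_1_1_1  = c(NA, 3, 4),
  t1copeact_1_1_5  = c(NA, NA, 4),
  t1copeplan_1_1_1 = c(NA, 4, 4),
  t1copeplan_1_1_5 = c(NA, NA, 4),
  max_sev          = c("T1Severity_5", "T1Severity_5", "T1Severity_1")
)

act_cols  <- grep("^t1copeact_", names(toy), value = TRUE)
plan_cols <- grep("^t1copeplan_", names(toy), value = TRUE)

# Long format: one row per (original row, suffix), with copeact and copeplan
# as columns, similar to names_to = c(".value", "index") in pivot_longer()
long <- reshape(
  toy,
  direction = "long",
  varying   = list(act_cols, plan_cols),
  v.names   = c("copeact", "copeplan"),
  timevar   = "index",
  times     = sub(".*_", "", act_cols),  # assumes both groups share the suffix order
  idvar     = "row"                      # not in toy, so reshape() creates it
)

# Keep only the rows whose suffix equals the suffix of max_sev,
# then restore the original row order
kept <- long[sub(".*_", "", long$max_sev) == long$index, ]
kept <- kept[order(kept$row), c("row", "copeact", "copeplan")]
kept
```

On this subset, kept should match the first three rows of output_df (NA, NA, 4 in both columns).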

Here is a base R option:

# Column groups to extract (the text between "t1" and "_1_1_")
cols <- c('copeact', 'copeplan')
# Get the number at the end of max_sev (everything after the last "_")
vals <- sub('.*_', '', df$max_sev)

# For each group, build a (row, column) index matrix that picks,
# in every row, the column whose suffix matches max_sev
sapply(cols, function(x) {
  as.numeric(df[cbind(seq_len(nrow(df)),
                      match(sprintf('t1%s_1_1_%s', x, vals), names(df)))])
})

#      copeact copeplan
# [1,]      NA       NA
# [2,]      NA       NA
# [3,]       4        4
# [4,]       3        4
# [5,]       4        4
# [6,]       4        4
# [7,]       3        3
# [8,]      NA       NA
# [9,]      NA       NA
#[10,]      NA       NA
#[11,]       4        3
#[12,]      NA       NA
#[13,]      NA       NA
#[14,]       4        4
#[15,]       4        3
#[16,]      NA       NA
#[17,]       4        4
#[18,]      NA       NA
#[19,]      NA       NA
#[20,]       3        3
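The matrix trick deserves a note. Indexing a data frame with a two-column matrix, as in df[cbind(...)], is evaluated on as.matrix(df); because df contains the non-numeric max_sev column, that matrix is character, which is why the result is wrapped in as.numeric(). A minimal sketch of a variant (an editorial assumption, not part of the answer) indexes a numeric-only matrix instead, so no character round trip is needed. It uses a hypothetical three-row subset, toy, and the hypothetical names num and res.

```r
# Hypothetical toy subset of df: its first three rows, suffixes 1 and 5 only
toy <- data.frame(
  t1copeact_1_1_1  = c(NA, 3, 4),
  t1copeact_1_1_5  = c(NA, NA, 4),
  t1copeplan_1_1_1 = c(NA, 4, 4),
  t1copeplan_1_1_5 = c(NA, NA, 4),
  max_sev          = c("T1Severity_5", "T1Severity_5", "T1Severity_1")
)

# With the text column included, as.matrix() yields a character matrix
typeof(as.matrix(toy))

# Variant: convert only the numeric columns, so the values stay numeric
num  <- as.matrix(toy[grep("^t1", names(toy))])
cols <- c("copeact", "copeplan")
vals <- sub(".*_", "", toy$max_sev)

res <- sapply(cols, function(x) {
  # One (row, column) pair per row: the column is chosen by the max_sev suffix
  idx <- cbind(seq_len(nrow(num)),
               match(sprintf("t1%s_1_1_%s", x, vals), colnames(num)))
  num[idx]
})
res
```

The only design change is dropping max_sev before converting to a matrix; the row/column lookup itself is the same as in the answer.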


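Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.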

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM