
Subset columns based on column names

I have a data frame df1 with IDs:

df1 <- read.table(text="ID
8765
1879
8706
1872
0178
0268
0270
0269
0061
0271", header=T)

And a second data frame df2 with these column names:

> names(df2)
 [1] "TW_3784.IT"   "TW_3970.IT"   "TW_1879.IT"   "TW_0178.IT"   "SF_0271.IT" "TW_3782.IT"  
 [7] "TW_3783.IT"   "TW_8765.IT"   "TW_8706.IT"   "SF_0268.IT" "SF_0270.IT" "SF_0269.IT"
[13] "SF_0061.IT"

What I need is to keep only the columns of df2 whose names partially match an ID in df1.

What I tried

Using dplyr:

df3 = df2 %>% 
  dplyr::select(df2 , dplyr::contains(df1$ID))

Error:

Error in dplyr::contains(df1$ID) : is_string(match) is not TRUE

Using grepl:

df3 = df2[,grepl(df1$ID, names(df2))]

Warning:
In grepl(df1$ID, names(df2)) :
  argument 'pattern' has length > 1 and only the first element will be used

As there is a clear pattern in the column names, you can use substr to extract each four-digit ID (characters 4 to 7 of the name). Convert it to numeric to remove the leading zeros, so that it compares equal to the integer IDs in df1. Then use which to identify the positions of the columns that you want to keep.

cols <- c("TW_3784.IT", "TW_3970.IT", "TW_1879.IT", "TW_0178.IT", "SF_0271.IT", "TW_3782.IT")  # in practice: cols <- names(df2)

numbers <- which(as.numeric(substr(cols, 4, 7)) %in% df1[,1])

Next, you can use these column numbers to subset your data frame: df2[, numbers].
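Put together on a small data frame, the steps above might look like the sketch below. The contents of df2 are made up for illustration (only the column names come from the question), and positions 4 to 7 assume every name starts with a three-character prefix such as "TW_".

```r
# IDs as read.table() parses them: integers, so "0178" has become 178
df1 <- data.frame(ID = c(8765L, 1879L, 8706L, 1872L, 178L,
                         268L, 270L, 269L, 61L, 271L))

# Hypothetical data: two rows of zeros, only the column names matter
col_names <- c("TW_3784.IT", "TW_3970.IT", "TW_1879.IT",
               "TW_0178.IT", "SF_0271.IT", "TW_3782.IT")
df2 <- setNames(as.data.frame(matrix(0, nrow = 2, ncol = length(col_names))),
                col_names)

# Characters 4 to 7 of each name hold the four-digit ID
ids_in_names <- as.numeric(substr(names(df2), 4, 7))

# Positions of the columns whose ID appears in df1
numbers <- which(ids_in_names %in% df1$ID)

# drop = FALSE keeps a data frame even if only one column matches
df3 <- df2[, numbers, drop = FALSE]
names(df3)
```

With these six names, the columns kept are "TW_1879.IT", "TW_0178.IT" and "SF_0271.IT".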

Here's a solution that uses the dplyr package.

library(dplyr)

df2 %>% select(matches(paste(df1$ID, collapse = "|")))

This pastes together the IDs from df1 with | as a separator (meaning logical OR in a regular expression), like this:

"8765|1879|8706|1872|178|268|270|269|61|271"

This is needed because matches then looks for column names that match one OR another of these numbers, and those columns are then selected. dplyr is needed for select, matches and also %>%.
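One caveat with this pattern is that a short ID such as 61 matches any column name containing "61", not only "SF_0061.IT". A possible tighter variant is sketched here in base R. It assumes every ID in the column names is exactly four digits sitting between "_" and ".", and it pads the IDs back to four digits first. The df2 contents and the extra column "TW_6100.IT" are made up for illustration.

```r
# A few of the integer IDs from the question
df1 <- data.frame(ID = c(8765L, 1879L, 178L, 61L))

# Hypothetical data: "TW_6100.IT" contains "61" but is not ID 0061
col_names <- c("TW_3784.IT", "TW_1879.IT", "TW_0178.IT",
               "SF_0061.IT", "TW_6100.IT")
df2 <- setNames(as.data.frame(matrix(0, nrow = 2, ncol = length(col_names))),
                col_names)

# Restore the leading zeros: 178 -> "0178", 61 -> "0061"
padded <- sprintf("%04d", df1$ID)

# One regular expression: "_" followed by any padded ID, followed by "."
pattern <- paste0("_(", paste(padded, collapse = "|"), ")\\.")

# Logical vector marking the columns whose name matches the pattern
keep <- grepl(pattern, names(df2))
df3 <- df2[, keep, drop = FALSE]
names(df3)
```

The same pattern string could be passed to matches() in the dplyr pipeline instead of the unpadded one.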

In df1, your ID column is of integer type, not character:

str(df1)
'data.frame':   10 obs. of  1 variable:
 $ ID: int  8765 1879 8706 1872 178 268 270 269 61 271

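Convert it to character first. Note that this alone may not be enough to make is_string() return TRUE, since that check appears to expect a single string rather than a vector of ten, so the IDs would still need to be collapsed into one pattern, as in the matches() answer.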

df1$ID <- as.character(df1$ID)
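Converting after the fact cannot bring back the leading zeros, because read.table() has already parsed the column as integers. A minimal sketch of reading the IDs as text from the start, using the colClasses argument of read.table() on a shortened version of the question's input:

```r
# Read the ID column as character so "0178" is not turned into 178
df1 <- read.table(text = "ID
8765
0178
0061", header = TRUE, colClasses = "character")

str(df1)
df1$ID
```

With the zeros preserved, the IDs can be compared directly with the four-digit part of each column name, without any numeric conversion.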
