
Extract certain columns from data frame R

My data frame looks like this:

   x s1   s2 s3  s4
1 x1  1 1954  1 yes
2 x2  2 1955  1  no
3 x3  1 1976  2 yes
4 x4  2 1954  2 yes
5 x5  3 1943  1  no

Sample data:

df <- data.frame(x=c('x1','x2','x3','x4','x5'),
                    s1=c(1,2,1,2,3),
                    s2=c(1954,1955,1976,1954,1943), 
                    s3=c(1,1,2,2,1),
                    s4=c('yes','no','yes','yes','no'))

Is it possible to extract the data frame's columns containing integers 1 to 3? For example, the new data frame would look like:

 newdf
   x s1 s3
1 x1  1  1
2 x2  2  1
3 x3  1  2
4 x4  2  2
5 x5  3  1

Is it possible to change the s1 and s3 columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:

 newdf2
   x s1 s3
1 x1  1  1
2 x2  0  1
3 x3  1  0
4 x4  0  0
5 x5  0  1

base R

newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  2  1
# 3 x3  1  2
# 4 x4  2  2
# 5 x5  3  1

newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1

Walk-through:

  • first, we determine which columns are numeric and contain the numbers 1 or 3:

     sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
     #     x    s1    s2    s3    s4
     # FALSE  TRUE FALSE  TRUE FALSE

    This will exclude any column that is not numeric, meaning that a character column that contains a literal "1" or "3" will not be retained. This is purely an inference on my part; if you want to accept the string versions, then remove the is.numeric(z) component.

  • second, we extract the names of those that are true, and prepend "x":

     c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
     # [1] "x"  "s1" "s3"

  • wrap that in unique(.) in case, for some reason, "x" is also numeric and contains 1 or 3 (this step is purely defensive, you may not strictly need it)

  • select those columns, defensively adding drop=FALSE so that if only one column is matched, it still returns a full data.frame

  • replace just those columns (excluding the first column, which is "x") with 0 or 1; z == 1 returns a logical vector, and the wrapping +(..) converts logical to 0 (false) or 1 (true).
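The steps above can also be run one at a time with named intermediates, which makes each stage easy to inspect. This is only a sketch of the same logic; the names is_match and keep are illustrative and do not appear in the answer's code.

```r
df <- data.frame(x  = c('x1','x2','x3','x4','x5'),
                 s1 = c(1,2,1,2,3),
                 s2 = c(1954,1955,1976,1954,1943),
                 s3 = c(1,1,2,2,1),
                 s4 = c('yes','no','yes','yes','no'))

# step 1: one logical per column: numeric AND contains a 1 or a 3
is_match <- sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))

# steps 2-3: names of the matching columns, with "x" prepended and de-duplicated
keep <- unique(c("x", names(which(is_match))))

# step 4: subset; drop = FALSE keeps a data.frame even for a single column
newdf <- df[, keep, drop = FALSE]

# step 5: recode every column except the first: 1 if the value is 1, else 0
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))

# unary plus coerces logical to integer, the same result as as.integer()
identical(+(c(1, 2, 3) == 1), as.integer(c(1, 2, 3) == 1))
```

If the unary plus reads as too terse, as.integer(z == 1) can be substituted in step 5 without changing the result.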

dplyr

library(dplyr)
df %>%
  select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
  mutate(across(-x, ~ +(. == 1)))
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1
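Both versions above keep any numeric column that contains a 1 or a 3. If "containing integers 1 to 3" is instead read as "every value is 1, 2, or 3", a stricter predicate can be swapped in. The sketch below shows that alternative reading in base R; it is an assumption about the intent, and the helper name only_1_to_3 is hypothetical.

```r
df <- data.frame(x  = c('x1','x2','x3','x4','x5'),
                 s1 = c(1,2,1,2,3),
                 s2 = c(1954,1955,1976,1954,1943),
                 s3 = c(1,1,2,2,1),
                 s4 = c('yes','no','yes','yes','no'))

# hypothetical helper: TRUE when a column is numeric and all values are in 1:3
# (a column containing NA fails the test, because NA %in% 1:3 is FALSE)
only_1_to_3 <- function(z) is.numeric(z) && all(z %in% 1:3)

keep <- unique(c("x", names(which(sapply(df, only_1_to_3)))))
newdf <- df[, keep, drop = FALSE]
names(newdf)
```

On the sample data both predicates select s1 and s3. They differ for a column such as c(1, 3, 1954), which the "contains 1 or 3" test keeps and this one drops.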

I think this is what you expect:

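library(dplyr)    # %>%, group_by(), mutate()
library(stringr)  # str_extract_all()

my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
             s1=c(1,2,1,2,3),
             s2=c(1954,1955,1976,1954,1943), 
             s3=c(1,1,2,2,1),
             s4=c('yes','no','yes','yes','no'))

# Paste each column into one string. apply(., 2, .) returns one value per column,
# which fits as a new column here because the sample data happens to have as
# many rows as columns.
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
# For each string, keep only the characters 1, 2 and 3, pasted back together.
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
# Positions where the string is unchanged: columns holding only 1, 2 and 3.
my_var <- which(my_df$end == my_df$end2)
# Recode those columns: 1 stays 1, anything else becomes 0.
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
# Keep the first column (x) plus the matched columns.
my_df <- my_df[, c(1, my_var)]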
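The snippet above finds the columns by comparing each column's pasted values with the 1/2/3 characters extracted from them. A more direct sketch of the same text-matching idea, using only base R, might look like the following; it is an illustration rather than the answerer's code, and the names is_123 and result are made up for the example.

```r
my_df <- data.frame(x  = c('x1','x2','x3','x4','x5'),
                    s1 = c(1,2,1,2,3),
                    s2 = c(1954,1955,1976,1954,1943),
                    s3 = c(1,1,2,2,1),
                    s4 = c('yes','no','yes','yes','no'))

# a column qualifies when every entry, viewed as text, is exactly "1", "2" or "3"
is_123 <- sapply(my_df, function(col) all(grepl("^[123]$", as.character(col))))

# keep x plus the qualifying columns
result <- my_df[, c("x", names(which(is_123))), drop = FALSE]

# recode the kept columns: 1 if the value is 1, otherwise 0
result[-1] <- lapply(result[-1], function(col) as.integer(col == 1))
result
```

Because the check is text-based, a character column holding only "1", "2" and "3" would be selected as well, which may or may not be wanted.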
