Extract certain columns from a data frame in R

Question

My data frame looks like this:

   x s1   s2 s3  s4
1 x1  1 1954  1 yes
2 x2  2 1955  1  no
3 x3  1 1976  2 yes
4 x4  2 1954  2 yes
5 x5  3 1943  1  no

Sample data:

df <- data.frame(x=c('x1','x2','x3','x4','x5'),
                 s1=c(1,2,1,2,3),
                 s2=c(1954,1955,1976,1954,1943),
                 s3=c(1,1,2,2,1),
                 s4=c('yes','no','yes','yes','no'))

Is it possible to extract the data frame's columns containing integers 1 to 3? For example, the new data frame would look like:

Is it possible to change the s1 and s3 columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:

Answer 1

base R

newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  2  1
# 3 x3  1  2
# 4 x4  2  2
# 5 x5  3  1

newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1

Walk-through:

First, we determine which columns are numeric and contain the number 1 or 3:
```
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
#     x    s1    s2    s3    s4
# FALSE  TRUE FALSE  TRUE FALSE
```
This excludes any column that is not numeric, so a character column that contains a literal "1" or "3" will not be retained. This is pure inference on my part; if you want to accept the string versions, remove the is.numeric(z) component.
Second, we extract the names of the columns that are TRUE and prepend "x":
```
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x"  "s1" "s3"
```
Wrap that in unique(.) in case "x" is itself numeric and contains 1 or 3 (this step is purely defensive; you may not strictly need it).
Select those columns, defensively adding drop=FALSE so that if only one column is matched, the result is still a full data.frame.
Replace just those columns (excluding the first column, which is "x") with 0 or 1: z == 1 returns a logical vector, and the wrapping +(..) converts logical to 0 (false) or 1 (true).
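
The two defensive idioms in the walk-through can be checked in isolation. The snippet below is a small illustration on a cut-down copy of the question's sample data (rebuilt here so it runs on its own), not part of the original code; `flags` and `one_col` are names introduced just for this example.

```r
# Cut-down copy of the sample data from the question.
df <- data.frame(x  = c("x1", "x2", "x3", "x4", "x5"),
                 s1 = c(1, 2, 1, 2, 3),
                 s3 = c(1, 1, 2, 2, 1))

# Unary plus coerces a logical vector to integer 0/1,
# the same result as calling as.integer() on the comparison.
flags <- +(df$s1 == 1)
flags
# [1] 1 0 1 0 0
identical(flags, as.integer(df$s1 == 1))
# [1] TRUE

# Without drop = FALSE, selecting a single column collapses to a plain vector;
# with it, the result stays a data.frame.
is.data.frame(df[, "s1"])
# [1] FALSE
one_col <- df[, "s1", drop = FALSE]
is.data.frame(one_col)
# [1] TRUE
```

If the 0/1 intent should be more explicit to readers, as.integer(z == 1) could be used in place of +(z == 1).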

dplyr

library(dplyr)
df %>%
  select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
  mutate(across(-x, ~ +(. == 1)))
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1
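
The predicate used above keeps a numeric column if it contains a 1 or a 3 anywhere. If the question is read more strictly, as "columns whose values are all integers between 1 and 3", a variant along these lines might be used instead. This is only a sketch under that assumed reading: `only_1_to_3` is a hypothetical helper name, and the extra column `s5` is made up to show where the two readings differ.

```r
df <- data.frame(x  = c("x1", "x2", "x3", "x4", "x5"),
                 s1 = c(1, 2, 1, 2, 3),
                 s2 = c(1954, 1955, 1976, 1954, 1943),
                 s3 = c(1, 1, 2, 2, 1),
                 s4 = c("yes", "no", "yes", "yes", "no"),
                 s5 = c(1, 7, 3, 9, 1))  # made-up column: has 1 and 3, but also 7 and 9

# Hypothetical helper: TRUE only if the column is numeric and every value is 1, 2 or 3.
only_1_to_3 <- function(z) is.numeric(z) && all(z %in% 1:3)

# Predicate from the answer: numeric and contains a 1 or a 3 somewhere.
loose  <- names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))))
strict <- names(which(sapply(df, only_1_to_3)))

loose
# [1] "s1" "s3" "s5"
strict
# [1] "s1" "s3"

newdf <- df[, unique(c("x", strict)), drop = FALSE]
```

On the original sample data both predicates select the same columns, so the choice only matters when a column mixes values inside and outside the 1 to 3 range.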

Answer 2

I think this is what you expect:

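library(dplyr)    # %>%, group_by(), mutate()
library(stringr)  # str_extract_all()

my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
             s1=c(1,2,1,2,3),
             s2=c(1954,1955,1976,1954,1943),
             s3=c(1,1,2,2,1),
             s4=c('yes','no','yes','yes','no'))

# Collapse each column into one string. Storing one string per column as a new
# column relies on the sample having as many rows (5) as columns (5).
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
# Keep only the digits 1, 2 and 3 from each collapsed string.
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
# Positions where nothing was lost correspond to columns made up only of 1, 2 and 3.
my_var <- which(my_df$end == my_df$end2)
# Recode those columns: 1 stays 1, everything else becomes 0.
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
# Keep x plus the recoded columns.
my_df <- my_df[, c(1, my_var)]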

Answer 1: score 2, accepted, 2021-12-08 14:21:40
Answer 2: score 0, 2021-12-08 14:01:40
