Extract certain columns from a data frame in R
My data frame looks like this:
x s1 s2 s3 s4
1 x1 1 1954 1 yes
2 x2 2 1955 1 no
3 x3 1 1976 2 yes
4 x4 2 1954 2 yes
5 x5 3 1943 1 no
Sample data:
df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
Is it possible to extract the data frame's columns containing integers 1 to 3? For example, the new data frame would look like:
newdf
x s1 s3
1 x1 1 1
2 x2 2 1
3 x3 1 2
4 x4 2 2
5 x5 3 1
Is it possible to change the s1 and s3 columns to 0 or 1 depending on whether or not the value in the column is 1? The altered data frame would then look like:
newdf2
x s1 s3
1 x1 1 1
2 x2 0 1
3 x3 1 0
4 x4 0 0
5 x5 0 1
newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 2 1
# 3 x3 1 2
# 4 x4 2 2
# 5 x5 3 1
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
Walk-through:
first, we determine which columns are numeric and contain the numbers 1 or 3:
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
#     x    s1    s2    s3    s4
# FALSE  TRUE FALSE  TRUE FALSE
This will exclude any column that is not numeric, meaning that a character column that contains a literal "1" or "3" will not be retained. This is complete inference on my end; if you want to accept the string versions, then remove the is.numeric(z) component.
second, we extract the names of those that are true, and prepend "x":
c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x"  "s1" "s3"
wrap that in unique(.) in case, for some reason, "x" is also numeric and contains 1 or 3 (this step is purely defensive; you may not strictly need it)
select those columns, defensively adding drop=FALSE so that if only one column is matched, it still returns a full data.frame
replace just those columns (excluding the first column, which is "x") with 0 or 1; the z == 1 returns logical, and the wrapping +(..) converts logical to 0 (false) or 1 (true).
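The individual pieces of the walk-through can be tried in isolation. The following is only a small sketch on toy data; the names toy, flags and chr are made up for illustration and do not appear in the code above.

```r
# Toy data; the names toy, flags and chr are illustrative only
toy <- data.frame(x = c("x1", "x2", "x3"), s1 = c(1, 2, 3))

# Selecting one column without drop = FALSE collapses to a plain vector
is.data.frame(toy[, "s1"])                 # FALSE
is.data.frame(toy[, "s1", drop = FALSE])   # TRUE

# unique() removes a repeated "x" if it was also matched
unique(c("x", "x", "s1"))                  # "x" "s1"

# A comparison gives a logical vector; unary plus turns it into 0/1
flags <- toy$s1 == 1
+flags                                     # 1 0 0

# Without the is.numeric() guard, string versions match as well
chr <- c("1", "2")
any(c(1, 3) %in% chr)                      # TRUE
```

The last line is why the is.numeric(z) check matters: %in% compares after coercing to a common type, so the number 1 matches the string "1".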
library(dplyr)
df %>%
select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
mutate(across(-x, ~ +(. == 1)))
# x s1 s3
# 1 x1 1 1
# 2 x2 0 1
# 3 x3 1 0
# 4 x4 0 0
# 5 x5 0 1
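Both versions above keep any numeric column that contains a 1 or a 3. If "containing integers 1 to 3" is instead read as "every value is 1, 2 or 3", a stricter test may be closer to what is wanted. This is only a sketch of that alternative reading; in_range, keep and newdf_strict are hypothetical names introduced here.

```r
df <- data.frame(x = c("x1", "x2", "x3", "x4", "x5"),
                 s1 = c(1, 2, 1, 2, 3),
                 s2 = c(1954, 1955, 1976, 1954, 1943),
                 s3 = c(1, 1, 2, 2, 1),
                 s4 = c("yes", "no", "yes", "yes", "no"))

# in_range is a hypothetical helper: TRUE when a column is numeric
# and every one of its values is 1, 2 or 3
in_range <- function(z) is.numeric(z) && all(z %in% 1:3)

# Names of the qualifying columns, with "x" prepended as before
keep <- names(which(sapply(df, in_range)))
newdf_strict <- df[, unique(c("x", keep)), drop = FALSE]
names(newdf_strict)
# [1] "x"  "s1" "s3"
```

On the sample data both readings select the same columns; they differ for a column such as c(1, 7, 9), which contains a 1 but also values outside 1 to 3.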
I think this is what you expect:
my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
s1=c(1,2,1,2,3),
s2=c(1954,1955,1976,1954,1943),
s3=c(1,1,2,2,1),
s4=c('yes','no','yes','yes','no'))
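library(dplyr)    # %>%, group_by(), mutate()
library(stringr)  # str_extract_all()
# One collapsed string per column; fits only because the sample has as many rows (5) as columns (5)
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
# Rebuild each string from the digits 1, 2 and 3 only
my_df <- my_df %>% group_by(x) %>%
  mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
# Positions where nothing was lost, i.e. columns made up solely of 1, 2 and 3
my_var <- which(my_df$end == my_df$end2)
# Recode those columns: 1 stays 1, anything else becomes 0
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
# Keep x plus the recoded columns
my_df <- my_df[, c(1, my_var)]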