Applying a conditional replace function over every cell in a data frame
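I need to transform data in a multi-column data frame and would like a way to do so across all columns at once. Transforming numeric data seems to work without problems. For example: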
df <- data.frame(
co1 = c(5,9,6,1,6),
co2 = c(8,5,4,6,2),
co3 = c(6,5,4,1,2),
co4 = c(6,1,5,3,2),
co5 = c(5,1,2,6,8))
I can transform all the data at once with a for loop (for example, labelling all values > 5 as "yes" and all other values as "no"):
for(i in 1:ncol(df)){
df[i] <- ifelse(df[i] > 5, "yes", "no")
}
Or, more simply, using indexing:
df[] <- ifelse(df > 5, "yes", "no")
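However, these methods do not work when I have character data. For example, say I want to convert every value in this data frame that starts with "B" to "yes" and every other value to "no":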
df <- data.frame(
co1 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co2 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co3 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co4 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co5 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")))
df
co1 co2 co3 co4 co5
1 JF GB ID EB DF
2 IA DD DA IF HD
3 HI IH JE CH FB
4 GE JI CJ BA GE
5 BG EE GG AJ BH
The for loop
for(i in 1:ncol(df)){
df[i] <- ifelse(grepl("^B", df[i]), "yes", "no")
}
as well as the transformation via indexing, produce the same incorrect result:
df[] <- ifelse(grepl("^B", df), "yes", "no")
df
co1 co2 co3 co4 co5
1 no no no no no
2 no no no no no
3 no no no no no
4 no no no no no
5 no no no no no
Any help on how to get the correct transformation with character data?
Using dplyr, we can do:
library(dplyr)
# apply the same test to every column; each column arrives as a plain vector x
df %>%
  mutate_all(function(x) ifelse(grepl("^B", x), "Yes", "No"))
co1 co2 co3 co4 co5
1 Yes No Yes No No
2 No No No No No
3 No No No No No
4 No No No No No
5 No No No No Yes
With the data from the post (df1):
df1 %>%
mutate_all(function(x) ifelse(grepl("^B",x),"Yes","No"))
co1 co2 co3 co4 co5
1 No No No No No
2 No No No No No
3 No No No No No
4 No No No Yes No
5 Yes No No No Yes
Data:
df
co1 co2 co3 co4 co5
1 BH IC BC HJ CC
2 CC DH CF GI HI
3 DB GE JI DA GD
4 II CA EJ IG FA
5 JD JB IG EB BE
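In recent dplyr releases (1.0.0 and later) `mutate_all()` is marked as superseded in favour of `across()`. The following is a minimal sketch of the same idea written that way; the small two-column data frame is made up for illustration and is not the data from the post.

```r
library(dplyr)

# Illustrative data only (hypothetical, not from the post)
df <- data.frame(
  co1 = c("BH", "CC", "DB"),
  co2 = c("IC", "BE", "GE"),
  stringsAsFactors = FALSE
)

# across(everything(), ...) applies the function to each column in turn;
# .x is the column as a plain vector, so grepl() tests every element
res <- df %>%
  mutate(across(everything(), ~ ifelse(grepl("^B", .x), "Yes", "No")))
```

Whether to prefer `across()` over `mutate_all()` is mostly a matter of which dplyr version is installed; the per-column logic is unchanged.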
If you want to stick with base R, lapply will work here:
set.seed(123)
df <- data.frame(
co1 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co2 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co3 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co4 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co5 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")))
df2 <- as.data.frame(lapply(df, function(x) ifelse(grepl("^B", x), "yes", "no")))
co1 co2 co3 co4 co5
1 CA JI IH JE BB
2 HE EC GE IG DC
3 DH FA FI FB ID
4 GD IJ JC HC CJ
5 FC AF DA AH AF
co1 co2 co3 co4 co5
1 no no no no yes
2 no no no no no
3 no no no no no
4 no no no no no
5 no no no no no
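A likely reason the loop in the question fails is that `df[i]` is a one-column data frame (a list), and `grepl()` coerces it with `as.character()` into a single deparsed string rather than testing each element. `lapply()` avoids this because it hands each column to the function as a vector. The sketch below, on a small made-up data frame, shows that the original loop also works once `[[` is used to extract the column itself.

```r
# Illustrative data only (hypothetical, not from the post)
df <- data.frame(
  co1 = c("JF", "IA", "BG"),
  co2 = c("GB", "DD", "EE"),
  stringsAsFactors = FALSE
)

# df[1] is a list, so as.character() collapses it to ONE string
# (the deparsed column), which does not start with "B"
collapsed <- as.character(df[1])

# df[[i]] extracts the column as a vector, so grepl() sees each value
fixed <- df
for (i in seq_along(fixed)) {
  fixed[[i]] <- ifelse(grepl("^B", fixed[[i]]), "yes", "no")
}
```

Here `collapsed` has length 1, while `fixed$co1` holds one label per row.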
We can unlist the data and then use grepl directly for indexing in base R
df[] <- c("No", "Yes")[grepl("^B", unlist(df)) + 1]
df
# co1 co2 co3 co4 co5
#1 No No No No No
#2 No Yes No No No
#3 No No No Yes No
#4 No No No No No
#5 No No No No Yes
Data
set.seed(12345)
df <- data.frame(
co1 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co2 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co3 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co4 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")),
co5 = c(paste(sample(LETTERS[1:10],5), sample(LETTERS[1:10],5), sep = "")))
df
# co1 co2 co3 co4 co5
#1 HB AE ED HD HD
#2 JC BD CG AH DA
#3 GE FI HE BI JI
#4 IF JB JB EE FH
#5 CG CF DC CA BJ
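The one-liner above packs several steps together. As a rough sketch on a small made-up data frame (the intermediate names `is_b`, `idx` and `labels` are introduced here only for illustration), the steps can be separated like this:

```r
# Illustrative data only (hypothetical, not from the post)
df <- data.frame(
  co1 = c("HB", "JC", "BD"),
  co2 = c("AE", "BI", "CF"),
  stringsAsFactors = FALSE
)

# unlist() flattens the data frame column by column into one vector,
# so grepl() returns one TRUE/FALSE per cell
is_b <- grepl("^B", unlist(df))

# Adding 1 turns FALSE/TRUE into the positions 1/2 ...
idx <- is_b + 1

# ... which pick "No" or "Yes" from the lookup vector
labels <- c("No", "Yes")[idx]

# df[] <- keeps the data frame shape and fills it column by column
res <- df
res[] <- labels
```

Because both `unlist()` and the `res[] <-` assignment go column by column, each label lands in the cell it was computed from.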
A base R option with substr
out <- array("No", dim = dim(df), dimnames = dimnames(df))
out[substr(as.matrix(df), 1, 1) == "B"] <- "Yes"
df <- structure(list(co1 = structure(c(2L, 4L, 1L, 3L, 5L), .Label = c("BF",
"CH", "EC", "HB", "JJ"), class = "factor"), co2 = structure(c(3L,
1L, 4L, 5L, 2L), .Label = c("AD", "FI", "GA", "HH", "JB"), class = "factor"),
co3 = structure(c(1L, 5L, 4L, 3L, 2L), .Label = c("CJ", "DB",
"EF", "FH", "IG"), class = "factor"), co4 = structure(c(2L,
4L, 3L, 1L, 5L), .Label = c("AE", "DH", "HA", "IF", "JC"), class = "factor"),
co5 = structure(c(1L, 5L, 3L, 2L, 4L), .Label = c("AC", "BG",
"EE", "GI", "JJ"), class = "factor")),
class = "data.frame", row.names = c(NA,
-5L))
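Note that `out` in the substr approach is a character matrix rather than a data frame. If a data frame is wanted, one possible way (a sketch on made-up data, with `res` as a hypothetical name) is to assign the matrix back with `[] <-`, the same idiom used elsewhere in this thread:

```r
# Illustrative data only (hypothetical, not from the post)
df <- data.frame(
  co1 = c("CH", "HB", "BF"),
  co2 = c("GA", "AD", "BB"),
  stringsAsFactors = FALSE
)

# Start from a matrix of "No" with the same shape and names as df
out <- array("No", dim = dim(df), dimnames = dimnames(df))

# as.matrix() gives a character matrix; compare the first letter of each cell
out[substr(as.matrix(df), 1, 1) == "B"] <- "Yes"

# Copy the matrix back into a data frame of the original shape
res <- df
res[] <- out
```

`as.matrix()` also converts factor columns to their labels, so the same code should work on the factor-based data shown above.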