R: How to write code that repeats the same operations on similarly named variables in a more efficient, compact way
I have a data.frame DAT in which there are 8 columns containing strings in the following format (these are the multiple-choice answers to eight questions of a quiz):
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
2 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
3 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
4 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
5 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
6 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
I would like to convert it to the following:
q11 q12 q13 q21 q22 q23 q31 q32 q33 q41 q42 q43 q51 q52 q53 q61 q62 q63 q71 q72 q73 q81 q82 q83
1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
2 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
3 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
4 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
5 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
6 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
So I wrote the following code:
NAMES.Q = paste(rep("Q",8), c(1:8), sep="")
DAT[ which(DAT[NAMES.Q]=="NULL"),]<- NA # to set to NA skipped questions
NAMES.q = paste(rep("q",8), c(1:8), sep="")
The following code converts the strings into numeric 0 and 1 values.
q1 <- read.csv(text = as.character(DAT[,"Q1"]), strip.white = TRUE)
q2 <- read.csv(text = as.character(DAT[,"Q2"]), strip.white = TRUE)
q3 <- read.csv(text = as.character(DAT[,"Q3"]), strip.white = TRUE)
q4 <- read.csv(text = as.character(DAT[,"Q4"]), strip.white = TRUE)
q5 <- read.csv(text = as.character(DAT[,"Q5"]), strip.white = TRUE)
q6 <- read.csv(text = as.character(DAT[,"Q6"]), strip.white = TRUE)
q7 <- read.csv(text = as.character(DAT[,"Q7"]), strip.white = TRUE)
q8 <- read.csv(text = as.character(DAT[,"Q8"]), strip.white = TRUE)
names(q1) = paste("q1", 1:3, sep = "")
names(q2) = paste("q2", 1:3, sep = "")
names(q3) = paste("q3", 1:3, sep = "")
names(q4) = paste("q4", 1:3, sep = "")
names(q5) = paste("q5", 1:3, sep = "")
names(q6) = paste("q6", 1:3, sep = "")
names(q7) = paste("q7", 1:3, sep = "")
names(q8) = paste("q8", 1:3, sep = "")
q1[is.na(q1)] <- 0
q2[is.na(q2)] <- 0
q3[is.na(q3)] <- 0
q4[is.na(q4)] <- 0
q5[is.na(q5)] <- 0
q6[is.na(q6)] <- 0
q7[is.na(q7)] <- 0
q8[is.na(q8)] <- 0
qs<-cbind(q1, q2, q3, q4, q5, q6, q7, q8)
The code works, but I find it very difficult to read and maintain.
Would you suggest a loop, or another way of writing this information into my main data.frame (DAT) without creating a new data.frame?
First, read the data with read.table. The default field separator in read.table is white space, i.e. the separator between the concatenated "Q" columns.
Then you can use concat.split.multiple, a function from the splitstackshape package, to split the concatenated columns. If you do not specify split.cols (the columns that need to be split), all columns are split. The default separator character (seps) used within each column is ",", and the default shape (direction) of the resulting data frame is "wide". Thus, in this case you only need to supply the name of the data frame.
df <- read.table(text=" Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
2 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
3 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
4 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
5 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
6 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,", header=TRUE)
library(splitstackshape)
# split columns
df2 <- concat.split.multiple(df)
# or explicitly writing out the arguments
df2 <- concat.split.multiple(data = df, split.cols = names(df), seps = ",")
# replace NA with 0
df2[is.na(df2)] <- 0
df2
# Q1_1 Q1_2 Q1_3 Q2_1 Q2_2 Q2_3 Q3_1 Q3_2 Q3_3 Q4_1 Q4_2 Q4_3 Q5_1 Q5_2 Q5_3 Q6_1 Q6_2 Q6_3 Q7_1 Q7_2 Q7_3 Q8_1 Q8_2 Q8_3
# 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
# 2 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
# 3 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
# 4 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
# 5 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
# 6 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0
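The result above uses names such as Q1_1, whereas the question asked for q11, q12 and so on. A minimal sketch of one way to rename the columns with base R's sub() follows. The small df2 built here is only a hypothetical stand-in for the split result, so that the snippet runs on its own.

```r
# Hypothetical stand-in for the split result: two questions, three options each
df2 <- data.frame(Q1_1 = c(0, 0), Q1_2 = c(0, 1), Q1_3 = c(1, 0),
                  Q2_1 = c(1, 0), Q2_2 = c(0, 0), Q2_3 = c(0, 1))

# Turn "Q<question>_<option>" into "q<question><option>"
names(df2) <- sub("^Q([0-9]+)_([0-9]+)$", "q\\1\\2", names(df2))
names(df2)
# [1] "q11" "q12" "q13" "q21" "q22" "q23"
```

Columns whose names do not match the pattern are left unchanged by sub(), so the same call should be safe on a data frame that also holds other variables.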
Use strsplit instead of read.csv. Add some lapply loops and you are all set.
DF <- read.table(text=" Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
2 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
3 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
4 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
5 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,
6 ,,1 ,,1 ,1, 1,, ,,1 ,,1 ,1, 1,,", header=TRUE)
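DF2 <- do.call(cbind.data.frame, lapply(DF, function(x) {
  # strsplit() drops a trailing empty field, so "1,," would yield only two
  # pieces; appending one extra separator keeps all three fields
  res <- strsplit(paste0(as.character(x), ","), ",", fixed = TRUE)
  res <- lapply(res, as.numeric)  # empty strings become NA
  res <- do.call(rbind, res)      # one row per respondent
  res[is.na(res)] <- 0
  res
}))
DF2
#  Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q4.1 Q4.2 Q4.3 Q5.1 Q5.2 Q5.3 Q6.1 Q6.2 Q6.3 Q7.1 Q7.2 Q7.3 Q8.1 Q8.2 Q8.3
#1    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0
#2    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0
#3    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0
#4    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0
#5    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0
#6    0    0    1    0    0    1    0    1    0    1    0    0    0    0    1    0    0    1    0    1    0    1    0    0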
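The question also asked whether the 0/1 columns could be written straight into DAT with a loop. The following is a minimal sketch of that idea in base R, not a tested solution for the real data. The two-column DAT is a hypothetical stand-in, and the q11-style names are built from the lower-cased column name plus the option index.

```r
# Hypothetical stand-in for DAT with the same string format as the question
DAT <- data.frame(Q1 = c(",,1", "1,,"), Q2 = c(",1,", ",,1"),
                  stringsAsFactors = FALSE)

for (q in names(DAT)) {
  # header = FALSE keeps the first answer as data instead of column names;
  # colClasses = "numeric" makes empty fields come back as NA
  parts <- read.csv(text = as.character(DAT[[q]]), header = FALSE,
                    strip.white = TRUE, colClasses = "numeric")
  parts[is.na(parts)] <- 0
  names(parts) <- paste0(tolower(q), seq_along(parts))  # q11, q12, q13, ...
  DAT[names(parts)] <- parts                            # append to DAT itself
}

names(DAT)
# [1] "Q1"  "Q2"  "q11" "q12" "q13" "q21" "q22" "q23"
```

The loop iterates over the column names as they were before it started, so the newly appended q columns are not processed again. On the real data you would probably loop over NAMES.Q rather than names(DAT), so that unrelated columns are left alone.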