Recode a range of multiple columns in R
I cannot find an answer to this specific question. I would like to recode multiple character columns into numeric columns (it is a hundred columns). But:
So, I do not think I can use a range of column indexes. However, the columns I wish to recode all start with the same column-name prefix. I would like to recode any "Yes" to 1, any "No" to 0, and blanks to NA.
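For a single vector, the mapping described above can be sketched in base R with nested ifelse(), without the car package. This is an illustrative alternative, not the code from the question; the sample vector is assumed to be the Q1 column from the data further down.

```r
x <- c("Yes", NA, "", "No", NA)   # the Q1 column from the sample data

# "Yes" -> 1, "No" -> 0; blanks fall through to NA,
# and NA inputs stay NA because NA == "Yes" is itself NA
recoded <- ifelse(x == "Yes", 1, ifelse(x == "No", 0, NA))
recoded
```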
I could do this manually, one column at a time, with the code below:
#Recode columns one at a time
library(car)
#skip ID column
#Skip Date column
df$Q1<-as.numeric(as.character(recode(df$Q1,"NA=NA; 'No'=0; 'Yes'=1; ''=NA")))
df$Q2<-as.numeric(as.character(recode(df$Q2,"NA=NA; 'No'=0; 'Yes'=1; ''=NA")))
#skip Q2.Explanation column
#do the above for a hundred more columns...
But I would like to recode a hundred specific columns at the same time. Also, these columns are interspersed with columns I do not wish to recode.
My data is below. I am not sure what dput is:
ID<-c(01,02,03,04,05)
Q1<-c("Yes", NA,"", "No",NA)
Q1.Explanation<-c (NA, NA,"","Respondent did not get the correct answer", NA)
Q2<-c("No","Yes","Yes","", NA)
Q2.Explanation <-c("The right answer was not proven", NA, NA, NA, NA)
Q3<-c("", NA, "Yes", NA, NA)
Mydata<-as.data.frame(cbind(ID,Q1,Q1.Explanation, Q2, Q2.Explanation,Q3))
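Since dput came up: dput() prints R code that rebuilds an object exactly, which is why it is requested in questions. The sketch below rebuilds the same sample data with data.frame() instead of as.data.frame(cbind(...)); cbind() first builds a character matrix, so ID would otherwise arrive as character rather than numeric. The name rebuilt is only for illustration.

```r
ID <- c(01, 02, 03, 04, 05)
Q1 <- c("Yes", NA, "", "No", NA)
Q1.Explanation <- c(NA, NA, "", "Respondent did not get the correct answer", NA)
Q2 <- c("No", "Yes", "Yes", "", NA)
Q2.Explanation <- c("The right answer was not proven", NA, NA, NA, NA)
Q3 <- c("", NA, "Yes", NA, NA)

# data.frame() keeps ID numeric; cbind() would coerce every column to character first
Mydata <- data.frame(ID, Q1, Q1.Explanation, Q2, Q2.Explanation, Q3,
                     stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; that text is what goes into a question
dput(Mydata)

# Round trip: parsing and evaluating the dput() text gives back the same data frame
rebuilt <- eval(parse(text = paste(capture.output(dput(Mydata)), collapse = "\n")))
```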
If you know that the columns you want to change always have the same names, just in different locations in the table, then you can use a regex on the column names to subset them, and then change the values in those columns with apply().
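# car::recode is called explicitly so that dplyr::recode cannot mask it
q_cols <- grep("^Q\\d+$", colnames(your_data))  # Q1, Q2, ... but not Q1.Explanation
your_data[, q_cols] <- as.data.frame(apply(your_data[, q_cols, drop = FALSE], 2,
  function(x) as.numeric(car::recode(x, "NA = NA; 'No' = 0; 'Yes' = 1; '' = NA"))))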
This should recode the "Q" columns regardless of their location in any given month.
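If the car package is not available, the same idea (select columns by name pattern, then recode each one) can be sketched in base R with a named lookup vector and lapply(). This assumes only "Yes" and "No" should survive; every other value, including blanks, becomes NA. The names q_cols and lookup are hypothetical, and the comparison of the two patterns shows why a bare "Q" is too broad for this data.

```r
# Sample data rebuilt with data.frame() so the columns are plain character vectors
Mydata <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Q1 = c("Yes", NA, "", "No", NA),
  Q1.Explanation = c(NA, NA, "", "Respondent did not get the correct answer", NA),
  Q2 = c("No", "Yes", "Yes", "", NA),
  Q2.Explanation = c("The right answer was not proven", NA, NA, NA, NA),
  Q3 = c("", NA, "Yes", NA, NA),
  stringsAsFactors = FALSE
)

# "Q" alone also matches Q1.Explanation and Q2.Explanation;
# "^Q\\d+$" keeps only names that are "Q" followed by digits
grep("Q", names(Mydata), value = TRUE)
q_cols <- grep("^Q\\d+$", names(Mydata), value = TRUE)

# Named lookup vector: indexing with anything other than "No"/"Yes" ("" or NA) gives NA
lookup <- c(No = 0, Yes = 1)

Mydata[q_cols] <- lapply(Mydata[q_cols], function(x) unname(lookup[as.character(x)]))
str(Mydata)
```

The as.character() call matters if the columns happen to be factors: indexing lookup with a factor would use its integer codes instead of its labels.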
For data.table fans, I have another solution that also has the advantage of using factors instead of numeric integers for the recoding, so that the meaning of the numeric values is still displayed correctly (improving the readability of your data):
library(data.table)
ID<-c(01,02,03,04,05)
Q1<-c("Yes", NA,"", "No",NA)
Q1.Explanation<-c (NA, NA,"","Respondent did not get the correct answer", NA)
Q2<-c("No","Yes","Yes","", NA)
Q2.Explanation <-c("The right answer was not proven", NA, NA, NA, NA)
Q3<-c("", NA, "Yes", NA, NA)
Mydata<-as.data.frame(cbind(ID,Q1,Q1.Explanation, Q2, Q2.Explanation,Q3))
Mydata
# The solution starts here... ----------------------------------------------
setDT(Mydata) # convert data.frame into data.table
# the regular expression selects column names that are "Q" followed only by digits (Q1, Q2, ...), skipping Q1.Explanation etc.
affected.cols <- grep("^Q\\d+$", colnames(Mydata), value = TRUE)
# convert the columns to factors; values not listed in "levels" become NA; trailing square brackets are only added to print the output
Mydata[, (affected.cols) := lapply(.SD, factor, levels = c("No", "Yes")), .SDcols = affected.cols][]
str(Mydata) # Columns are encoded as factors ("enumerated types") now: an integer internally with a string label
# Proof: 1 = "No", 2 = "Yes"; every value not listed in "levels" (mainly empty strings) was translated into NA
as.numeric(Mydata$Q1)
Which results in:
> as.numeric(Mydata$Q1)
[1] 2 NA NA 1 NA
> Mydata
ID Q1 Q1.Explanation Q2 Q2.Explanation Q3
1: 1 Yes NA No The right answer was not proven NA
2: 2 NA NA Yes NA NA
3: 3 NA Yes NA Yes
4: 4 No Respondent did not get the correct answer NA NA NA
5: 5 NA NA NA NA NA
Note that as.numeric() returns the factor's level index, not the requested 0/1 coding: "No" has level index 1 and "Yes" has level index 2. Subtract 1 (as.numeric(Mydata$Q1) - 1) to get "No" = 0 and "Yes" = 1.
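A minimal sketch of going from such a factor to the requested 0/1 coding, using the sample Q1 answers. The subtraction relies on "No" being the first level; comparing against the label instead keeps working if the level order ever changes (an illustrative choice, not part of the original answer).

```r
# Sample Q1 answers; "" is not a listed level, so factor() turns it into NA
f <- factor(c("Yes", NA, "", "No", NA), levels = c("No", "Yes"))

as.integer(f)           # level indices: 2 NA NA 1 NA
as.integer(f) - 1L      # 1 NA NA 0 NA, valid only because "No" is the first level
as.integer(f == "Yes")  # 1 NA NA 0 NA, tied to the label rather than the level order
```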