Using gsub() in a data.table
I have a big data table (about 20,000 rows). One of its columns contains integers from 1 to 6.
I also have a character vector of car models (6 models).
I'm trying to replace the integers with the corresponding car model (just 2 in this example):
gsub("1",paste0(labels[1]),Models)
gsub("2",paste0(labels[2]),Models)
...
"Models" is the name of a column.
labels <- c("Altima","Maxima")
After fighting with it for 12+ hours, gsub() still isn't working :(
Sample data:
mydata <- data.table(replicate(1,sample(1:6,10000,rep=TRUE)))
labels <- c("altima","maxima","sentra","is","gs","ls")
I don't think you need gsub here. What you are describing is a factor variable.
If your data is
mydata <- data.table(replicate(1,sample(1:6,1000,rep=TRUE)))
models <- c("altima","maxima","sentra","is","gs","ls")
you could just do
mydata[[1]] <- factor(mydata[[1]], levels=seq_along(models), labels=models)
If you really wanted a character vector rather than a factor, then
mydata[[1]] <- models[ mydata[[1]] ]
would also do the trick. Both of these require that the numbers are consecutive and start at 1.
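For completeness, here is a minimal sketch of the same two ideas written with data.table's `:=`, which adds columns by reference, plus a `match()` lookup for the case where the codes are not consecutive from 1. The column names `model_fct`, `model_chr` and `model`, and the `codes` / `code_labels` / `other` objects, are made up for illustration and are not part of the question.

```r
library(data.table)

set.seed(1)  # only to make the sketch reproducible
mydata <- data.table(V1 = sample(1:6, 1000, replace = TRUE))
models <- c("altima", "maxima", "sentra", "is", "gs", "ls")

# Add the labels by reference, keeping the original integer column intact
mydata[, model_fct := factor(V1, levels = seq_along(models), labels = models)]
mydata[, model_chr := models[V1]]

# Hypothetical non-consecutive codes: positional indexing no longer works,
# so find each code's position in `codes` with match() first
codes       <- c(10L, 20L, 35L)
code_labels <- c("altima", "maxima", "sentra")
other <- data.table(code = c(35L, 10L, 20L, 10L))
other[, model := code_labels[match(code, codes)]]
```

Keeping the integer column next to the labelled one makes it easy to check the mapping, for example with `mydata[, table(V1, model_chr)]`.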
You could try using factor() in the following way; it worked for me on your test data. This assumes that the name of the first column in mydata is V1 (the default):
mydata$V1 <- factor(mydata$V1, labels=models)
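One caveat worth checking: when only `labels` is given, factor() takes its levels from the distinct values actually present in the data. With 10,000 random draws all six codes will almost certainly appear, but on a column where some code never occurs the number of labels no longer matches and the call fails. The sketch below uses a small made-up vector `x` to show this, along with the variant that passes `levels` explicitly.

```r
models <- c("altima", "maxima", "sentra", "is", "gs", "ls")
x <- c(1L, 2L, 2L, 5L)  # hypothetical column where codes 3, 4 and 6 never occur

# Without `levels`, factor() sees only 3 distinct values, so 6 labels do not fit
res <- try(factor(x, labels = models), silent = TRUE)
inherits(res, "try-error")

# With explicit levels, each code maps to its own label whatever is present
f <- factor(x, levels = seq_along(models), labels = models)
as.character(f)
```

Passing `levels = seq_along(models)` also keeps all six models as factor levels, so later summaries such as `table(f)` show a zero count for models that do not appear.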