
Using gsub() in a data.table

I have a big data.table (about 20,000 rows). One of its columns contains integers from 1 to 6.

I also have a character vector of car models (6 models).

I'm trying to replace the integers with the corresponding car model (just 2 in this example):

 gsub("1",paste0(labels[1]),Models)
 gsub("2",paste0(labels[2]),Models) 
 ...  

"Models" is the name of a column.

labels <- c("Altima","Maxima")

After fighting with it for 12+ hours, gsub() still isn't working :(

Sample data:
mydata <- data.table(replicate(1, sample(1:6, 10000, rep=TRUE)))
labels <- c("altima","maxima","sentra","is","gs","ls")

I don't think you need gsub here. What you are describing is a factor variable.

If your data is

mydata <- data.table(replicate(1,sample(1:6,1000,rep=TRUE)))
models <- c("altima","maxima","sentra","is","gs","ls")

you could just do

mydata[[1]] <- factor(mydata[[1]], levels=seq_along(models), labels=models)

If you really wanted a character vector rather than a factor, then

mydata[[1]] <- models[ mydata[[1]] ]

would also do the trick. Both of these require that the numbers are consecutive and start at 1.
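If the codes were not consecutive or did not start at 1, the same idea should still work with an explicit mapping. Here is a minimal sketch in base R; the codes 10, 20, 35 and the names codes, x, as_char and as_fac are made up for illustration and are not from the question:

```r
# Hypothetical codes that are not 1..n (assumption for illustration).
codes  <- c(10, 20, 35)
models <- c("altima", "maxima", "sentra")

# Example values to translate.
x <- c(20, 35, 10, 20)

# match() returns the position of each value of x within codes,
# and that position is then used to index the labels.
as_char <- models[match(x, codes)]

# factor() accepts the same mapping through levels= and labels=.
as_fac <- factor(x, levels = codes, labels = models)
```

Any value of x that is missing from codes comes out as NA in both versions.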

You could try using factor() in the following way; it worked for me on your test data. This assumes that the first column in mydata is named V1 (the default).

mydata$V1 <- factor(mydata$V1, labels=models)
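Since the question is about a data.table, the same conversion can probably also be written with data.table's `:=` operator, which adds or updates a column by reference. This is only a sketch: the new column name model is my own choice, and set.seed() is there just to make the example reproducible.

```r
library(data.table)

set.seed(1)  # only so the sketch is reproducible
models <- c("altima", "maxima", "sentra", "is", "gs", "ls")
mydata <- data.table(V1 = sample(1:6, 1000, replace = TRUE))

# Add a new column by reference; V1 keeps the original integer codes.
# "model" is a hypothetical column name chosen for this example.
mydata[, model := factor(V1, levels = seq_along(models), labels = models)]
```

Passing levels= explicitly keeps the labels lined up with the codes even if one of the six values happens not to occur in the data.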
