How to run a regression in R when column names include decimal points

This may be a very simple problem, but I can't seem to get past it. I have column names such as X100.4, X100.-4, X100.-5, and so on. I'm trying to run a linear regression, but when I do I get an error:

lm<-lm(X986~X241+X243+X280+X282+X987+X143.2+X239.0+X491.61+X350.-4,data=train)
Error in terms.formula(formula, data = data) : 
  invalid model formula in ExtractVars

It works fine without the variable X350.-4, so I assume that is the problem. I tried writing 'X350.-4' and "X350.-4", but both gave the same error. I also tried putting double quotes around all of the variables, but that did not work either.

You can use backticks:

DF <- data.frame(x=1:10, y=rnorm(10))
names(DF)[1] <- "x.-1"

lm(y~`x.-1`, data=DF)
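
When there are many such columns, typing backticks by hand gets tedious. Below is a minimal sketch of building the formula programmatically; the data frame `train_demo`, its column names, and the helper variables `predictors` and `quoted` are made up for illustration and are not from the question.

```r
set.seed(1)

# Hypothetical data with two non-syntactic column names
train_demo <- data.frame(y = rnorm(10), a = rnorm(10), b = rnorm(10))
names(train_demo)[2:3] <- c("X350.-4", "X100.-5")

# Wrap every predictor name in backticks, then join them with "+"
predictors <- setdiff(names(train_demo), "y")
quoted <- paste0("`", predictors, "`")
f <- as.formula(paste("y ~", paste(quoted, collapse = " + ")))

# The formula now refers to the columns by their exact names
fit <- lm(f, data = train_demo)
all.vars(f)
```

This leaves the original column names untouched, which can matter if they are needed again later (for example when matching against another table).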

But it would be better to sanitize the names:

names(DF) <- make.names(names(DF))

The problem is the minus sign ("-"), not the decimal points. So if you really need these column names, either use @Roland's approach, or replace the minus signs with something else:

colnames(data) <- gsub("-", "_", colnames(data), fixed = TRUE)
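
To see why the minus sign is the culprit, it can help to look at how R parses the unquoted name. This is a small sketch; the vector `cols` is a made-up stand-in for the real column names.

```r
# Unquoted, X350.-4 is parsed as the subtraction `X350.` minus 4
expr <- quote(X350.-4)
as.character(expr[[1]])   # the operator
as.character(expr[[2]])   # the left operand, a variable named X350.
expr[[3]]                 # the right operand, the number 4

# So the formula refers to a variable called X350., not X350.-4
vars <- all.vars(y ~ X241 + X350.-4)

# After replacing "-" with "_", the names are syntactically valid
cols <- c("X241", "X350.-4", "X100.-5")
fixed <- gsub("-", "_", cols, fixed = TRUE)
identical(fixed, make.names(fixed))
```

The last line checks that `make.names` would leave the replaced names unchanged, which is one way to confirm they can be used in a formula without backticks.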

Using make.names(...) is a little dicey because it can generate collisions (multiple columns with the same name). Consider:

DF <- data.frame(y=1:3,x.1=6:8,z=11:13)
colnames(DF)[3] <- "x-1"
DF
  y x.1 x-1
1 1   6  11
2 2   7  12
3 3   8  13

names(DF) <- make.names(names(DF))
DF
  y x.1 x.1
1 1   6  11
2 2   7  12
3 3   8  13

You may need to use:

names(DF) <- make.names(names(DF), unique=TRUE)
DF
  y x.1 x.1.1
1 1   6    11
2 2   7    12
3 3   8    13
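
Because `unique=TRUE` silently changes names, it may be worth recording which columns were affected. Here is a rough sketch of one way to do that, reusing the example above; `old`, `new`, `collided`, and `name_map` are illustrative names, not part of any API.

```r
DF <- data.frame(y = 1:3, x.1 = 6:8, z = 11:13)
colnames(DF)[3] <- "x-1"

old <- names(DF)
new <- make.names(old, unique = TRUE)

# TRUE where plain make.names() would have repeated an earlier name
collided <- duplicated(make.names(old))

# Keep a lookup table so the original names can be recovered later
name_map <- data.frame(old = old, new = new, collided = collided)

names(DF) <- new
name_map
```

The lookup table makes it straightforward to report model results under the original column names afterwards.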
