How to overwrite a factor in R

I have a dataset:

> k
       EVTYPE FATALITIES INJURIES
198704   HEAT        583        0
862634   WIND        158     1150
68670    WIND        116      785
148852   WIND        114      597
355128   HEAT         99        0
67884    WIND         90     1228
46309    WIND         75      270
371112   HEAT         74      135
230927   HEAT         67        0
78567    WIND         57      504

The variables are as follows. As per the first answer by joran, unused levels can be dropped with droplevels, so don't worry about the 898 levels. The illustrative k shown here is the complete dataset, obtained from k <- d1[1:10, 3:4], where d1 is the original dataset.

> str(k)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 898 levels "   HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
 $ FATALITIES: num  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : num  0 1150 785 597 0 ...

I'm trying to overwrite the WIND factor:

> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")

But both commands give me error messages: "level sets of factors are different" or "invalid factor level, NA generated".

How should I do this?

Try this instead:

k <- droplevels(d1[1:10, 3:5])

Factors (as per the documentation) are simply a vector of integer codes plus a vector of labels, one for each code. These labels are called the "levels". The levels are an attribute, and they persist with your data even when you subset it.
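
As a minimal sketch of that structure, the following uses a small made-up factor (the values are hypothetical and not taken from d1) to show the integer codes, the levels, and how the levels survive subsetting until droplevels is applied:

```r
# Hypothetical toy factor; the values are illustrative, not from d1.
f <- factor(c("HEAT", "WIND", "WIND", "FLOOD"))

levels(f)    # the labels, sorted alphabetically by default
unclass(f)   # the integer codes, with the levels attached as an attribute

# Subsetting keeps all three levels, even though only "WIND" remains.
sub <- f[f == "WIND"]
levels(sub)

# droplevels() discards the levels that no longer occur in the data.
levels(droplevels(sub))
```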

This is a feature, since for many statistical procedures it is vital to keep track of all the possible values a variable could have, even if they don't appear in the actual data.
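
One place where this is visible is tabulation. The sketch below, again on made-up data rather than the original d1, shows that table still reports every known level after subsetting, with a count of zero for the levels that are absent:

```r
# Hypothetical toy data, not the original d1.
evtype <- factor(c("HEAT", "WIND", "WIND", "FLOOD"))
hot <- evtype[evtype == "HEAT"]

# table() lists every level of the factor, including those with no rows.
counts <- table(hot)
counts
```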

Some people find this irritating and run R with options(stringsAsFactors = FALSE).
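
For comparison, here is a rough sketch of what replacement looks like when the column is kept as plain character strings. The data frame is a hypothetical three-row stand-in whose column names mirror the question:

```r
# Hypothetical toy data frame; column names mirror the question.
d_chr <- data.frame(EVTYPE = c("HEAT", "WIND", "WIND"),
                    FATALITIES = c(583, 158, 116),
                    stringsAsFactors = FALSE)

# With a character column there are no levels, so any string can be assigned.
d_chr$EVTYPE[d_chr$EVTYPE == "WIND"] <- "AFDAF"
d_chr$EVTYPE
```

The trade-off is that a character column no longer records the set of possible values, so it may need converting back with factor() before modelling.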

To simply change the levels, you can do something like this:

d <- read.table(text = "      EVTYPE FATALITIES INJURIES
 198704   HEAT        583        0
 862634   WIND        158     1150
 68670    WIND        116      785
 148852   WIND        114      597
 355128   HEAT         99        0
 67884    WIND         90     1228
 46309    WIND         75      270
 371112   HEAT         74      135
 230927   HEAT         67        0
 78567    WIND         57      504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504

Or to just change one:

levels(d$EVTYPE)[2] <- 'C'
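
Indexing by position assumes you know where the level sits in levels(). A variant that looks the level up by name may be closer to what the question asks for. This is only a sketch on a made-up factor standing in for k$EVTYPE. It also suggests why the original attempts failed: assigning a value that is not already one of the levels yields NA with the "invalid factor level" warning, whereas renaming the level itself changes every matching element at once.

```r
# Hypothetical toy factor standing in for k$EVTYPE.
evtype2 <- factor(c("HEAT", "WIND", "WIND", "HEAT"))

# Rename the level called "WIND", wherever it sits in levels().
levels(evtype2)[levels(evtype2) == "WIND"] <- "AFDAF"

evtype2
levels(evtype2)
```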
