How to overwrite a factor in R

Question

I have a dataset:

> k
       EVTYPE FATALITIES INJURIES
198704   HEAT        583        0
862634   WIND        158     1150
68670    WIND        116      785
148852   WIND        114      597
355128   HEAT         99        0
67884    WIND         90     1228
46309    WIND         75      270
371112   HEAT         74      135
230927   HEAT         67        0
78567    WIND         57      504

As per the first answer by joran, unused levels can be dropped with droplevels, so there is no need to worry about the 898 levels. The illustrative k shown here is the complete dataset, obtained from k <- d1[1:10, 3:4], where d1 is the original dataset. The variables are as follows:

> str(k)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 898 levels "   HIGH SURF ADVISORY",..: 243 NA NA NA 243 NA NA 243 243 NA
 $ FATALITIES: num  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : num  0 1150 785 597 0 ...

I'm trying to overwrite the WIND factor level:

> k[k$EVTYPE==factor("WIND"), ]$EVTYPE <- factor("AFDAF")
> k[k$EVTYPE=="WIND", ]$EVTYPE <- factor("AFDAF")

But both commands give me error messages: "level sets of factors are different" or "invalid factor level, NA generated".

How should I do this?

Answer 1

Try this instead:

k <- droplevels(d1[1:10, 3:5])

A factor (as per the documentation) is simply a vector of integer codes plus a character vector of labels, one for each code. These labels are called the "levels". The levels are an attribute, and they persist with your data even when you subset it.
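As a small illustration of that structure, the integer codes and the levels can be inspected separately, and the levels survive subsetting until droplevels is applied. The vector f below is made up for the example and is not taken from the question's data:

```r
# Hypothetical example vector, not from the original dataset
f <- factor(c("HEAT", "WIND", "WIND", "HEAT", "FLOOD"))

as.integer(f)   # the integer codes: 2 3 3 2 1
levels(f)       # the labels: "FLOOD" "HEAT" "WIND"

# Subsetting keeps every level, even those no longer present
g <- f[f != "FLOOD"]
levels(g)               # still "FLOOD" "HEAT" "WIND"
levels(droplevels(g))   # "HEAT" "WIND"
```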

This is a feature, since for many statistical procedures it is vital to keep track of all the possible values that a variable could have, even if they don't appear in the actual data.
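One place where this shows up is tabulation. In the sketch below (the answers vector and its levels are invented for illustration), a level that never occurs is still reported, with a count of zero:

```r
# Hypothetical survey responses; "maybe" is a valid answer that nobody gave
answers <- factor(c("yes", "no", "yes"), levels = c("no", "maybe", "yes"))

counts <- table(answers)
counts   # "maybe" is listed with a count of 0 instead of being omitted
```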

Some people find this irritating and run R using options(stringsAsFactors = FALSE) (from R 4.0.0 onward, FALSE is already the default).
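For comparison, here is a sketch of what that choice buys you, using the first three rows of the question's data and a hypothetical name d_chr. With a plain character column, ordinary replacement works without touching any levels:

```r
# First three rows of the question's data, read as character instead of factor
d_chr <- read.table(text = "EVTYPE FATALITIES INJURIES
HEAT 583 0
WIND 158 1150
WIND 116 785", header = TRUE, stringsAsFactors = FALSE)

# A character column accepts any new string directly
d_chr$EVTYPE[d_chr$EVTYPE == "WIND"] <- "AFDAF"
d_chr$EVTYPE   # "HEAT" "AFDAF" "AFDAF"
```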

To simply change the levels, you can do something like this:

d <- read.table(text = "      EVTYPE FATALITIES INJURIES
 198704   HEAT        583        0
 862634   WIND        158     1150
 68670    WIND        116      785
 148852   WIND        114      597
 355128   HEAT         99        0
 67884    WIND         90     1228
 46309    WIND         75      270
 371112   HEAT         74      135
 230927   HEAT         67        0
 78567    WIND         57      504",header = TRUE,sep = "",stringsAsFactors = TRUE)
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "HEAT","WIND": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504
> levels(d$EVTYPE) <- c('A','B')
> str(d)
'data.frame':   10 obs. of  3 variables:
 $ EVTYPE    : Factor w/ 2 levels "A","B": 1 2 2 2 1 2 2 1 1 2
 $ FATALITIES: int  583 158 116 114 99 90 75 74 67 57
 $ INJURIES  : int  0 1150 785 597 0 1228 270 135 0 504

Or to just change one:

levels(d$EVTYPE)[2] <- 'C'
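Applied to the question's case, the position of the level can be looked up by name rather than hard-coded. This is only a sketch: the data frame k below is a hand-built stand-in with four rows, not the real subset of d1, and bad is a throwaway vector used to show why the original attempts failed.

```r
# Stand-in for the question's k, containing only the levels that occur
k <- data.frame(
  EVTYPE = factor(c("HEAT", "WIND", "WIND", "HEAT")),
  FATALITIES = c(583, 158, 116, 99)
)

# Rename the level called "WIND", wherever it sits in levels()
levels(k$EVTYPE)[levels(k$EVTYPE) == "WIND"] <- "AFDAF"

levels(k$EVTYPE)         # "HEAT" "AFDAF"
as.character(k$EVTYPE)   # "HEAT" "AFDAF" "AFDAF" "HEAT"

# For contrast: assigning a value that is not an existing level
# produces NA and the "invalid factor level" warning
bad <- factor(c("HEAT", "WIND"))
bad[2] <- "AFDAF"
is.na(bad[2])            # TRUE
```

Renaming through levels() changes the label once and leaves the integer codes alone, which is why every former WIND row picks up the new name.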

Answer 1
1 2014-08-21 21:48:04
