Assigning new values to levels in R

Question

All,

I have a large data set (over 2 million rows), and one of its columns has the following levels:

"0"     "0.001" "1"     "4"     "4.001" "8.001"

I want to create a new column that holds a corresponding letter for each of them:

0 = x, 0.001 = D, 1 = C, 4 and 4.001 = B, and 8.001 = A

Is there a way to do this without a for loop containing 6 if statements? I tried that, and it took forever to run.

Here's a test sample:

      a b
1 0.000 x
2 4.000 B
3 1.000 C
4 0.001 D
5 1.000 C
6 4.000 B
7 4.001 B
8 1.000 C
9 8.001 A

Thank you.

Answer 1

The easiest way is to create a key/value dataset and join it with the original data:

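keyval <- data.frame(a = c(0, 0.001, 1, 4, 4.001, 8.001),
     b = c('x', 'D', 'C', 'B', 'B', 'A'), stringsAsFactors = FALSE)
library(data.table)
# update join (df1 is the sample data below): rows of df1 whose a matches keyval$a
# get keyval's b; the i. prefix makes it explicit that b comes from keyval
setDT(df1)[keyval, b := i.b, on = .(a)]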
df1
#       a b
#1: 0.000 x
#2: 4.000 B
#3: 1.000 C
#4: 0.001 D
#5: 1.000 C
#6: 4.000 B
#7: 4.001 B
#8: 1.000 C
#9: 8.001 A

Data:

df1 <- structure(list(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001)), 
    .Names = "a", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")
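If data.table is not available, the same key/value lookup can be sketched in base R with match(). This is a swapped-in base R technique, not part of the answer above, and it assumes the question's sample values. match() finds each row's position in keyval$a, and indexing keyval$b with those positions fills the whole column in one vectorized step.

```r
# Sample data from the question
df1 <- data.frame(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001))

# Lookup table: one row per original value and its new letter
keyval <- data.frame(a = c(0, 0.001, 1, 4, 4.001, 8.001),
                     b = c("x", "D", "C", "B", "B", "A"),
                     stringsAsFactors = FALSE)

# Position of each df1$a inside keyval$a; values not in keyval give NA
pos <- match(df1$a, keyval$a)

# Vectorized "join": pick the new letter for every row at once
df1$b <- keyval$b[pos]
print(df1)
```

match() compares doubles exactly, so if the values come out of arithmetic rather than being typed or read in, rounding both sides first (for example round(df1$a, 3)) may be needed.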

Answer 2

I don't believe there is a single command that does this for you. By the way, row-by-row for loops are slow in R and not recommended for large data sets; vectorized operations that work on a whole column at once are much faster.

Option 1:
What you may want to try is logical indexing: a TRUE/FALSE vector (similar in spirit to a bit array) marks every matching row, and a single assignment updates all of them at once.

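df$NewColumn <- NA_character_  # create the column first so it has one entry per row

idx <- df$a == 0               # numeric comparison; the string "0.000" never matches a stored 0
df$NewColumn[idx] <- "x"

idx <- df$a == 4
df$NewColumn[idx] <- "B"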

and so on for the remaining values.
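Written out in full for the question's sample data (assumed numeric here), Option 1 needs one masked assignment per new letter. Using %in% for "B" is a small convenience added here so that 4 and 4.001 share one mask.

```r
# Sample data from the question, stored as numbers
df <- data.frame(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001))

# Start with NA so every row exists before the partial assignments
df$NewColumn <- NA_character_

# Each line builds a logical mask and updates all matching rows at once
df$NewColumn[df$a == 0]              <- "x"
df$NewColumn[df$a == 0.001]          <- "D"
df$NewColumn[df$a == 1]              <- "C"
df$NewColumn[df$a %in% c(4, 4.001)]  <- "B"
df$NewColumn[df$a == 8.001]          <- "A"
print(df)
```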

Option 2:
Use revalue() from the plyr package. It is simpler to write, but it may be more computationally expensive than option 1; it should still handle your data size easily.

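library(plyr)
# revalue() expects a factor or character vector; the old values are quoted names
df$NewColumn <- revalue(as.character(df$a), c("0" = "x", "0.001" = "D",
  "1" = "C", "4" = "B", "4.001" = "B", "8.001" = "A"))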
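The replacement vector passed to revalue() is just a named character vector, so a dependency-free sketch of the same idea is to index that vector by name. This base R variant is an alternative, not what plyr does internally, and it assumes the column is numeric and gets converted to text first.

```r
# Sample data from the question
df <- data.frame(a = c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001))

# Names are the old values as text; elements are the new letters
lookup <- c("0" = "x", "0.001" = "D", "1" = "C",
            "4" = "B", "4.001" = "B", "8.001" = "A")

# as.character(0.001) gives "0.001", matching the names above.
# Indexing by name maps every row at once; unname() drops the names
# that the indexing carries over from lookup.
df$NewColumn <- unname(lookup[as.character(df$a)])
print(df)
```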

For either option, make sure the values you compare against match the column's class. From your example it's hard for me to tell whether the data is factor or numeric, but either way it's a simple change to make in my sample code.
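A short illustration of why the class matters, assuming the column might have been read in as a factor: as.numeric() on a factor returns the internal level codes, not the values, so a factor has to be converted through its labels before numeric comparisons like the ones above.

```r
# A factor stores integer codes plus a table of level labels
f <- factor(c("0", "4", "1", "0.001", "4.001", "8.001"))

codes  <- as.numeric(f)                # level positions, NOT the original values
values <- as.numeric(as.character(f))  # the actual numeric values
same   <- as.numeric(levels(f))[f]     # same values; converts only the labels, then indexes
print(codes)
print(values)
```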

Answer 3

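Try factor(x, levels = c(...), labels = c(...)), listing the original levels and their new values, separated by commas. (as.factor() itself does not accept a levels argument.)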
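A minimal sketch of that factor() call on the question's sample data, assuming R 3.5.0 or later: since that version, a label that appears more than once merges the corresponding levels, which is what the mapping of both 4 and 4.001 to "B" needs.

```r
# Sample values from the question
a <- c(0, 4, 1, 0.001, 1, 4, 4.001, 1, 8.001)

# levels: the original values; labels: the new value for each level, in the same order.
# The repeated "B" label merges levels 4 and 4.001 (R >= 3.5.0).
b <- factor(a,
            levels = c(0, 0.001, 1, 4, 4.001, 8.001),
            labels = c("x", "D", "C", "B", "B", "A"))
print(b)
```

The result is a factor; wrap it in as.character() if a plain text column is wanted.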

Answer 4

I would try this, though I'm not sure about the runtime:

library(forcats)
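# stringsAsFactors = TRUE keeps a as a factor (the default changed in R 4.0)
df <- data.frame(a = c("0", "0.001", "1", "4", "4.001", "8.001"), stringsAsFactors = TRUE)
df$b <- fct_recode(df$a,
               x = "0",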
               D = "0.001",
               C = "1",
               B = "4",
               B = "4.001",
               A = "8.001")
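If forcats is not installed, base R can do a similar level recoding by assigning a named list to levels(). This is a swapped-in base R technique, not part of forcats: each list element names a new level and lists the old levels it replaces, so giving "B" two old levels merges them. The sketch assumes every existing level is listed; levels left out of the list end up as NA.

```r
# Sample data from the question, as a factor
f <- factor(c("0", "4", "1", "0.001", "1", "4", "4.001", "1", "8.001"))

# New level name = old level(s) it replaces; B absorbs both "4" and "4.001"
levels(f) <- list(x = "0", D = "0.001", C = "1",
                  B = c("4", "4.001"), A = "8.001")
print(f)
```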

4 answers

Answer 1
Score 2, 2017-09-23 03:27:29

Answer 2
Score 1, accepted, 2017-09-23 02:21:34

Answer 3
Score 0, 2017-09-23 01:58:43

Answer 4
Score 0, 2017-09-23 09:52:01
