Assigning random values to a column based on the values of another column in R

Question

I have a dataset with stock codes ranging from 2 to 90214, of which around 3000 are unique. Obviously, some values between 2 and 90214 are skipped. I want to convert these stock codes so that they range from 1 to 3000, in such a way that if the previous stock code was 1234, then every time this number occurs, the same new stock code (say 100) is assigned.

In short, I want to convert:

Stock_Code
 1234
 5678
 4321
 1234
 5678

into:

Stock_Code
 100
 101
 102
 100
 101

How do I do this in R?

Answer 1

We can convert the numbers into a factor and then convert that factor to numeric:

as.numeric(factor(df$Stock_Code))

#[1] 1 3 2 1 3

If we need the codes to start from 100, we can add 99:

as.numeric(factor(df$Stock_Code)) + 99

Identical numbers get the same factor level, so they get the same numeric value after conversion. Because factor() sorts its levels, the new codes follow the ascending order of the original codes rather than their order of first appearance.
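To make the level ordering concrete, here is a minimal sketch. It assumes a data frame named df built from the five sample values in the question; the construction line and the helper names f and new_code are illustrative, not part of the original answer.

```r
# Assumed sample data: the five values shown in the question
df <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

# factor() collects the unique values and sorts them into levels
f <- factor(df$Stock_Code)
levels(f)
# "1234" "4321" "5678"

# as.numeric() on a factor returns the level index of each element
new_code <- as.numeric(f) + 99
new_code
# 100 102 101 100 102
```

Each old code maps to one new code consistently, but 4321 receives 101 and 5678 receives 102 because the levels are sorted.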

Answer 2

We can use match to get the index of each value among the unique values, and then add 99:

df1$Stock_Code <- match(df1$Stock_Code, unique(df1$Stock_Code)) + 99
df1$Stock_Code
#[1] 100 101 102 100 101

Another option is to convert to a factor, with the levels given in order of first appearance, and coerce to integer:

with(df1, as.integer(factor(Stock_Code, levels = unique(Stock_Code))) + 99)
#[1] 100 101 102 100 101
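If the mapping from old to new codes needs to be kept for later use, the same match idea can be written with an explicit lookup table. This is only a sketch under stated assumptions: the df1 construction, the lookup table, and the column names Old_Code and New_Code are hypothetical and do not come from the original answer.

```r
# Assumed sample data: the five values shown in the question
df1 <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

# unique() keeps values in order of first appearance: 1234 5678 4321
old_codes <- unique(df1$Stock_Code)

# Hypothetical lookup table pairing each old code with a new code from 100 up
lookup <- data.frame(Old_Code = old_codes,
                     New_Code = seq_along(old_codes) + 99)

# match() returns, for each row, the position of its code in the lookup table
df1$New_Code <- lookup$New_Code[match(df1$Stock_Code, lookup$Old_Code)]
df1$New_Code
# 100 101 102 100 101
```

Keeping the table separate makes it possible to apply the same codes to other data sets or to translate the new codes back to the originals.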

Answer 3

Using dplyr:

library(dplyr)
dense_rank(df$Stock_Code) + 99
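As a rough illustration, the same call can be run inside mutate on the sample values from the question. The data frame construction and the New_Code column name are assumptions for this sketch. dense_rank ranks values in ascending order without gaps, so the result follows the sorted order of the old codes.

```r
library(dplyr)

# Assumed sample data: the five values shown in the question
df <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

# dense_rank() gives tied values the same rank and leaves no gaps between ranks
df <- df %>%
  mutate(New_Code = dense_rank(Stock_Code) + 99)

df$New_Code
# 100 102 101 100 102
```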
