Assign random values to column according to another column's values in R

I have a dataset with Stock Codes ranging from 2 to 90214, of which around 3000 are unique, so many values between 2 and 90214 never occur. I want to recode these stock codes so that they range from 1 to 3000, in such a way that the same old code always maps to the same new code: if the old stock code was 1234, then every time this number occurs, the same new stock code (say 100) is assigned.

In short, I want to convert:

Stock_Code
 1234
 5678
 4321
 1234
 5678

into:

Stock_Code
 100
 101
 102
 100
 101

How do I do this in R?

We can convert the numbers to a factor and then convert the factor to numeric:

as.numeric(factor(df$Stock_Code))

#[1] 1 3 2 1 3

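If the new codes need to start from 100, we can add 99:

as.numeric(factor(df$Stock_Code)) + 99

Identical stock codes share the same factor level, so converting to numeric gives them the same value. Note that factor levels are sorted by default, so the new codes follow the numeric order of the old codes (100 102 101 100 102 for the sample data) rather than their order of first appearance.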
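
A small reproducible sketch of the factor approach, using the five sample values from the question. The data frame name `df` and the variable `new_code` are assumptions made for illustration:

```r
# Sample data from the question; the name `df` is an assumption
df <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

# factor() sorts the unique values to build its levels: 1234, 4321, 5678
new_code <- as.numeric(factor(df$Stock_Code)) + 99
new_code
#[1] 100 102 101 100 102
```

Repeated codes receive the same new value, but 5678 becomes 102 and 4321 becomes 101 because the numbering follows the sorted levels.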

We can use match to get the index of each value among the unique values, which numbers the codes in order of first appearance, and then add 99:

df1$Stock_Code <- match(df1$Stock_Code, unique(df1$Stock_Code)) + 99
df1$Stock_Code
#[1] 100 101 102 100 101
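
If the old-to-new mapping needs to be kept for later reference, the same match idea can be written with an explicit lookup table. This is only a sketch: the names `lookup`, `old`, `new` and `New_Code` are hypothetical, and the result is stored in a new column so the original codes are not overwritten:

```r
# Sample data from the question
df1 <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

# One row per distinct old code, numbered in order of first appearance
lookup <- data.frame(old = unique(df1$Stock_Code))
lookup$new <- seq_along(lookup$old) + 99

# Map every row to its new code through the lookup table
df1$New_Code <- lookup$new[match(df1$Stock_Code, lookup$old)]
df1$New_Code
#[1] 100 101 102 100 101
```

The `lookup` data frame can then be saved or merged back to recover the original stock codes.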

Another option is to convert to factor, with the levels given in order of first appearance, and coerce to integer:

with(df1, as.integer(factor(Stock_Code, levels = unique(Stock_Code))) + 99)
#[1] 100 101 102 100 101
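
The title mentions random values. If the new codes should be handed out in random order rather than by first appearance, one possible sketch is to permute the indices before matching. This is an interpretation of the title, not something the answers above do; the names `old`, `new` and `New_Code` are hypothetical, and `set.seed` is used only to make the sketch reproducible:

```r
set.seed(42)  # only for reproducibility of this sketch

# Sample data from the question
df1 <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

old <- unique(df1$Stock_Code)
# Random permutation of 1..n: one distinct new code per distinct old code
new <- sample(seq_along(old))

# The same old code always maps to the same randomly chosen new code
df1$New_Code <- new[match(df1$Stock_Code, old)]
```

Each distinct stock code still gets exactly one new code in 1..n, so repeated rows stay consistent.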

Using dplyr:

library(dplyr)
dense_rank(df$Stock_Code) + 99
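
As a sketch of how this fits into a dplyr pipeline, the example below adds both numberings as new columns. It assumes dplyr is installed; the column names `by_value` and `by_appearance` are hypothetical. `dense_rank` ranks by value, so it matches the sorted factor approach, while `match` against `unique` numbers codes by first appearance:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(Stock_Code = c(1234, 5678, 4321, 1234, 5678))

df <- df %>%
  mutate(
    # Ranks by value with no gaps: 1234 -> 100, 4321 -> 101, 5678 -> 102
    by_value = dense_rank(Stock_Code) + 99,
    # Numbers codes in order of first appearance, as in the question
    by_appearance = match(Stock_Code, unique(Stock_Code)) + 99
  )
```

For the sample data, `by_value` is 100 102 101 100 102 and `by_appearance` is 100 101 102 100 101.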
