Assign a unique number to each value of a variable in R

Question

I have a CSV file that looks like this:

Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol

I want a unique ID for each distinct value of LocationNormalized, so that my output looks like this:

    Id,Title,FullDescription,LocationRaw,LocationNormalized,ID
    1,hi,abc,def,Bristol,1
    1,yo,abc,def,Bristol,1
    1,was,abc,def,England,2
    1,up,abc,def,India,3
    1,yoh,abc,def,Nepal,4
    1,home,abc,def,Bristol,1

I am new to R. I tried as.factor and a few scripts, but they did not work.

Answer 1

Data

df <- data.table::fread("Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol")

Solution

library(dplyr)

# group_indices() numbers the groups in sorted order of LocationNormalized
df %>%
  mutate(new_ID = group_indices(., LocationNormalized))

  Id Title FullDescription LocationRaw LocationNormalized new_ID
1  1    hi             abc         def            Bristol      1
2  1    yo             abc         def            Bristol      1
3  1   was             abc         def            England      2
4  1    up             abc         def              India      3
5  1   yoh             abc         def              Nepal      4
6  1  home             abc         def            Bristol      1
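
As far as I know, passing the grouping column directly to group_indices() is deprecated from dplyr 1.0.0 onward, and cur_group_id() is the suggested replacement inside mutate(). The following is a minimal sketch under that assumption, not a claim about what the answer's author would use today. The sample data is rebuilt with base R so the block runs on its own, and `result` is a name introduced only for this illustration.

```r
library(dplyr)

# Same sample rows as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Id = 1L,
  Title = c("hi", "yo", "was", "up", "yoh", "home"),
  FullDescription = "abc",
  LocationRaw = "def",
  LocationNormalized = c("Bristol", "Bristol", "England", "India", "Nepal", "Bristol")
)

result <- df %>%
  group_by(LocationNormalized) %>%     # one group per distinct location
  mutate(new_ID = cur_group_id()) %>%  # integer index of the current group
  ungroup()                            # drop the grouping again

result$new_ID
```

Groups are numbered in sorted order of LocationNormalized, which for this data happens to match the order of first appearance.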

Answer 2

Using data.table

library(data.table)
# Convert df1 to a data.table by reference, then add ID as the group counter (.GRP)
setDT(df1)[, ID := .GRP, by = LocationNormalized]
df1
#   Id Title FullDescription LocationRaw LocationNormalized ID
#1:  1    hi             abc         def            Bristol  1
#2:  1    yo             abc         def            Bristol  1
#3:  1   was             abc         def            England  2
#4:  1    up             abc         def              India  3
#5:  1   yoh             abc         def              Nepal  4
#6:  1  home             abc         def            Bristol  1
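
One point worth keeping in mind: with `by`, .GRP numbers the groups in the order they first appear, whereas the dplyr approach numbers them in sorted order. The two agree on the sample data only because the locations already appear alphabetically. Here is a small sketch with hypothetical data (the `dt` table and the `ID_sorted` column are made up for illustration) where the two orderings differ.

```r
library(data.table)

# Hypothetical data where first-appearance order differs from alphabetical order
dt <- data.table(LocationNormalized = c("Nepal", "Bristol", "Nepal", "India"))

# .GRP with `by` numbers groups in order of first appearance
dt[, ID := .GRP, by = LocationNormalized]

# Alphabetical numbering, using base R match() against the sorted unique values
dt[, ID_sorted := match(LocationNormalized, sort(unique(LocationNormalized)))]

dt$ID         # 1 2 1 3
dt$ID_sorted  # 3 1 3 2
```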

Data

df1 <- structure(list(Id = c(1L, 1L, 1L, 1L, 1L, 1L), Title = c("hi", 
"yo", "was", "up", "yoh", "home"), FullDescription = c("abc", 
"abc", "abc", "abc", "abc", "abc"), LocationRaw = c("def", "def", 
"def", "def", "def", "def"), LocationNormalized = c("Bristol", 
 "Bristol", "England", "India", "Nepal", "Bristol")), .Names = c("Id", 
"Title", "FullDescription", "LocationRaw", "LocationNormalized"
), class = "data.frame", row.names = c(NA, -6L))
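
Since the question mentions trying as.factor, it may help to see that base R alone can produce the same IDs. This is only a sketch; `loc`, `id_sorted`, and `id_first` are names chosen for illustration. as.factor (or factor) returns a factor, so the integer codes still have to be extracted with as.integer; match() against unique() is an alternative that numbers values by first appearance.

```r
# The LocationNormalized column from the question, as a plain character vector
loc <- c("Bristol", "Bristol", "England", "India", "Nepal", "Bristol")

# Factor levels are sorted alphabetically; as.integer() extracts the level codes
id_sorted <- as.integer(factor(loc))

# match() against unique() numbers values in order of first appearance
id_first <- match(loc, unique(loc))

id_sorted  # 1 1 2 3 4 1
id_first   # 1 1 2 3 4 1
```

On a data frame, the same idea would read `df1$ID <- match(df1$LocationNormalized, unique(df1$LocationNormalized))`.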

2 answers

Answer 1: score 3, accepted, 2018-07-08 23:38:46
Answer 2: score 1, 2018-07-09 00:54:07
