
Assign a unique number to a variable in R

I have a CSV file that looks like this:

Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol

I want to get a unique ID for each distinct LocationNormalized value, so that my output looks like this:
    Id,Title,FullDescription,LocationRaw,LocationNormalized,ID
    1,hi,abc,def,Bristol,1
    1,yo,abc,def,Bristol,1
    1,was,abc,def,England,2
    1,up,abc,def,India,3
    1,yoh,abc,def,Nepal,4
    1,home,abc,def,Bristol,1

I am new to R. I tried as.factor and a few scripts, but they failed.

Data

df <- data.table::fread("Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol")

library(dplyr)

# Add new_ID: the index of the group that each row's LocationNormalized belongs to
df %>%
  mutate(new_ID = group_indices(., LocationNormalized))

  Id Title FullDescription LocationRaw LocationNormalized new_ID
1  1    hi             abc         def            Bristol      1
2  1    yo             abc         def            Bristol      1
3  1   was             abc         def            England      2
4  1    up             abc         def              India      3
5  1   yoh             abc         def              Nepal      4
6  1  home             abc         def            Bristol      1
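
In dplyr 1.0.0 and later, passing grouping columns to `group_indices()` is deprecated. The following is a minimal sketch of the same idea using `group_by()` with `cur_group_id()`, assuming dplyr 1.0.0 or newer is installed. The data frame `df_demo` is a hypothetical, trimmed copy of the sample data, not part of the original answer. Groups are numbered by the sorted order of the grouping values, which on this sample happens to match the order of first appearance.

```r
library(dplyr)

# Hypothetical, trimmed copy of the sample data from the question
df_demo <- data.frame(
  Id = rep(1L, 6),
  Title = c("hi", "yo", "was", "up", "yoh", "home"),
  LocationNormalized = c("Bristol", "Bristol", "England",
                         "India", "Nepal", "Bristol"),
  stringsAsFactors = FALSE
)

# Number each distinct LocationNormalized value.
# Group ids follow the sorted order of the grouping values.
out <- df_demo %>%
  group_by(LocationNormalized) %>%
  mutate(new_ID = cur_group_id()) %>%
  ungroup()

out$new_ID
# 1 1 2 3 4 1
```

If the IDs must follow the order in which values first appear rather than sorted order, the grouping column would need to be converted to a factor with explicit levels first.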

Using data.table

library(data.table)
# Convert df1 to a data.table and add ID by reference; .GRP is the group counter
setDT(df1)[, ID := .GRP, by = LocationNormalized]
df1
#   Id Title FullDescription LocationRaw LocationNormalized ID
#1:  1    hi             abc         def            Bristol  1
#2:  1    yo             abc         def            Bristol  1
#3:  1   was             abc         def            England  2
#4:  1    up             abc         def              India  3
#5:  1   yoh             abc         def              Nepal  4
#6:  1  home             abc         def            Bristol  1

Data

df1 <- structure(list(Id = c(1L, 1L, 1L, 1L, 1L, 1L), Title = c("hi", 
"yo", "was", "up", "yoh", "home"), FullDescription = c("abc", 
"abc", "abc", "abc", "abc", "abc"), LocationRaw = c("def", "def", 
"def", "def", "def", "def"), LocationNormalized = c("Bristol", 
 "Bristol", "England", "India", "Nepal", "Bristol")), .Names = c("Id", 
"Title", "FullDescription", "LocationRaw", "LocationNormalized"
), class = "data.frame", row.names = c(NA, -6L))
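
Since the question mentions trying `as.factor`, here is a hedged base R sketch of that route that needs no extra packages. The vectors `loc` and `loc2` are hypothetical stand-ins for a column such as `df1$LocationNormalized`. `match(x, unique(x))` numbers values in order of first appearance, while `as.integer(factor(x))` numbers them by sorted factor level. On the sample data both give the same IDs, but they can differ on other inputs.

```r
# Hypothetical stand-in for df1$LocationNormalized
loc <- c("Bristol", "Bristol", "England", "India", "Nepal", "Bristol")

# IDs in order of first appearance
id_first_seen <- match(loc, unique(loc))

# IDs by sorted factor level (as.factor followed by as.integer)
id_sorted <- as.integer(factor(loc))

id_first_seen  # 1 1 2 3 4 1
id_sorted      # 1 1 2 3 4 1

# The two approaches differ when values do not first appear in sorted order
loc2 <- c("Nepal", "Bristol", "Nepal")
match(loc2, unique(loc2))  # 1 2 1
as.integer(factor(loc2))   # 2 1 2
```

To attach the result to the data frame, something like `df1$ID <- match(df1$LocationNormalized, unique(df1$LocationNormalized))` should work under the same assumptions.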
