Assign Unique number to a variable in R
I have a CSV file that looks like this:
Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol
I want to get a unique ID for each distinct value of the LocationNormalized variable, so that my output looks like this:
Id,Title,FullDescription,LocationRaw,LocationNormalized,ID
1,hi,abc,def,Bristol,1
1,yo,abc,def,Bristol,1
1,was,abc,def,England,2
1,up,abc,def,India,3
1,yoh,abc,def,Nepal,4
1,home,abc,def,Bristol,1
I am new to R. I tried as.factor and a few scripts, but they failed.
df <- data.table::fread("Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol")
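library(dplyr)
# Add one group index per distinct LocationNormalized value.
# Note: passing grouping columns to group_indices() is deprecated as of dplyr 1.0.0.
df %>%
  mutate(new_ID = group_indices(., LocationNormalized))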
#   Id Title FullDescription LocationRaw LocationNormalized new_ID
# 1  1    hi             abc         def            Bristol      1
# 2  1    yo             abc         def            Bristol      1
# 3  1   was             abc         def            England      2
# 4  1    up             abc         def              India      3
# 5  1   yoh             abc         def              Nepal      4
# 6  1  home             abc         def            Bristol      1
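Because the `group_indices(., LocationNormalized)` form above is deprecated in newer dplyr releases, here is a minimal sketch of the same idea using `group_by()` and `cur_group_id()`. It assumes dplyr 1.0.0 or later; the data frame and the name `result` are made up for illustration and only carry two of the original columns.

```r
library(dplyr)

# Illustrative subset of the question's data (assumed, not the full CSV)
df <- data.frame(
  Title = c("hi", "yo", "was", "up", "yoh", "home"),
  LocationNormalized = c("Bristol", "Bristol", "England",
                         "India", "Nepal", "Bristol"),
  stringsAsFactors = FALSE
)

# Group by location, then label every row with its group's index
result <- df %>%
  group_by(LocationNormalized) %>%
  mutate(new_ID = cur_group_id()) %>%
  ungroup()

print(result)
```

Group indices here follow the sorted order of the group keys, not the order of first appearance. For this data the two orders happen to coincide.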
Using data.table:
library(data.table)
setDT(df1)[, ID := .GRP, by = LocationNormalized]
df1
#   Id Title FullDescription LocationRaw LocationNormalized ID
#1:  1    hi             abc         def            Bristol  1
#2:  1    yo             abc         def            Bristol  1
#3:  1   was             abc         def            England  2
#4:  1    up             abc         def              India  3
#5:  1   yoh             abc         def              Nepal  4
#6:  1  home             abc         def            Bristol  1
# Data used in the data.table example above
df1 <- structure(list(Id = c(1L, 1L, 1L, 1L, 1L, 1L), Title = c("hi",
"yo", "was", "up", "yoh", "home"), FullDescription = c("abc",
"abc", "abc", "abc", "abc", "abc"), LocationRaw = c("def", "def",
"def", "def", "def", "def"), LocationNormalized = c("Bristol",
"Bristol", "England", "India", "Nepal", "Bristol")), .Names = c("Id",
"Title", "FullDescription", "LocationRaw", "LocationNormalized"
), class = "data.frame", row.names = c(NA, -6L))
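Since the question mentions trying `as.factor`, a base R sketch of the same task may be useful. It needs no packages and shows two options: `match()` against `unique()` numbers the values in order of first appearance, while converting a factor to integer numbers them by sorted level. The vector `x` and the names `id_first_seen` and `id_sorted` are made up for illustration.

```r
# Location values from the question, as a plain character vector
x <- c("Bristol", "Bristol", "England", "India", "Nepal", "Bristol")

# Option 1: IDs in order of first appearance
id_first_seen <- match(x, unique(x))

# Option 2: IDs by sorted factor level (the as.factor route)
id_sorted <- as.integer(factor(x))

print(id_first_seen)
print(id_sorted)
```

With a data frame, either result can be assigned to a new column, for example `df1$ID <- match(df1$LocationNormalized, unique(df1$LocationNormalized))`. The two options give different numbering whenever the values do not first appear in sorted order.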