Selecting unique workplace-id combinations in R

Question

Consider the following subset of my dataset, which contains about 22,000 individuals.

df<-data.frame( c("Den Haag", "Den Haag", "Den Haag", "Rotterdam", "Den Haag",
                  "Den Haag", "Amsterdam"),
                c("R007", "R007", "R008", "R008", "R008", "R009", "R009"), 
                c(20130101, 20140101 ,20130101, 20130101, 20140101, 20130101, 20140101), 
                c(40000,42000,22000,20000,38000,10000, 15000))

colnames(df)<-c("Gemeente", "id", "Date", "income")

df$Date<-as.character(df$Date)
df$Date<-as.Date(df$Date, "%Y%m%d")

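In the dataset above, "Gemeente" is the municipality where a person works, and the id variable identifies the person. My goal is to remove from my sample all observations of people who worked at more than one workplace. It does not matter whether they worked at different workplaces in successive years (R009) or in the same year (R008). To be more precise, I also want to drop R008 in both 2013 and 2014, because this person worked in two cities in 2013. So in this case I would drop the observations for R008 and R009 and keep only R007.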
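The rule in the previous paragraph (count the distinct `Gemeente` values per `id` across all years, then keep only the ids with exactly one) can be written directly in base R. This is a minimal sketch, assuming the example data is built with character columns (`stringsAsFactors = FALSE`, the default since R 4.0) and dates parsed from ISO strings; `n_places`, `single_ids` and `df_single` are illustrative names, not from the question:

```r
# Rebuild the example data from the question (same values)
df <- data.frame(
  Gemeente = c("Den Haag", "Den Haag", "Den Haag", "Rotterdam", "Den Haag",
               "Den Haag", "Amsterdam"),
  id       = c("R007", "R007", "R008", "R008", "R008", "R009", "R009"),
  Date     = as.Date(c("2013-01-01", "2014-01-01", "2013-01-01", "2013-01-01",
                       "2014-01-01", "2013-01-01", "2014-01-01")),
  income   = c(40000, 42000, 22000, 20000, 38000, 10000, 15000),
  stringsAsFactors = FALSE
)

# Number of distinct workplaces per id, over all years together
# (here: R007 = 1, R008 = 2, R009 = 2)
n_places <- tapply(df$Gemeente, df$id, function(g) length(unique(g)))

# Ids that only ever appear with a single workplace
single_ids <- names(n_places)[n_places == 1]

# Keep every row (all years) of those ids
df_single <- df[df$id %in% single_ids, ]
df_single
```

Because the count is taken over the whole period, R008 is dropped for both 2013 and 2014, as required.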

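I thought I could do this as shown below, but I am doing something wrong with the unique command: it selects every unique id in the sample, whereas I only want to select R007. Does anyone know which command I should use?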

#Select unique rows of observations based on municipality and id
library(dplyr)

#Select all unique combinations of municipality and id
test<-distinct(df, Gemeente, id)

#Select the unique ids (i.e. drop the ids that work at more than one place in our dataset)
#But here I only want to select id R007, and this command selects all three. This is where I go wrong.
test2<-as.data.frame(unique(test$id))
colnames(test2)[1]<-"id"
test2$nr<-1

#Use left_join to merge the ids back into the initial dataset.
dffinal<-left_join(df, test2, by = "id")
dffinal<-subset(dffinal, nr ==1)
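The step that goes wrong is `unique(test$id)`: every id occurs at least once in `test`, so it returns all of them. What distinguishes the ids is how often they occur in `test`, since an id with k workplaces has k distinct (Gemeente, id) rows. A minimal sketch of that fix, staying with dplyr and assuming the same example data with character columns; `semi_join()` is swapped in for the `left_join()` + `nr` + `subset()` steps:

```r
library(dplyr)

# Example data from the question (same values, character columns)
df <- data.frame(
  Gemeente = c("Den Haag", "Den Haag", "Den Haag", "Rotterdam", "Den Haag",
               "Den Haag", "Amsterdam"),
  id       = c("R007", "R007", "R008", "R008", "R008", "R009", "R009"),
  Date     = as.Date(c("2013-01-01", "2014-01-01", "2013-01-01", "2013-01-01",
                       "2014-01-01", "2013-01-01", "2014-01-01")),
  income   = c(40000, 42000, 22000, 20000, 38000, 10000, 15000),
  stringsAsFactors = FALSE
)

# One row per distinct (Gemeente, id) pair, as in the question
test <- distinct(df, Gemeente, id)

# Count rows per id in `test`; an id with a single workplace has n == 1
test2 <- test %>%
  count(id) %>%      # adds a column `n` with the number of rows per id
  filter(n == 1)     # keep ids linked to exactly one workplace

# semi_join keeps the rows of df whose id appears in test2 and adds no columns
dffinal <- semi_join(df, test2, by = "id")
dffinal
```

`semi_join()` acts as a filter rather than a merge, so `dffinal` keeps exactly the columns of `df`.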

Any help is appreciated.

Answer 1

Does this work?

library(dplyr)
df %>% group_by(id) %>% filter(length(unique(Gemeente)) == 1)
# A tibble: 2 x 4
# Groups:   id [1]
  Gemeente id    Date       income
  <fct>    <fct> <date>      <dbl>
1 Den Haag R007  2013-01-01  40000
2 Den Haag R007  2014-01-01  42000
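A possible variant of the answer's pipeline, shown as a sketch rather than as the answerer's code: `n_distinct()` from dplyr performs the same test as `length(unique(Gemeente))`, and an `ungroup()` at the end drops the grouping so that later operations on `result` are not silently done per id. The name `result` is illustrative, and the data is assumed to use character columns:

```r
library(dplyr)

# Example data from the question (same values, character columns)
df <- data.frame(
  Gemeente = c("Den Haag", "Den Haag", "Den Haag", "Rotterdam", "Den Haag",
               "Den Haag", "Amsterdam"),
  id       = c("R007", "R007", "R008", "R008", "R008", "R009", "R009"),
  Date     = as.Date(c("2013-01-01", "2014-01-01", "2013-01-01", "2013-01-01",
                       "2014-01-01", "2013-01-01", "2014-01-01")),
  income   = c(40000, 42000, 22000, 20000, 38000, 10000, 15000),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id) %>%
  filter(n_distinct(Gemeente) == 1) %>%  # same test as length(unique(Gemeente)) == 1
  ungroup()                              # remove the per-id grouping afterwards
result
```

With character columns the printed tibble shows `<chr>` instead of the `<fct>` in the output above.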

Accepted 2020-11-02 10:15:42