How to remove rows with a unique ID from a panel data frame in R?

Question

I have a data table containing thousands of firms, each identified by a unique ID. The data is in long format, and each firm is supposed to appear twice, once in each of two years (cross-sectional time-series over two years).

However, not all firms appear in both years, and I am trying to create a balanced long-format panel that keeps only the firms that appear in both years. How do I accomplish this?

This is an example data table to illustrate the issue:

library(data.table)

# Column 1 holds the firm id, column 2 the year
example <- matrix(c(1,1,2,3,3,2013,2016,2013,2013,2016), ncol=2)
colnames(example) <- c('id', 'year')
example.table <- data.table(example)
example.table

   id year
1:  1 2013
2:  1 2016
3:  2 2013
4:  3 2013
5:  3 2016

In the example, I need code or a function that lets me exclude the row of the firm with id "2", because it has no match in 2016. In other words, I need code that compares each row with the previous and subsequent rows and excludes it if there is no match in the id column.

I have invested many hours, but I appear to have reached the limits of my R knowledge and would appreciate any support. Thanks!

Answer 1

Using dplyr, as below:

library(dplyr)
example.table %>%
  group_by(id) %>%
  filter(n() > 1)
# A tibble: 4 x 2
# Groups:   id [2]
     id  year
  <dbl> <dbl>
1     1  2013
2     1  2016
3     3  2013
4     3  2016
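
Note that n() counts rows, so a firm recorded twice in the same year would also pass this filter. A minimal sketch that counts distinct years instead, assuming the balanced panel should contain every year present in the data (the names panel, n_years and balanced, and firm 4, are illustrative and not from the question):

```r
library(dplyr)

# Illustrative data: firm 2 appears in one year only,
# firm 4 appears twice but in the same year.
panel <- data.frame(
  id   = c(1, 1, 2, 3, 3, 4, 4),
  year = c(2013, 2016, 2013, 2013, 2016, 2013, 2013)
)

# Number of distinct years in the whole data set (2 here)
n_years <- n_distinct(panel$year)

balanced <- panel %>%
  group_by(id) %>%
  filter(n_distinct(year) == n_years) %>%  # keep firms observed in every year
  ungroup()

balanced$id
```

With this data, only firms 1 and 3 remain; filter(n() > 1) would also have kept firm 4.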

Answer 2

We create a vector of the unique 'year' values from the whole dataset, then, for each 'id', check whether all of the values in 'un1' are %in% that group's 'year', and subset the data.table accordingly:

un1 <- unique(example.table$year)
example.table[, .SD[all(un1 %in% year)], id]
#   id year
#1:  1 2013
#2:  1 2016
#3:  3 2013
#4:  3 2016

NOTE: The OP's dataset is a data.table, so the method used here is data.table. Initially, I thought about using .SD[uniqueN(year) > 1], but that is wrong and may not work in all cases: for example, if the data spanned more than two years, it would keep a firm that appears in only some of them.
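
To illustrate the difference between the two conditions, here is a small sketch on a hypothetical three-year panel (the table dt and the names all_years, loose and strict are made up for this example and are not part of the OP's data):

```r
library(data.table)

# Illustrative three-year panel: firm 2 is missing 2019
dt <- data.table(
  id   = c(1, 1, 1, 2, 2),
  year = c(2013, 2016, 2019, 2013, 2016)
)

all_years <- unique(dt$year)

# Keeps firm 2 as well, although it is not observed in every year
loose <- dt[, .SD[uniqueN(year) > 1], by = id]

# Keeps only firm 1, which is observed in all three years
strict <- dt[, .SD[all(all_years %in% year)], by = id]

unique(loose$id)
unique(strict$id)
```

In the two-year case from the question, both conditions select the same firms; they only diverge once more years are involved.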

Answer 3

A data.table equivalent of @Sonny's dplyr solution:

example.table[, if(.N > 1) .SD, id]

   id year
1:  1 2013
2:  1 2016
3:  3 2013
4:  3 2016
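
For completeness, the same row-count filter can probably also be written in base R without extra packages. This is only a sketch, assuming the data is held in a plain data frame (example.df, rows_per_id and balanced.df are illustrative names):

```r
# Same toy data as in the question, as a plain data frame
example.df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  year = c(2013, 2016, 2013, 2013, 2016)
)

# ave() returns, for every row, the number of rows that share its id
rows_per_id <- ave(example.df$year, example.df$id, FUN = length)

# Keep only the rows whose id occurs more than once
balanced.df <- example.df[rows_per_id > 1, ]
balanced.df
```

Like the .N > 1 and n() > 1 versions, this counts rows per id rather than distinct years, so it relies on each firm appearing at most once per year.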

3 answers

Answer 1: 2, accepted, 2019-04-11 17:02:39
Answer 2: 1, 2019-04-11 17:01:40
Answer 3: 0, 2019-04-11 17:24:19
