R: How to intersect a list of data frames on a specific column

Question

I am trying to find all matching values in a specific column across a list of data.frames. However, I keep getting a returned value of character(0).

I have tried the following: a simple subset (very time consuming), e.g. dat[[i]][[i]], and lapply with Reduce and intersect (as seen here).

LocA <- data.frame(obs.date = c("2018-01-10", "2018-01-14", "2018-01-20"),
                   obs.count = c(2, 0, 1))
LocB <- data.frame(obs.date = c("2018-01-09", "2018-01-14", "2018-01-20"),
                   obs.count = c(0, 3, 5))
LocC <- data.frame(obs.date = c("2018-01-12", "2018-01-14", "2018-01-19"),
                   obs.count = c(2, 0, 1))
LocD <- data.frame(obs.date = c("2018-01-11", "2018-01-16", "2018-01-21"),
                   obs.count = c(2, 0, 1))

dfList<-list(LocA,LocB,LocC,LocD)

## List of all dates

lapply(dfList, '[[', 1)
[1] "2018-01-10" "2018-01-14" "2018-01-20" "2018-01-09"...

Attempts (failure)

> Reduce(intersect, lapply(dfList, '[[', 1))
character(0)

I expect the output to identify the data.frames that share a common date.

*Extra smiles if someone knows how to identify the shared dates and mutate them into a single data frame where Col1 = data frame name, Col2 = obs.date, Col3 = obs.count.

Answer 1

You can first merge all the data frames so you only have one:

a <- Reduce(function(x, y) merge(x, y, all = TRUE), dfList)

Or you can stack them with rbind, which keeps every row:

a <- rbind(LocA, LocB, LocC, LocD)

Afterwards, you can extract all the duplicates:

b <- a[duplicated(a$obs.date), ]

Or, if you want to keep one row per date and drop the repeated occurrences:

c <- a[!duplicated(a$obs.date), ]
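Note that duplicated() flags only the second and later occurrences of a date, so the first data frame holding a shared date is not in b. A minimal base R sketch of one way to keep every row with a shared date and record which data frame it came from follows. The list names, the source column, and the helper objects stacked and shared_dates are illustrative assumptions, and the sketch assumes each date occurs at most once within a data frame.

```r
# Rebuild the example data, keeping dates as character strings.
LocA <- data.frame(obs.date = c("2018-01-10", "2018-01-14", "2018-01-20"),
                   obs.count = c(2, 0, 1), stringsAsFactors = FALSE)
LocB <- data.frame(obs.date = c("2018-01-09", "2018-01-14", "2018-01-20"),
                   obs.count = c(0, 3, 5), stringsAsFactors = FALSE)
LocC <- data.frame(obs.date = c("2018-01-12", "2018-01-14", "2018-01-19"),
                   obs.count = c(2, 0, 1), stringsAsFactors = FALSE)
LocD <- data.frame(obs.date = c("2018-01-11", "2018-01-16", "2018-01-21"),
                   obs.count = c(2, 0, 1), stringsAsFactors = FALSE)

# Assumption: a named list, so each row can be traced back to its source.
dfList <- list(LocA = LocA, LocB = LocB, LocC = LocC, LocD = LocD)

# Stack all frames, prepending a column with the originating frame's name.
stacked <- do.call(rbind, Map(function(df, nm) {
  data.frame(source = nm, df, stringsAsFactors = FALSE)
}, dfList, names(dfList)))
rownames(stacked) <- NULL

# Dates that occur more than once in the stacked data.
shared_dates <- unique(stacked$obs.date[duplicated(stacked$obs.date)])

# Keep every row whose date is shared, including its first occurrence.
shared <- stacked[stacked$obs.date %in% shared_dates, ]
print(shared)
```

The result has the three columns asked for in the question: source, obs.date and obs.count.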

Answer 2

If by "intersect" you mean doing an "inner join" or "merging" with a specific column as key, then you want to use dplyr::inner_join or merge.

First, between two data.frames:

library(dplyr)
inner_join(LocA, LocB, by='obs.date')
# 2 rows
inner_join(LocC, LocD, by='obs.date')
# zero rows

So, not infinite merging.
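The character(0) in the question is consistent with these join results: Reduce(intersect, ...) keeps only values present in every element, and LocD shares no date with the other data frames. The sketch below illustrates this with plain character vectors copied from the example data. The names dates, all_four, first_three, n_frames and shared are illustrative, not part of the original post.

```r
# Date columns from the four example data frames.
dates <- list(
  LocA = c("2018-01-10", "2018-01-14", "2018-01-20"),
  LocB = c("2018-01-09", "2018-01-14", "2018-01-20"),
  LocC = c("2018-01-12", "2018-01-14", "2018-01-19"),
  LocD = c("2018-01-11", "2018-01-16", "2018-01-21")
)

# Intersection across all four: empty, because LocD shares no date.
all_four <- Reduce(intersect, dates)

# Intersection across the first three only.
first_three <- Reduce(intersect, dates[1:3])

# Count in how many vectors each date occurs;
# unique() guards against repeats within a single vector.
n_frames <- table(unlist(lapply(dates, unique)))

# Dates present in at least two of the vectors.
shared <- names(n_frames)[n_frames > 1]

print(all_four)
print(first_three)
print(shared)
```

Counting occurrences instead of intersecting everything finds dates shared by any subset of the data frames, which is what the next approach does with dplyr.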

Stack, then count

We'll combine the data first, then count the occurrences. Notice the use of the .id argument to track where each row originated.

library(dplyr)
bind_rows(dfList, .id = 'id') %>%
  add_count(obs.date) %>% 
  filter(n > 1)
# A tibble: 5 x 4
  id    obs.date   obs.count     n
  <chr> <chr>          <dbl> <int>
1 1     2018-01-14         0     3
2 1     2018-01-20         1     2
3 2     2018-01-14         3     3
4 2     2018-01-20         5     2
5 3     2018-01-14         0     3
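Because dfList is unnamed, the id column above holds list positions (1, 2, 3). If the list is named, bind_rows() uses those names for the .id column instead. The following sketch assumes the names LocA to LocD and an illustrative column name location; it is one possible way to get the data frame name, obs.date, obs.count layout from the question, not the answerer's own code.

```r
library(dplyr)

# Assumption: a named list, so .id carries the data frame names.
dfList <- list(
  LocA = data.frame(obs.date = c("2018-01-10", "2018-01-14", "2018-01-20"),
                    obs.count = c(2, 0, 1), stringsAsFactors = FALSE),
  LocB = data.frame(obs.date = c("2018-01-09", "2018-01-14", "2018-01-20"),
                    obs.count = c(0, 3, 5), stringsAsFactors = FALSE),
  LocC = data.frame(obs.date = c("2018-01-12", "2018-01-14", "2018-01-19"),
                    obs.count = c(2, 0, 1), stringsAsFactors = FALSE),
  LocD = data.frame(obs.date = c("2018-01-11", "2018-01-16", "2018-01-21"),
                    obs.count = c(2, 0, 1), stringsAsFactors = FALSE)
)

result <- bind_rows(dfList, .id = "location") %>%  # stack, tagging each row with its source
  add_count(obs.date) %>%                          # n = number of rows with that date
  filter(n > 1) %>%                                # keep dates that occur more than once
  select(location, obs.date, obs.count)            # drop the helper count column

print(result)
```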

2 answers

Answer 1
1 Accepted 2019-07-31 10:55:04

Answer 2
0 2019-07-31 10:44:24
