R: Cross-referencing columns/reverse lookup

I've found a solution for this but suspect there must be a more natural or idiomatic way. Given a dataset of many observations over several years at a lot of stations, get a listing by station of the years in which each was active -- should be trivial. The data looks roughly like so:

set.seed(668)
yrNames  <- seq(1995, 2015)   # survey years
staNames <- LETTERS[1:12]     # station IDs A..L
trpNames <- seq(1, 6)         # trap numbers
# each year appears 1 to 4 times, i.e. 1 to 4 observations per year
years    <- rep(yrNames, times = sample(1:4, length(yrNames), replace = TRUE))
stations <- sample(staNames, length(years), replace = TRUE)
traps    <- sample(trpNames, length(years), replace = TRUE)
data     <- data.frame(YEAR = years, STATION = stations, TRAP = traps)

After WAY too many hours (working hard to think vectorwise and avoid loops) I finally worked my way to:

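library("reshape2")
# one row per year, one column per station, cells = number of observations
# (length and TRAP are what dcast would otherwise guess, with a message)
bySta <- dcast(data, YEAR ~ STATION, fun.aggregate = length, value.var = "TRAP")
# for each column, keep the years with at least one observation
sapply(bySta, function(x) bySta$YEAR[x > 0])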

Which gives what I wanted:

# $YEAR
#  [1] 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
# [16] 2010 2011 2012 2013 2014 2015
# $A
# [1] 2002 2009 2015
# $B
# [1] 1996 1999 2003 2007 2013
# $C
# [1] 2000 2002 2005 2006 2009 2010 2014
# [...]

But getting there was very far from intuitive, with all kinds of dead ends. Is there some way to more simply say "list me all df$x per value of df$y"?

An extra wrinkle is that I was starting from a list of per-year dfs created by

dfList <- lapply(fileList, readDelimFunc)

which I was happier with for other purposes, but for this task the extra organizational layer baffled me right away, so I mashed them together into one. Could the desired listing also be (sanely) generated from that list of dfs, or is that ridiculous?

dplyr solution:

library(dplyr)
# one row per station, with its distinct years stored in a list column
data %>% group_by(STATION) %>% summarize(years = list(unique(YEAR))) %>% as.data.frame

Results:

   STATION                                    years
1        A                         2002, 2009, 2015
2        B             1996, 1999, 2003, 2007, 2013
3        C 2000, 2002, 2005, 2006, 2009, 2010, 2014
4        D                   2003, 2005, 2010, 2014
5        E                               1997, 2005
6        F       1996, 1997, 1998, 2001, 2014, 2015
7        G                               1996, 2001
8        H                         1995, 1997, 2003
9        I                         1996, 1997, 2008
10       J                         1999, 2001, 2009
11       K             2003, 2004, 2010, 2011, 2012
12       L                   2002, 2004, 2011, 2015
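For comparison, the same "all x per value of y" listing can be sketched in base R with split(). This is only an illustration, not part of the original answer; the small data frame below is a hand-made stand-in for the question's data, and the name yearsBySta is made up for the example.

```r
# Hand-made stand-in for the question's data frame (values are illustrative only).
data <- data.frame(
  YEAR    = c(1995, 1996, 1996, 1997, 1997, 1997, 1999),
  STATION = c("H",  "B",  "F",  "E",  "H",  "H",  "B"),
  TRAP    = c(1,    3,    2,    6,    4,    5,    1)
)

# split() groups the first vector by the values of the second;
# unique() then drops years that repeat within a station.
yearsBySta <- lapply(split(data$YEAR, data$STATION), unique)

yearsBySta$B  # 1996 1999
yearsBySta$H  # 1995 1997
```

The result is a named list with one element per station, much like the sapply() output in the question but without the extra YEAR entry.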

Note that the *apply functions (sapply, lapply and friends) are not actually "vectorized"; they are just wrappers that call a normal R function once per element. (Neither is this dplyr solution "vectorized".)
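A small sketch of what that means in practice. The counter and the function name double_it are made up for this illustration; the point is only that sapply() invokes the R function once per element, while an arithmetic operator handles the whole vector in one call.

```r
# Count how many times R calls the function handed to sapply().
calls <- 0
double_it <- function(x) {
  calls <<- calls + 1   # side effect used only to make the iteration visible
  x * 2
}

x <- 1:5
viaSapply <- sapply(x, double_it)  # double_it runs once per element
calls                              # 5

viaVector <- x * 2                 # a single call; the looping happens in compiled code
identical(viaSapply, viaVector)    # TRUE
```

For data of the size in the question the difference is negligible, which is why readability is usually the better guide.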

It's best not to get hung up on finding the optimal solution; look for the most sensible one instead.
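On the second part of the question (starting from a list of per-year data frames), one sane route is probably the one already taken: stack the list into one data frame first, then group. A minimal sketch, assuming every element of dfList has the same YEAR, STATION and TRAP columns; the dfList contents below are invented for the example.

```r
# Hypothetical stand-in for dfList: one data frame per year.
dfList <- list(
  data.frame(YEAR = 1995, STATION = c("H", "H"), TRAP = c(2, 5)),
  data.frame(YEAR = 1996, STATION = c("B", "F"), TRAP = c(1, 3)),
  data.frame(YEAR = 1997, STATION = c("H", "E"), TRAP = c(4, 6))
)

# Stack the per-year data frames, then split the years by station.
combined   <- do.call(rbind, dfList)
yearsBySta <- lapply(split(combined$YEAR, combined$STATION), unique)

yearsBySta$H  # 1995 1997
```

The per-year list can stay around for other purposes; the combined data frame is just a temporary view for this one lookup.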
