How do I select the first row of an R data frame that meets certain criteria?
Here is the context:
I have a data frame with five columns: "pixel", "year", "propvar", "component", and "cumsum".
There are 1,225 combinations of pixel and year, because the data were computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform of that pixel-year's time series. I then computed cumsum, which is the cumulative sum of propvar over the frequency components within a pixel-year. The component column just gives an index for the Fourier series component (plus 1) from which propvar was calculated.
I want to determine the number of components required to explain more than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain more than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?
Sure. Something like this should do the trick:
# CREATE A REPRODUCIBLE EXAMPLE!
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5
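# EXTRACT THE SUBSET YOU'D LIKE.
# First keep only the rows that reach the threshold ...
res <- subset(df, cumsum >= 99)
# ... then keep the first such row for each year-pixel combination.
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5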
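For the data as described in the question (cumsum stored as a proportion, with a component column), the same idea might look roughly like the sketch below. The toy values are made up for illustration, and I'm assuming that the value of component in the first qualifying row can be used directly as numbercomps; adjust if your indexing differs.

```r
# Toy data with the question's column names (values invented for illustration).
df <- data.frame(
  pixel     = rep(c(1, 2), each = 4),
  year      = rep(2001, 8),
  component = rep(1:4, times = 2),
  propvar   = c(0.90, 0.095, 0.004, 0.001,
                0.60, 0.30,  0.095, 0.005)
)

# Cumulative proportion of variance within each pixel-year.
df$cumsum <- ave(df$propvar, df$pixel, df$year, FUN = cumsum)

# Keep rows above the threshold, ordered so the lowest component comes first
# within each pixel-year.
hits <- df[df$cumsum > 0.99, ]
hits <- hits[order(hits$pixel, hits$year, hits$component), ]

# The first remaining row per pixel-year is the one we want.
first <- hits[!duplicated(hits[c("pixel", "year")]), ]

# Assumption: component in that row equals the number of components needed.
res <- data.frame(pixel       = first$pixel,
                  year        = first$year,
                  numbercomps = first$component)
res
```

With these toy values, pixel 1 needs 2 components and pixel 2 needs 3. The explicit order() call is only there so the result does not depend on how the rows happen to be sorted in the input.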
EDIT: Also, for those interested in data.table, there is this:
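library(data.table)
# Key the table by pixel and year (column names given as a character vector).
dt <- data.table(df, key = c("pixel", "year"))
# Within each pixel-year, take the first row that reaches the threshold.
dt[cumsum >= 99, .SD[1], by = key(dt)]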