Average number of observations per group, conditional on a second identifier, in data.table
I have a data frame with the following structure:
V1 V2
X1 Y1
X1 Y2
X2 Y2
X2 Y2
X3 Y4
X3 Y5
X3 Y5
X4 Y6
X4 Y6
X4 Y6
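I would like to compute, for each V1, the average number of observations per distinct V1-V2 pair, using data.table. The final result should look as follows and contain the average number of pairs observed for each V1 (for example, X3 has 3 rows spread over 2 distinct V2 values, which gives 1.5).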
V1 avg.num.pairs
X1 1.0
X2 2.0
X3 1.5
X4 3.0
Is there an elegant solution to this problem?
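To make the target quantity concrete, here is a small base R sketch that works through a single group (X3) by hand. The names `x3`, `pair_counts`, and `avg_x3` are illustrative only and are not part of the original question.

```r
# Toy data matching the table in the question
df <- data.frame(
  V1 = c("X1", "X1", "X2", "X2", "X3", "X3", "X3", "X4", "X4", "X4"),
  V2 = c("Y1", "Y2", "Y2", "Y2", "Y4", "Y5", "Y5", "Y6", "Y6", "Y6")
)

# Rows belonging to group X3
x3 <- df[df$V1 == "X3", ]

# How often each distinct V2 value occurs within X3 (Y4 once, Y5 twice)
pair_counts <- as.vector(table(x3$V2))

# Average count per distinct pair: (1 + 2) / 2 = 1.5
avg_x3 <- mean(pair_counts)

# Same value written as rows divided by distinct V2 values: 3 / 2
avg_x3_alt <- nrow(x3) / length(unique(x3$V2))
```

Both expressions give 1.5 for X3, which matches the expected output above.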
One dplyr option could be:
library(dplyr)

df %>%
  group_by(V1) %>%
  summarise(V2 = n() / n_distinct(V2))
V1 V2
<chr> <dbl>
1 X1 1
2 X2 2
3 X3 1.5
4 X4 3
The same with data.table:
library(data.table)
setDT(df)[, .(V2 = .N / uniqueN(V2)), by = V1]
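The one-liner divides the group size (`.N`) by the number of distinct V2 values (`uniqueN(V2)`). An equivalent two-step formulation, sketched below, may make the "average number of pairs" reading more explicit: first count rows per (V1, V2) pair, then average those counts within each V1. The names `dt`, `pair_counts`, and `res` are illustrative, and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Toy data matching the table in the question
dt <- data.table(
  V1 = c("X1", "X1", "X2", "X2", "X3", "X3", "X3", "X4", "X4", "X4"),
  V2 = c("Y1", "Y2", "Y2", "Y2", "Y4", "Y5", "Y5", "Y6", "Y6", "Y6")
)

# Step 1: number of rows for each distinct (V1, V2) pair
pair_counts <- dt[, .N, by = .(V1, V2)]

# Step 2: average of those counts within each V1
res <- pair_counts[, .(avg.num.pairs = mean(N)), by = V1]
```

Since the mean of the per-pair counts equals the total rows divided by the number of distinct pairs, `res` should agree with the one-liner above; the one-liner avoids the intermediate table.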
A simple base R solution is to use aggregate, i.e.,
dfout <- aggregate(V2 ~ V1, df, function(x) length(x) / length(unique(x)))
such that
> dfout
V1 V2
1 X1 1.0
2 X2 2.0
3 X3 1.5
4 X4 3.0
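As a possible variation on the same idea, `tapply` applies the same per-group function and returns a named vector instead of a data frame. This is only a sketch; the name `res` is illustrative.

```r
# Toy data matching the table in the question
df <- data.frame(
  V1 = c("X1", "X1", "X2", "X2", "X3", "X3", "X3", "X4", "X4", "X4"),
  V2 = c("Y1", "Y2", "Y2", "Y2", "Y4", "Y5", "Y5", "Y6", "Y6", "Y6")
)

# For each V1 group: number of rows divided by number of distinct V2 values
res <- tapply(df$V2, df$V1, function(x) length(x) / length(unique(x)))
```

The result is indexed by the V1 values, so `res["X3"]` can be used to look up a single group.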
Data
df <- structure(list(V1 = c("X1", "X1", "X2", "X2", "X3", "X3", "X3",
"X4", "X4", "X4"), V2 = c("Y1", "Y2", "Y2", "Y2", "Y4", "Y5",
"Y5", "Y6", "Y6", "Y6")), class = "data.frame", row.names = c(NA,
-10L))