How to properly set up arrange and group_by in dplyr?
My data is structured as follows:
Athletes = c("Gus", "Hudson", "Bobby", "Tom")
set.seed(400)
RawData <- data.frame(Name = rep((Athletes), each = 400),
                      Quarter = as.numeric(rep(1:4, each = 100)),
                      Sample = as.numeric(rep(1:100, each = 1)),
                      X = runif(400, 26, 30),
                      Y = runif(400, 12, 16))
I want to calculate the displacement for each X and Y pair, for each Athlete, for each Sample within each Quarter. To do so, I set up the following code:
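library(dplyr)

DistanceOutput <- RawData %>%
  arrange(Name, Sample, Quarter) %>%
  group_by(Name, Quarter) %>%
  # previous position within each athlete/quarter, ordered by Sample
  mutate(lagX = lag(X, order_by = Sample),
         lagY = lag(Y, order_by = Sample)) %>%
  rowwise() %>%
  # Euclidean distance between the current and previous (X, Y) points
  mutate(Distance = dist(matrix(c(X, Y, lagX, lagY), nrow = 2, byrow = TRUE))) %>%
  select(-lagX, -lagY)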
However, this returns a data.frame structured as follows:
> head(DistanceOutput, n=10)
Source: local data frame [10 x 6]
Name Quarter Sample X Y Distance
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bobby 1 1 27.82656 13.85830 NA
2 Bobby 2 1 27.37298 15.67940 NA
3 Bobby 3 1 28.74274 12.25703 NA
4 Bobby 4 1 26.63564 13.07924 NA
5 Bobby 1 2 26.32446 12.64722 1.929508
6 Bobby 2 2 26.88957 14.52096 NA
7 Bobby 3 2 27.53932 15.57959 3.533781
8 Bobby 4 2 28.03031 12.70763 1.443328
9 Bobby 1 3 29.68239 13.82739 3.559287
10 Bobby 2 3 29.43869 12.60890 3.186531
Instead, I would like the data to be laid out as follows:
> head(DistanceOutput, n=3)
Source: local data frame [10 x 6]
Name Quarter Sample X Y Distance
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bobby 1 1 27.82656 13.85830 NA
2 Bobby 1 2 26.32446 12.64722 1.929508
3 Bobby 1 3 29.68239 13.82739 3.559287
How do I correctly set up the group_by and arrange statements in dplyr to produce the desired output?
Thank you.
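I think this is just an ordering issue. The rows were sorted by Sample before Quarter, so re-sort by Name, Quarter, Sample: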
DistanceOutput %>%
arrange(Name, Quarter, Sample) %>%
head(3)
# Name Quarter Sample X Y Distance
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Bobby 1 1 28.40293 15.40195 NA
#2 Bobby 1 2 26.33676 14.32382 2.330544
#3 Bobby 1 3 28.60779 14.67457 2.297951
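For completeness, here is a minimal sketch of the whole pipeline with the sort order fixed up front, so no second arrange() is needed. It rebuilds the data from the question and assumes dplyr is installed. Note one swap that is not in the original code: instead of rowwise() plus dist(), it computes the Euclidean distance directly with sqrt(), which should give the same value for a pair of points and stays vectorized. Treat it as an illustration rather than the only way to do this.

```r
library(dplyr)

# Rebuild the example data from the question. The shorter columns are
# recycled by data.frame() to match the 1600 rows of Name.
Athletes <- c("Gus", "Hudson", "Bobby", "Tom")
set.seed(400)
RawData <- data.frame(
  Name    = rep(Athletes, each = 400),
  Quarter = as.numeric(rep(1:4, each = 100)),
  Sample  = as.numeric(rep(1:100, each = 1)),
  X       = runif(400, 26, 30),
  Y       = runif(400, 12, 16)
)

DistanceOutput <- RawData %>%
  # Sort so that samples run in order within each athlete and quarter
  arrange(Name, Quarter, Sample) %>%
  # Group so that lag() never reaches across an athlete/quarter boundary
  group_by(Name, Quarter) %>%
  # Euclidean distance to the previous sample; NA for the first sample
  # of each group (assumption: replaces the rowwise() + dist() step)
  mutate(Distance = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>%
  ungroup()

head(DistanceOutput, n = 3)
```

Because the rows are already sorted by Sample inside each group, lag() does not need order_by here. The first row of every Name/Quarter group has no previous point, so its Distance is NA.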