R: Compute the distance between the first row and the current row of a grouped data frame

Question

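I need to compute the Euclidean distance between the first row and the current row of a data frame. Each row is keyed by (group, month) and has a list of values. In the toy example below, the key is c(month, student) and the values are in c(A, B). I want to create a distance column C equal to sqrt((A_i - A_1)^2 + (B_i - B_1)^2).

So far I have managed to spread my data and pull each group's first values into new columns. I could write the formula by hand for the toy example, but my real data has many columns rather than 2. I believe I could create the squared differences inside mutate_all, then take a row sum and the square root of that, but no luck so far.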

library(dplyr)

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 6, 6, 8, 6, 9),
                 B=c(6, 2, 8, 5, 6, 7))

# Pull in each column's first values for each group
df %>% 
  group_by(student) %>% 
  mutate_all(list(first = first))
# TODO: Calculate the distance, i.e. SQRT(sum_i[(x_i - x_1)^2]).

#Output:
  month student     A     B month_first A_first B_first
1     1 Amy         9     6           1       9       6
2     2 Amy         6     2           1       9       6
...

Desired output:

#Output:
  month student     A     B month_first A_first B_first dist_from_first
1     1 Amy         9     6           1       9       6    0
2     2 Amy         6     2           1       9       6    5
...
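
As a quick sanity check on the expected value of 5 for Amy in month 2, the distance can be worked out by hand in base R. This is only an illustrative sketch using the toy numbers from the question; the variable names `first_row` and `current_row` are made up for the example.

```r
# Amy's first row (month 1) and her month-2 row from the toy data
first_row   <- c(A = 9, B = 6)
current_row <- c(A = 6, B = 2)

# Euclidean distance: square root of the summed squared differences
dist_from_first <- sqrt(sum((current_row - first_row)^2))
dist_from_first
```

The squared differences are 9 and 16, so the sum is 25 and the distance is 5, matching the desired output.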

Answer 1

Here is another approach using compact dplyr code. It works for any number of columns.

library(dplyr)

df %>% 
  select(-month) %>%
  group_by(student) %>% 
  # mutate_each() is deprecated; mutate_all() skips the grouping column
  mutate_all(~ (first(.) - .)^2) %>%
  ungroup() %>%
  mutate(euc.dist = sqrt(rowSums(select(., -student))))

# A tibble: 6 x 4
  student     A     B euc.dist
  <chr>   <dbl> <dbl>    <dbl>
1 Amy         0     0     0   
2 Amy         9    16     5   
3 Amy         9     4     3.61
4 Bob         0     0     0   
5 Bob         4     1     2.24
6 Bob         1     4     2.24
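
The same idea can probably be written so that the original A and B values are kept and only the distance column is added. The sketch below assumes dplyr 1.0.0 or later (for `across()`); the column name `dist_from_first` is taken from the question, and `result` is just a name chosen for this example.

```r
library(dplyr)

df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 6, 6, 8, 6, 9),
                 B = c(6, 2, 8, 5, 6, 7))

result <- df %>%
  group_by(student) %>%
  # across(-month) covers every non-grouping column except month,
  # so this should extend to any number of value columns
  mutate(dist_from_first = sqrt(rowSums(
    across(-month, ~ (.x - first(.x))^2)
  ))) %>%
  ungroup()

result
```

Because `across()` leaves out the grouping column automatically, only `month` has to be excluded by hand.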

Answer 2

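Edit: added an alternative formulation using a join. I expect that approach to be much faster for a very wide data frame with many columns.

Approach 1: To get the Euclidean distance over a large number of columns, one way is to rearrange the data so that each row shows one month, one student, and one original column (e.g. A or B in the OP), followed by two columns holding the current month's value and the first value. We can then square the difference and group by student and month to get the Euclidean distance, i.e. the square root of the summed squared differences. (The code calls this column RMS, although no mean is taken, so it is strictly a root sum of squares.)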

library(tidyverse)
df %>% 
  group_by(student) %>% 
  mutate_all(list(first = first)) %>%
  ungroup() %>%
  # gather into long form; make col show variant, col2 show orig column
  gather(col, val, -c(student, month, month_first)) %>%
  mutate(col2 = col %>% str_remove("_first")) %>% 
  mutate(col = if_else(col %>% str_ends("_first"),
                       "first",
                       "comparison")) %>% 
  spread(col, val) %>% 
  mutate(square_dif = (comparison - first)^2) %>%
  group_by(student, month) %>%
  summarize(RMS = sqrt(sum(square_dif)))

# A tibble: 6 x 3
# Groups:   student [2]
  student month   RMS
  <fct>   <int> <dbl>
1 Amy         1  0   
2 Amy         2  5   
3 Amy         3  3.61
4 Bob         1  0   
5 Bob         2  2.24
6 Bob         3  2.24

Approach 2: Here, the long version of the data is joined to a version holding each student's earliest month.

library(tidyverse)
df_long <- gather(df, col, val, -c(month, student))
df_long %>% left_join(df_long %>% 
              group_by(student) %>%
              top_n(-1, wt = month) %>%
              rename(first_val = val) %>% 
              select(-month),
            by = c("student", "col")) %>%
  mutate(square_dif = (val - first_val)^2) %>%
  group_by( student, month) %>%
  summarize(RMS = sqrt(sum(square_dif)))

# A tibble: 6 x 3
# Groups:   student [2]
  student month   RMS
  <fct>   <int> <dbl>
1 Amy         1  0   
2 Amy         2  5   
3 Amy         3  3.61
4 Bob         1  0   
5 Bob         2  2.24
6 Bob         3  2.24
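
For readers on newer tidyr, where `gather()` is superseded, the join approach might look roughly like the sketch below. It assumes tidyr 1.0.0 or later (`pivot_longer()`) and dplyr 1.0.0 or later (`slice_min()`, `.groups`); the names `first_vals` and `dist_long` are made up for this example and are not from the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 6, 6, 8, 6, 9),
                 B = c(6, 2, 8, 5, 6, 7))

# One row per (month, student, original column)
df_long <- pivot_longer(df, cols = -c(month, student),
                        names_to = "col", values_to = "val")

# Each student's values in their earliest month, one row per column
first_vals <- df_long %>%
  group_by(student) %>%
  slice_min(month, n = 1) %>%
  ungroup() %>%
  select(student, col, first_val = val)

# Join the first values back on, then sum squared differences per student-month
dist_long <- df_long %>%
  left_join(first_vals, by = c("student", "col")) %>%
  group_by(student, month) %>%
  summarise(dist_from_first = sqrt(sum((val - first_val)^2)),
            .groups = "drop")

dist_long
```

`slice_min()` keeps ties by default, which is what is wanted here: every column's row for the earliest month is retained.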

Answer 3

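Rather than calling mutate_all, it is easier to compute dist_from_first directly. The only thing unclear to me is whether month should be included in the group_by() statement.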

library(tidyverse)

df <- tibble(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 6, 6, 8, 6, 9),
                 B=c(6, 2, 8, 5, 6, 7))

df%>%
  group_by(student)%>%
  mutate(dist_from_first = sqrt((A - first(A))^2 + (B - first(B))^2))%>%
  ungroup()

# A tibble: 6 x 5
#  month student     A     B dist_from_first
#  <int> <chr>   <dbl> <dbl>           <dbl>
#1     1 Amy         9     6            0   
#2     2 Amy         6     2            5   
#3     3 Amy         6     8            3.61
#4     1 Bob         8     5            0   
#5     2 Bob         6     6            2.24
#6     3 Bob         9     7            2.24
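
On the open point about `group_by()`: with this toy data, adding month to the grouping would make every row its own group, so each row would be compared with itself. The sketch below illustrates that under the assumption that each (student, month) pair occurs once, as in the example; `by_student_month` is a name chosen only for this illustration.

```r
library(dplyr)

df <- tibble(month = rep(1:3, 2),
             student = rep(c("Amy", "Bob"), each = 3),
             A = c(9, 6, 6, 8, 6, 9),
             B = c(6, 2, 8, 5, 6, 7))

# Grouping by student AND month: every group has exactly one row,
# so first(A) is always the row's own A and the distance collapses to 0
by_student_month <- df %>%
  group_by(student, month) %>%
  mutate(dist_from_first = sqrt((A - first(A))^2 + (B - first(B))^2)) %>%
  ungroup()

by_student_month$dist_from_first
```

So grouping by student alone matches the desired output. Note that `first()` takes the first row in the current row order; if the data were not already sorted by month, something like `first(A, order_by = month)` would likely be needed.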

3 solutions

Solution 1
2 2019-04-28 14:48:40

Solution 2
1 accepted 2019-04-28 04:56:44

Solution 3
0 2019-04-28 02:13:43
