dplyr group_by and summarise: row counts are inconsistent

Question

I have been following the tutorial on DataCamp. Each time I run the following code, it produces different values for "drows":

hflights %>%
  group_by(UniqueCarrier, Dest) %>%
  summarise(rows = n(), drows = n_distinct(rows))

First time:

Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]

        UniqueCarrier  Dest  rows drows
                <chr> <chr> <int> <int>
1             AirTran   ATL   211    86
2             AirTran   BKG    14     6
3              Alaska   SEA    32    18
4            American   DFW   186    74
5            American   MIA   129    57
6      American_Eagle   DFW   234   101
7      American_Eagle   LAX    74    34
8      American_Eagle   ORD   133    56
9  Atlantic_Southeast   ATL    64    28
10 Atlantic_Southeast   CVG     1     1
# ... with 224 more rows

Second time:

Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]

        UniqueCarrier  Dest  rows drows
                <chr> <chr> <int> <int>
1             AirTran   ATL   211   125
2             AirTran   BKG    14    13
3              Alaska   SEA    32    29
4            American   DFW   186   118
5            American   MIA   129    76
6      American_Eagle   DFW   234   143
7      American_Eagle   LAX    74    47
8      American_Eagle   ORD   133    85
9  Atlantic_Southeast   ATL    64    44
10 Atlantic_Southeast   CVG     1     1
# ... with 224 more rows

Third time:

Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]

        UniqueCarrier  Dest  rows drows
                <chr> <chr> <int> <int>
1             AirTran   ATL   211    88
2             AirTran   BKG    14     7
3              Alaska   SEA    32    16
4            American   DFW   186    79
5            American   MIA   129    61
6      American_Eagle   DFW   234    95
7      American_Eagle   LAX    74    31
8      American_Eagle   ORD   133    67
9  Atlantic_Southeast   ATL    64    31
10 Atlantic_Southeast   CVG     1     1
# ... with 224 more rows

Why does this value change on every run? What is the code actually doing?

Answer 1

Apparently this is normal behaviour; see this issue: https://github.com/tidyverse/dplyr/issues/2222

This is because values in list columns are compared by reference, so n_distinct() treats them as different unless they really point to the same object.

So the way the data frame is stored internally changes the result. Hadley's comment in that issue seems to say it might be a bug (in the sense of unwanted behaviour), or it might be expected behaviour that needs to be documented better.
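If the aim was to count distinct values of a data column within each group, one option is to pass that column to n_distinct() instead of the newly created rows summary. The following is only a sketch under that assumption: the toy data frame, the TailNum column and the names n_rows and n_planes are hypothetical stand-ins, not taken from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for hflights (not the real dataset).
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "UA", "UA"),
  Dest          = c("DFW", "DFW", "MIA", "ORD", "ORD"),
  TailNum       = c("N1", "N1", "N2", "N3", "N4"),
  stringsAsFactors = FALSE
)

result <- flights %>%
  group_by(UniqueCarrier, Dest) %>%
  summarise(
    n_rows   = n(),                 # number of rows in each group
    n_planes = n_distinct(TailNum)  # distinct values of an ordinary column
  ) %>%
  ungroup()

print(result)
```

Counting distinct values of an ordinary character column like this gives the same result on every run.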

Answer 1: accepted (2), 2018-02-14 02:04:44
