I have been following the tutorial on DataCamp. The following line of code produces a different value for "drows" each time I run it:
hflights %>%
  group_by(UniqueCarrier, Dest) %>%
  summarise(rows = n(), drows = n_distinct(rows))
First time:
Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]
UniqueCarrier Dest rows drows
<chr> <chr> <int> <int>
1 AirTran ATL 211 86
2 AirTran BKG 14 6
3 Alaska SEA 32 18
4 American DFW 186 74
5 American MIA 129 57
6 American_Eagle DFW 234 101
7 American_Eagle LAX 74 34
8 American_Eagle ORD 133 56
9 Atlantic_Southeast ATL 64 28
10 Atlantic_Southeast CVG 1 1
# ... with 224 more rows
Second time:
Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]
UniqueCarrier Dest rows drows
<chr> <chr> <int> <int>
1 AirTran ATL 211 125
2 AirTran BKG 14 13
3 Alaska SEA 32 29
4 American DFW 186 118
5 American MIA 129 76
6 American_Eagle DFW 234 143
7 American_Eagle LAX 74 47
8 American_Eagle ORD 133 85
9 Atlantic_Southeast ATL 64 44
10 Atlantic_Southeast CVG 1 1
# ... with 224 more rows
Third time:
Source: local data frame [234 x 4]
Groups: UniqueCarrier [?]
UniqueCarrier Dest rows drows
<chr> <chr> <int> <int>
1 AirTran ATL 211 88
2 AirTran BKG 14 7
3 Alaska SEA 32 16
4 American DFW 186 79
5 American MIA 129 61
6 American_Eagle DFW 234 95
7 American_Eagle LAX 74 31
8 American_Eagle ORD 133 67
9 Atlantic_Southeast ATL 64 31
10 Atlantic_Southeast CVG 1 1
# ... with 224 more rows
My question is: why does this value change on every run? What is n_distinct(rows) actually computing here?
Apparently this is normal behaviour; see this issue: https://github.com/tidyverse/dplyr/issues/2222
The explanation given there is that values in list columns are compared by reference, so n_distinct() treats them as different unless they really point to the same object.
So the result depends on how the data frame is stored internally, not only on the values it holds. Hadley's comment in that issue seems to say it might be a bug (in the sense of unwanted behaviour), or it might be expected behaviour that needs to be documented better.
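As a possible way to sidestep the problem, here is a minimal sketch that only passes an existing column of the data to n_distinct(), rather than a column created in the same summarise() call. It assumes the goal was to count distinct values within each group. The toy data frame `flights`, the choice of `TailNum` as the column to count, and the output name `planes` are illustrative assumptions, not part of the original question, and `.groups` requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for hflights (hypothetical data, not the real dataset)
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "BB", "BB"),
  Dest          = c("DFW", "DFW", "MIA", "ATL", "ATL"),
  TailNum       = c("N1", "N2", "N1", "N3", "N3"),
  stringsAsFactors = FALSE
)

res <- flights %>%
  group_by(UniqueCarrier, Dest) %>%
  summarise(
    rows   = n(),                  # number of rows in each group
    planes = n_distinct(TailNum),  # distinct values of an existing column
    .groups = "drop"               # return an ungrouped result
  )

print(res)
```

Because both summaries are computed from columns that already exist in the input, the result should not depend on how an intermediate column is stored, so repeated runs should agree.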