For each level of one factor, return all levels of another factor from the same data frame using dplyr [R]

Question

I have a very large dataset containing historical football results. Here is part of it:

  Season              home          visitor  FT
    1954       Aston Villa              SHW 0-0
    1956       Aston Villa              SHW 5-0
    1957       Aston Villa              SHW 2-0
    1960       Aston Villa              SHW 4-1
    1987       Aston Villa              HUL 5-0
    1987       Aston Villa              HUD 1-1
    1987       Aston Villa              BLB 1-1
    1933 Preston North End              NOT 4-0
    1958 Preston North End              NOT 3-5
    1960 Preston North End              NOT 0-1
    1962 Preston North End              SWA 6-3
    1976           Walsall              SHW 5-1
    1977           Walsall              SHW 1-1
    2002           Walsall Sheffield United 0-1
    2002           Walsall       Gillingham 1-0

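For each home team (a factor), I would like to return the unique levels of another factor (Season) that occur with it. For the example above, that would be: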

Aston Villa - 1954, 1956, 1957, 1960, 1987
Preston North End - 1933, 1958, 1960, 1962
Walsall - 1976, 1977, 2002

I thought I would try to do this in dplyr. However, I am getting it wrong.

I tried this:

library(dplyr)
demodf%>%
group_by(home)%>%
summarize(levels(Season))
#Error: expecting a single value

Out of interest, I did the following to see whether I could get the first year returned for each factor level/home team:

demodf%>%
group_by(home)%>%
summarize(levels(Season)[1])

This gave me:

#               home levels(Season)[1]
#1       Aston Villa              1933
#2 Preston North End              1933
#3           Walsall              1933

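This is not right. It has just returned the first level of the Season factor across the whole data frame (1933), rather than the first year/season level for each team separately, which is what I thought group_by would help me get at.

I would be grateful for any help.

The following should enable you to reproduce the table above: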

demodf<-structure(list(Season = structure(c(2L, 3L, 4L, 6L, 10L, 10L, 
10L, 1L, 5L, 6L, 7L, 8L, 9L, 11L, 11L), .Label = c("1933", "1954", 
"1956", "1957", "1958", "1960", "1962", "1976", "1977", "1987", 
"2002"), class = "factor"), home = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("Aston Villa", 
"Preston North End", "Walsall"), class = "factor"), visitor = structure(c(7L, 
7L, 7L, 7L, 4L, 3L, 1L, 5L, 5L, 5L, 8L, 7L, 7L, 6L, 2L), .Label = c("BLB", 
"Gillingham", "HUD", "HUL", "NOT", "Sheffield United", "SHW", 
"SWA"), class = "factor"), FT = structure(c(1L, 9L, 5L, 8L, 9L, 
4L, 4L, 7L, 6L, 2L, 11L, 10L, 4L, 2L, 3L), .Label = c("0-0", 
"0-1", "1-0", "1-1", "2-0", "3-5", "4-0", "4-1", "5-0", "5-1", 
"6-3"), class = "factor")), .Names = c("Season", "home", "visitor", 
"FT"), row.names = c(NA, -15L), class = "data.frame")

Answer 1

In this case, you can use by:

with(demodf, by(Season, home, unique))
# home: Aston Villa
# [1] 1954 1956 1957 1960 1987
# Levels: 1933 1954 1956 1957 1958 1960 1962 1976 1977 1987 2002
# ------------------------------------------------------------ 
# home: Preston North End
# [1] 1933 1958 1960 1962
# Levels: 1933 1954 1956 1957 1958 1960 1962 1976 1977 1987 2002
# ------------------------------------------------------------ 
# home: Walsall
# [1] 1976 1977 2002
# Levels: 1933 1954 1956 1957 1958 1960 1962 1976 1977 1987 2002
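
If a plain named list is more convenient than the by object, a base R sketch along the following lines may help. It uses a small hand-built stand-in for demodf (the names df and seasons_by_team are illustrative, not from the question) and converts Season to character first, so the unused factor levels are not carried along.

```r
# Small stand-in for demodf (assumption: only Season and home matter here).
df <- data.frame(
  Season = factor(c(1954, 1956, 1987, 1987, 1933, 1958, 1976, 2002)),
  home   = factor(c("Aston Villa", "Aston Villa", "Aston Villa", "Aston Villa",
                    "Preston North End", "Preston North End",
                    "Walsall", "Walsall"))
)

# Split the seasons (as character) by home team, then de-duplicate each piece.
seasons_by_team <- lapply(split(as.character(df$Season), df$home), unique)

# One plain character vector per team, named by the levels of home.
seasons_by_team[["Aston Villa"]]
```

The result is an ordinary list, so it can be indexed by team name or passed on to sapply and friends.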

The "data.table" package can also hold lists as columns of a data.table, as follows:

library(data.table)
DT <- as.data.table(demodf)
DT[, list(Season = list(unique(Season))), by = home]
#                 home                   Season
# 1:       Aston Villa 1954,1956,1957,1960,1987
# 2: Preston North End      1933,1958,1960,1962
# 3:           Walsall           1976,1977,2002

Note the structure of the result:

str(.Last.value)
# Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
#  $ home  : Factor w/ 3 levels "Aston Villa",..: 1 2 3
#  $ Season:List of 3
#   ..$ : Factor w/ 11 levels "1933","1954",..: 2 3 4 6 10
#   ..$ : Factor w/ 11 levels "1933","1954",..: 1 5 6 7
#   ..$ : Factor w/ 11 levels "1933","1954",..: 8 9 11
#  - attr(*, ".internal.selfref")=<externalptr>
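
The str() output above shows each list element still carrying all 11 factor levels. One possible way around that, sketched here on the assumption that data.table is installed, is to convert Season to character inside the grouping expression. DT_small and res are illustrative names, and the data is a reduced stand-in for demodf.

```r
library(data.table)

# Reduced stand-in for demodf (assumption: only Season and home matter here).
DT_small <- data.table(
  Season = factor(c(1954, 1956, 1987, 1987, 1933, 1958, 1976, 2002)),
  home   = factor(c("Aston Villa", "Aston Villa", "Aston Villa", "Aston Villa",
                    "Preston North End", "Preston North End",
                    "Walsall", "Walsall"))
)

# Wrapping the per-group vector in list() yields one list element per team;
# as.character() drops the factor levels before they are stored.
res <- DT_small[, list(Season = list(unique(as.character(Season)))), by = home]

# Pull out the seasons for a single team as a plain character vector.
res$Season[[which(res$home == "Walsall")]]
```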

Answer 2

Having Season as a factor complicates things a bit, but

demodf %>% group_by(home) %>% do(data.frame(Seasons = unique(.$Season)))

will work.

Note that it is simpler to use unique rather than levels.
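
The reason levels gives the same answer for every team is that subsetting a factor keeps its full set of levels; only the values shrink. A minimal base R sketch, using a small hand-built stand-in for demodf (df and walsall are illustrative names), shows the difference:

```r
# Small stand-in for demodf (assumption: only Season and home matter here).
df <- data.frame(
  Season = factor(c(1954, 1956, 1987, 1987, 1933, 1958, 1976, 2002)),
  home   = factor(c("Aston Villa", "Aston Villa", "Aston Villa", "Aston Villa",
                    "Preston North End", "Preston North End",
                    "Walsall", "Walsall"))
)

# Subset to one team: the values shrink, but the levels attribute does not.
walsall <- df$Season[df$home == "Walsall"]

levels(walsall)                # every level of the whole column, "1933" first
as.character(unique(walsall))  # only the seasons Walsall actually played
levels(droplevels(walsall))    # same seasons, after discarding unused levels
```

So levels(Season)[1] inside a grouped summarise is always the first level of the whole column, whereas unique (or droplevels followed by levels) reflects the rows in the group.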

Answer 3

I used paste to mimic the output you wanted:

demodf%>%
  group_by(home)%>%
  summarise( summary =  paste(unique(Season),collapse=","))

This gives

               home                  summary
1       Aston Villa 1954,1956,1957,1960,1987
2 Preston North End      1933,1958,1960,1962
3           Walsall           1976,1977,2002
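
If the seasons are needed as separate values rather than one pasted string, a list-column is another option. The sketch below assumes a reasonably recent dplyr release in which summarise accepts list() results (the version current when this question was asked raised the "expecting a single value" error instead). df and result are illustrative names, and the data is a reduced stand-in for demodf.

```r
library(dplyr)

# Reduced stand-in for demodf (assumption: only Season and home matter here).
df <- data.frame(
  Season = factor(c(1954, 1956, 1987, 1987, 1933, 1958, 1976, 2002)),
  home   = factor(c("Aston Villa", "Aston Villa", "Aston Villa", "Aston Villa",
                    "Preston North End", "Preston North End",
                    "Walsall", "Walsall"))
)

# list() wraps each team's seasons into a single list element, so summarise
# still produces exactly one row per group.
result <- df %>%
  group_by(home) %>%
  summarise(Seasons = list(unique(as.character(Season))))

# Extract one team's seasons as a character vector.
result$Seasons[[which(result$home == "Walsall")]]
```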


Answer 1
4, accepted, 2014-07-29 03:39:49

Answer 2
1, 2014-07-29 03:46:49

Answer 3
0, 2014-07-29 03:52:38
