R: count number of team members based on team name

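I have a df in which each row represents a person and each column represents a characteristic of that person. One of the columns is TeamName, the name of the team the person belongs to. Several people belong to each team.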

I would like a function in R that creates a new column containing the number of team members for each team.

So, for example, I have:

df
Name    Surname     TeamName
John     Smith      Champions
Mary     Osborne    Socceroos
Mark     Johnson    Champions
Rory     Bradon     Champions
Jane     Bryant     Socceroos
Bruce    Harper     

and I want:

df1
Name    Surname     TeamName    TeamNo
John     Smith      Champions     3
Mary     Osborne    Socceroos     2
Mark     Johnson    Champions     3
Rory     Bradon     Champions     3  
Jane     Bryant     Socceroos     2
Bruce    Harper                   0

As you can see, the count includes the person themselves, and anyone without a team name (e.g. Bruce Harper) gets 0.

How can I do this? Thanks!
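
To make the example reproducible, the input can be built as a data frame along these lines. This is only a sketch: the question does not say whether Bruce's missing team is stored as an empty string or as NA, so the empty string used here is an assumption, as is `stringsAsFactors = FALSE`.

```r
# Example data from the question.
# Assumption: the missing team name is an empty string, not NA.
df <- data.frame(
  Name     = c("John", "Mary", "Mark", "Rory", "Jane", "Bruce"),
  Surname  = c("Smith", "Osborne", "Johnson", "Bradon", "Bryant", "Harper"),
  TeamName = c("Champions", "Socceroos", "Champions",
               "Champions", "Socceroos", ""),
  stringsAsFactors = FALSE
)

str(df)
```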

Here is a solution based on data.table. It may be overkill for your needs, but here it is:

library(data.table)
dt <- as.data.table(df)
# First, convert TeamName from factor to character
dt[, TeamName := as.character(TeamName)]
# Now count the members of each team
dt[, TeamNo := .N, by = 'TeamName']
# Exclude the special cases (missing or empty team name)
dt[is.na(TeamName) | TeamName == "", TeamNo := NA_integer_]

This is clearly not the optimal solution, but I hope it helps.

If you need the number of unique members in the first two columns for each 'TeamName', one option is n_distinct from dplyr:

library(dplyr)
library(tidyr)
df %>%
  unite(Var, Name, Surname, sep = "_") %>%  # paste the columns together
  group_by(TeamName) %>%                    # group by TeamName
  mutate(TeamNo = n_distinct(Var)) %>%      # create the TeamNo column
  separate(Var, into = c('Name', 'Surname'), sep = "_")  # split the 'Var' column
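
The distinction between unique members and plain row counts only matters when a person appears more than once. A minimal base R sketch of the difference, using a made-up data frame `dup` (not from the question) in which John Smith is listed twice:

```r
# Hypothetical data: John Smith appears twice in Champions
dup <- data.frame(
  Name     = c("John", "John", "Mark", "Mary"),
  Surname  = c("Smith", "Smith", "Johnson", "Osborne"),
  TeamName = c("Champions", "Champions", "Champions", "Socceroos"),
  stringsAsFactors = FALSE
)

full_name <- paste(dup$Name, dup$Surname)

# Rows per team: a duplicated person is counted every time they appear
rows_per_team <- ave(seq_along(full_name), dup$TeamName, FUN = length)

# Distinct members per team: each full name is counted once
distinct_per_team <- as.integer(
  ave(full_name, dup$TeamName, FUN = function(x) length(unique(x)))
)

cbind(dup, rows_per_team, distinct_per_team)
```

Here Champions has three rows but only two distinct members, which is the case the n_distinct approach is meant to handle.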

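Or, if it is just the number of rows per 'TeamName', we can group by 'TeamName', get the number of rows in each group with n(), and create the 'TeamNo' column from it with mutate. If needed, an ifelse condition can be used to give NA where 'TeamName' is '' or NA.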

df %>%
   group_by(TeamName) %>%
   mutate(TeamNo = ifelse(is.na(TeamName)|TeamName=='', NA_integer_, n())) 
#   Name Surname  TeamName TeamNo
#1  John   Smith Champions      3
#2  Mary Osborne Socceroos      2
#3  Mark Johnson Champions      3
#4  Rory  Bradon Champions      3
#5  Jane  Bryant Socceroos      2
#6 Bruce  Harper                NA

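Or you can use ave from base R. Assuming the column may contain '' or NA, I first convert '' to NA and then use ave to get the length of each group of that column, which becomes the 'TeamNo'. It gives NA for NA. For example: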

  v1 <- c(as.character(df$TeamName), NA) # append an NA to the example to show that case
  is.na(v1) <- v1 == ''                  # convert '' to NA
  as.numeric(ave(v1, v1, FUN = length))
  #[1]  3  2  3  3  2 NA NA
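
Since the question asks for 0 rather than NA when there is no team, the same ave idea can be extended by replacing the NA counts afterwards. A sketch, assuming the example data with an empty string for Bruce's team (the names `team` and `team_no` are just illustrative):

```r
# Example data; assumption: the missing team name is stored as ''
df <- data.frame(
  Name     = c("John", "Mary", "Mark", "Rory", "Jane", "Bruce"),
  Surname  = c("Smith", "Osborne", "Johnson", "Bradon", "Bryant", "Harper"),
  TeamName = c("Champions", "Socceroos", "Champions",
               "Champions", "Socceroos", ""),
  stringsAsFactors = FALSE
)

team <- as.character(df$TeamName)
team[!is.na(team) & team == ""] <- NA   # treat '' like a missing team

# ave() leaves entries with an NA group untouched, so they are still NA here
team_no <- as.integer(ave(team, team, FUN = length))
team_no[is.na(team_no)] <- 0L           # no team -> 0 members

df$TeamNo <- team_no
df
```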

Using sqldf:

library(sqldf)
sqldf("SELECT Name, Surname, TeamName, n
      FROM df 
      LEFT JOIN
      (SELECT TeamName, COUNT(Name) AS n 
      FROM df 
      WHERE NOT TeamName IS '' GROUP BY TeamName)
      USING (TeamName)")

Output:

   Name Surname  TeamName  n
1  John   Smith Champions  3
2  Mary Osborne Socceroos  2
3  Mark Johnson Champions  3
4  Rory  Bradon Champions  3
5  Jane  Bryant Socceroos  2
6 Bruce  Harper           NA
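
The query counts members per non-empty team in a subquery and then left-joins those counts back onto every row, which is why Bruce ends up with NA. For readers less familiar with SQL, roughly the same two steps can be sketched in base R with `table` and `merge`. This is an analogue of the approach, not what sqldf runs internally, and the data frame below assumes Bruce's team is stored as an empty string.

```r
# Example data; assumption: the missing team name is stored as ''
df <- data.frame(
  Name     = c("John", "Mary", "Mark", "Rory", "Jane", "Bruce"),
  Surname  = c("Smith", "Osborne", "Johnson", "Bradon", "Bryant", "Harper"),
  TeamName = c("Champions", "Socceroos", "Champions",
               "Champions", "Socceroos", ""),
  stringsAsFactors = FALSE
)

# Step 1 (the subquery): count members per team, skipping empty or missing names
has_team <- !is.na(df$TeamName) & df$TeamName != ""
counts <- as.data.frame(
  table(TeamName = df$TeamName[has_team]),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Step 2 (the LEFT JOIN): all.x = TRUE keeps rows that have no matching count
res <- merge(df, counts, by = "TeamName", all.x = TRUE)
res[, c("Name", "Surname", "TeamName", "n")]
```

Note that `merge` may return the rows in a different order from the input, so results are best looked up by name rather than by position.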
