
How do I collect all the different names in a column from a data frame, and count the occurrences of each name?

I have a data frame that contains 6181 rows (one for each player in a large fantasy football contest). One of the columns in this data frame lists the 9 football players that make up each player's roster.

I want R to give me all the different football player names that show up in this column (there are hundreds) and count how many times each individual name shows up.

Here is an example of a cell in the column:

QB Dane Evans QB Jaquez Johnson RB Zack Langer RB Greg Howell WR Keyarris Garrett WR Jenson Stoshak WR Keevan Lucas FLEX Jordan Howard FLEX Sony Michel

For this I would like the following output (if I were just working with 1 row instead of 6181):

QB Dane Evans - 1

QB Jaquez Johnson - 1 

RB Zack Langer - 1

RB Greg Howell - 1

WR Keyarris Garrett - 1 

WR Jenson Stoshak - 1

WR Keevan Lucas - 1

FLEX Jordan Howard - 1

FLEX Sony Michel - 1

Or 100% instead of 1. 

Most of the answers I have found seem to show how to count how many times a specific combination of 9 players, listed in a specific order, shows up, not how to count individual names across all rows.

My humble Solution

# Data Frame
my.players <- data.frame( name = "QB Dane Evans QB Jaquez Johnson RB Zack Langer RB Greg Howell WR Keyarris Garrett WR Jenson Stoshak WR Keevan Lucas FLEX Jordan Howard FLEX Sony Miche")

# Position dictionary. Add all positions here in this format.
# (A backslash before a space is not a valid escape in an R string, so plain spaces are used.)
pos.dic    <- c( " *QB *"
               , " *RB *"
               , " *WR *"
               , " *FLEX *"
               )

# Regex for positions
pos.regex <- paste( pos.dic, collapse = "|" )

# Remove Positions
play.names <- gsub( pattern     = pos.regex
                  , replacement = ","
                  , x           = my.players$name
                  )


# Split
play.names <- strsplit( x = play.names, split = ",") 

# Unlist
play.names <- unlist( x = play.names )

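# Remove the empty strings left where a roster starts with a position label
# (this also works when there is more than one row)
play.names <- play.names[ play.names != "" ]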

# Result
[1] "Dane Evans"       "Jaquez Johnson"   "Zack Langer"      "Greg Howell"      "Keyarris Garrett" "Jenson Stoshak"   "Keevan Lucas"     "Jordan Howard"   
[9] "Sony Miche"    
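
The desired output in the question keeps the position label in front of each name. If that is what you want, one possible approach (a sketch, not the only way) is to split the string instead of deleting the labels. The names roster, pos.split and labelled.names are made up for this illustration, and the pattern assumes the only labels are QB, RB, WR and FLEX.

```r
# Hypothetical one-row example, same roster string as above
roster <- "QB Dane Evans QB Jaquez Johnson RB Zack Langer RB Greg Howell WR Keyarris Garrett WR Jenson Stoshak WR Keevan Lucas FLEX Jordan Howard FLEX Sony Miche"

# Split at each space that is immediately followed by a position label
# and another space. The lookahead (?= ... ) is not consumed, so the
# label stays attached to the name that follows it.
pos.split <- " (?=(QB|RB|WR|FLEX) )"

labelled.names <- unlist( strsplit( x = roster, split = pos.split, perl = TRUE ) )

labelled.names
```

Note that with the labels kept, the same player in a WR slot and in a FLEX slot is counted as two different entries.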

Then use the table function, which returns a frequency table. From its description:

 ‘table’ uses the cross-classifying factors to build a contingency
     table of the counts at each combination of factor levels.

Example:

freq.table <- table(x = play.names )    
  Dane Evans      Greg Howell   Jaquez Johnson   Jenson Stoshak    Jordan Howard     Keevan Lucas Keyarris Garrett       Sony Miche      Zack Langer 
           1                1                1                1                1                1                1                1                1 
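
With hundreds of names the printed table gets hard to read. A small sketch of how it could be turned into a data frame sorted by count is below. The vector of names is invented for the illustration (with repeats, so the counts differ), and the column names player and count are my own choice.

```r
# Hypothetical names with repeats, standing in for the real play.names
play.names <- c( "Dane Evans", "Zack Langer", "Dane Evans"
               , "Sony Michel", "Dane Evans", "Zack Langer" )

freq.table <- table( play.names )

# A one-dimensional table converts to a two-column data frame
freq.df <- as.data.frame( freq.table, stringsAsFactors = FALSE )
names( freq.df ) <- c( "player", "count" )

# Most frequent players first
freq.df <- freq.df[ order( -freq.df$count ), ]

freq.df
```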

Then, if you prefer percentages, use prop.table :) :

# Share of the total count, stored as pct.table so the prop.table function is not masked
pct.table <- prop.table( x = freq.table )

pct.table <- round( x      = pct.table * 100
                  , digits = 2
                  )

Dane Evans      Greg Howell   Jaquez Johnson   Jenson Stoshak    Jordan Howard     Keevan Lucas Keyarris Garrett       Sony Miche      Zack Langer 
           11.11            11.11            11.11            11.11            11.11            11.11            11.11            11.11            11.11 
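
The same steps should carry over to the full data frame, because gsub and strsplit work on the whole column at once. Below is a minimal sketch with a made-up three-row data frame in place of the 6181 rows. It assumes the "100% instead of 1" in the question means the share of rosters that contain a player, so it divides by the number of rows rather than by the total number of names. The names rosters, all.names and pct.rosters are only for illustration.

```r
# Hypothetical three-row stand-in for the real data frame
my.players <- data.frame(
  name = c( "QB Dane Evans RB Zack Langer WR Keevan Lucas"
          , "QB Dane Evans RB Greg Howell WR Keevan Lucas"
          , "QB Jaquez Johnson RB Zack Langer WR Keevan Lucas" )
  , stringsAsFactors = FALSE
)

# Position labels as whole words, with any surrounding spaces
pos.regex <- " *\\b(QB|RB|WR|FLEX)\\b *"

# One character vector of player names per roster
rosters <- strsplit( x     = gsub( pos.regex, ",", my.players$name, perl = TRUE )
                   , split = "," )

# Drop the empty strings and count each player at most once per roster
rosters <- lapply( rosters, function( r ) unique( r[ r != "" ] ) )

# Counts across all rows, then the share of rosters in percent
all.names   <- unlist( rosters )
freq.table  <- table( all.names )
pct.rosters <- round( 100 * freq.table / nrow( my.players ), 2 )

freq.table
pct.rosters
```

If you want raw counts including repeats inside one roster, leave out the unique() call.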


 