简体   繁体   中英

Complete column names with another dataframe column in R

I have this table:

library(rvest)
library(tidyverse)
tables_team_pl <- read_html('https://www.win-or-lose.com/football-team-colours/')
color_table <- tables_team_pl %>% html_table() %>% pluck(1) %>% select(-Away)

and also this one:

table_1 <- structure(list(Team = c("Arsenal", "Aston Villa", "Blackburn", 
"Bolton", "Chelsea", "Everton", "Fulham", "Liverpool", "Manchester City", 
"Manchester Utd", "Newcastle Utd", "Norwich City", "QPR", "Stoke City", 
"Sunderland", "Swansea City", "Tottenham", "West Brom", "Wigan Athletic", 
"Wolves")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

As you can see the second table has its names incomplete. for example, Manchester Utd should be Manchester United as in the first table.

So, all I need is to complete this second table extracting the same names from the first table.

So, I will have table_1 corrected : Manchester Utd should change to Manchester Unites, Blackburn should change to Blackburn Rovers and so on. The complete names should come from the first table .

Also on the second table I have QPR, whcih should be "Queens Park Rangers".

Any help?

We may use a strindist join

library(fuzzyjoin)
library(dplyr)
stringdist_left_join(table_1, color_table, by = "Team", method = "soundex") %>%
     transmute(Team = coalesce(Team.y, Team.x)) %>%
     distinct

Here's a base R solution using agrep . It has the neat feature of allowing to set a maximum number of insertions , deletions and substitutions to achieve a match.

table_1_original <- table_1

table_1$Team <- data.frame( Team=sapply( as.matrix(table_1), function(x){
                       a=agrep( x, tables_team_pl,
                       max=list(insert=0,del=0,subs=3));
                       if(!identical(a, integer(0))){ tables_team_pl[a] }
                       else{ x } } ) )

Result including a comparison to the original:

cbind(table_1_original, table_1)
              Team                    Team
1          Arsenal                 Arsenal
2      Aston Villa             Aston Villa
3        Blackburn        Blackburn Rovers
4           Bolton                  Bolton
5          Chelsea                 Chelsea
6          Everton                 Everton
7           Fulham                  Fulham
8        Liverpool               Liverpool
9  Manchester City         Manchester City
10  Manchester Utd       Manchester United
11   Newcastle Utd        Newcastle United
12    Norwich City            Norwich City
13          Queens     Queens Park Rangers
14      Stoke City              Stoke City
15      Sunderland              Sunderland
16    Swansea City            Swansea City
17       Tottenham       Tottenham Hotspur
18       West Brom    West Bromwich Albion
19  Wigan Athletic          Wigan Athletic
20          Wolves Wolverhampton Wanderers

filtered HTML data without colors:

tables_team_pl <- c("Aberdeen", "AFC Bournemouth", "AFC Wimbledon", "Arsenal", 
"Aston Villa", "Birmingham City", "Blackburn Rovers", "Bradford City", 
"Brentford", "Brighton & Hove Albion", "Bristol City", "Burnley", 
"Cardiff City", "Celtic", "Chelsea", "Crystal Palace", "Derby County", 
"Dundee", "Dundee United", "Everton", "Fulham", "Hamilton Academical", 
"Heart of Midlothian", "Hibernian", "Huddersfield Town", "Hull City", 
"Inverness Caledonian Thistle", "Kilmarnock", "Leeds United", 
"Leicester City", "Liverpool", "Livingston", "Manchester City", 
"Manchester United", "Middlesbrough", "Millwall", "Motherwell", 
"Newcastle United", "Norwich City", "Nottingham Forest", "Partick Thistle", 
"Portsmouth", "Preston North End", "Queens Park Rangers", "Rangers", 
"Reading", "Ross County", "Rotherham", "Sheffield United", "Sheffield Wednesday", 
"Southampton", "St Johnstone", "St Mirren", "Stoke City", "Sunderland", 
"Swansea", "Tottenham Hotspur", "Watford", "West Bromwich Albion", 
"West Ham United", "Wolverhampton Wanderers", "Wycombe Wanderers")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM