Match tables using 2 criteria in R
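I have just started coding in R and I am trying to manipulate some data, but I have run into the following problem. I have 2 different tables (simplified). The first one (player_df) looks like this: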
name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack
...
The second table (salary_df) gives the salary, in millions, by club and years of experience:
experience FCB BAYERN Juve Real Chelsea
1 1.5 1.3 1 4 3
2 2.5 2 2.4 5 4
3 3.4 3.1 3.5 6.3 5
4 5 4.5 6.7 9 6
5 7.1 6.9 9 12 7
6 9 8 10 15 10
7 10 9 12 16 15
8 14 12 13 19 16
9 14.5 17 15 20 17
10 15 19 17 23 18
...
I would like to add a new column to the first table, say salary_estimation, that takes 2 variables into account, here experience and Club.
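For example, for "luc", who plays for "FCB" and has "2" years of experience, the output should be "2.5".
In Excel this is what the INDEX/MATCH combination does, but in R I don't know which function to use.
How should I approach this?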
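For reference, base R has a fairly direct analogue of Excel's INDEX/MATCH: match() finds the row and column positions, and indexing a matrix with a two-column cbind(row, col) matrix picks one cell per player. The following is a minimal sketch, assuming player_df and salary_df are plain data frames with the columns shown above (only a few rows and clubs are typed in). Matching club names case-insensitively with toupper() is an added assumption so that "Bayern" would also find the "BAYERN" column.

```r
# Toy versions of the question's tables (subset of rows and clubs)
player_df <- data.frame(
  name = c("luc", "jean", "tevez", "messi"),
  experience = c(2, 9, 6, 11),
  Club = c("FCB", "Real", "Chelsea", "Liverpool"),
  stringsAsFactors = FALSE
)
salary_df <- data.frame(
  experience = 1:10,
  FCB = c(1.5, 2.5, 3.4, 5, 7.1, 9, 10, 14, 14.5, 15),
  Real = c(4, 5, 6.3, 9, 12, 15, 16, 19, 20, 23),
  Chelsea = c(3, 4, 5, 6, 7, 10, 15, 16, 17, 18)
)

# "MATCH" on rows: position of each player's experience in salary_df$experience
row_idx <- match(player_df$experience, salary_df$experience)
# "MATCH" on columns: position of each club among the column names (case-insensitive)
col_idx <- match(toupper(player_df$Club), toupper(names(salary_df)))

# "INDEX": pick one cell per player via a two-column (row, col) index matrix.
# A row or column that was not found gives NA, much like #N/A in Excel.
salary_mat <- as.matrix(salary_df)
player_df$salary_estimation <- salary_mat[cbind(row_idx, col_idx)]
player_df
```

Here luc gets 2.5, jean 20 and tevez 10, while messi gets NA because the table has no Liverpool column and no row for 11 years.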
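One possible solution is to join the first table (assuming it is player_df) with the second table, salary_df, reshaped into "long format", using experience and Club as the keys. You can do this with the tidyverse package.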
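library(tidyverse)

player_df %>%
  # normalise club names so that e.g. "FCB"/"BAYERN" and "Bayern" match across tables
  mutate(Club = str_to_title(Club)) %>%
  left_join(
    salary_df %>%
      # reshape the wide salary table to long: one row per (experience, Club)
      pivot_longer(-experience, names_to = "Club", values_to = "salary_estimation") %>%
      mutate(Club = str_to_title(Club))
  )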
# Joining, by = c("experience", "Club")
# # A tibble: 13 x 6
# name experience Club age Position salary_estimation
# <chr> <dbl> <chr> <dbl> <chr> <dbl>
# 1 luc 2 Fcb 18 Goalkeeper 2.5
# 2 jean 9 Real 26 midfielder 20
# 3 ronaldo 14 Fcb 32 Goalkeeper NA
# 4 jean 9 Real 26 midfielder 20
# 5 messi 11 Liverpool 35 midfielder NA
# 6 tevez 6 Chelsea 27 Attack 10
# 7 inzaghi 9 Juve 34 Defender 15
# 8 kwfni 17 Bayern 40 Attack NA
# 9 Blabla 9 Real 25 midfielder 20
# 10 wdfood 11 Liverpool 33 midfielder NA
# 11 player2 7 Chelsea 28 Attack 15
# 12 player3 10 Juve 34 Defender 17
# 13 fgh 17 Bayern 40 Attack NA
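The NA values above are players whose (experience, Club) pair has no entry in the excerpt of salary_df shown in the question: more than 10 years of experience (ronaldo, kwfni, fgh) or a club with no salary column (Liverpool). A minimal sketch for listing those rows with dplyr's anti_join(), on toy data taken from a few of the question's rows (players and salary_long are illustrative names, not from the answer above). Passing by explicitly also silences the "Joining, by = ..." message.

```r
library(dplyr)
library(tidyr)

# A few players from the question
players <- tibble(
  name = c("luc", "ronaldo", "messi"),
  experience = c(2, 14, 11),
  Club = c("FCB", "FCB", "Liverpool")
)

# A tiny wide salary table, reshaped to long format as in the answer
salary_long <- tibble(
  experience = c(1, 2),
  FCB = c(1.5, 2.5)
) %>%
  pivot_longer(-experience, names_to = "Club", values_to = "salary_estimation")

# Players with no matching (experience, Club) key in the salary table:
# exactly the rows that left_join() would fill with NA
unmatched <- anti_join(players, salary_long, by = c("experience", "Club"))
unmatched
```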
Data:
df1 <- read.table(text = 'name experience Club age Position
luc 2 FCB 18 Goalkeeper
jean 9 Real 26 midfielder
ronaldo 14 FCB 32 Goalkeeper
jean 9 Real 26 midfielder
messi 11 Liverpool 35 midfielder
tevez 6 Chelsea 27 Attack
inzaghi 9 Juve 34 Defender
kwfni 17 Bayern 40 Attack
Blabla 9 Real 25 midfielder
wdfood 11 Liverpool 33 midfielder
player2 7 Chelsea 28 Attack
player3 10 Juve 34 Defender
fgh 17 Bayern 40 Attack', header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = 'experience FCB BAYERN Juve Real Chelsea
1 1.5 1.3 1 4 3
2 2.5 2 2.4 5 4
3 3.4 3.1 3.5 6.3 5
4 5 4.5 6.7 9 6
5 7.1 6.9 9 12 7
6 9 8 10 15 10
7 10 9 12 16 15
8 14 12 13 19 16
9 14.5 17 15 20 17
10 15 19 17 23 18', header = TRUE, stringsAsFactors = FALSE)
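Code:
library('data.table')
# Chelsea is the only all-integer column of df2; make it double so every measure column has the same type
setDT(df2)[, Chelsea := as.numeric(Chelsea)]
# reshape to long format: one row per (experience, Club)
df2 <- melt(df2, id.vars = "experience", variable.name = "Club", value.name = "Salary" )
# join every player in df1 to the long salary table; unmatched keys give NA
df2[df1, on = c("experience", "Club"), nomatch = NA]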
Output:
# experience Club Salary name age Position
# 1: 2 FCB 2.5 luc 18 Goalkeeper
# 2: 9 Real 20.0 jean 26 midfielder
# 3: 14 FCB NA ronaldo 32 Goalkeeper
# 4: 9 Real 20.0 jean 26 midfielder
# 5: 11 Liverpool NA messi 35 midfielder
# 6: 6 Chelsea 10.0 tevez 27 Attack
# 7: 9 Juve 15.0 inzaghi 34 Defender
# 8: 17 Bayern NA kwfni 40 Attack
# 9: 9 Real 20.0 Blabla 25 midfielder
# 10: 11 Liverpool NA wdfood 33 midfielder
# 11: 7 Chelsea 15.0 player2 28 Attack
# 12: 10 Juve 17.0 player3 34 Defender
# 13: 17 Bayern NA fgh 40 Attack
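Note that this data.table join returns a new table (salary columns first) rather than adding a column to df1, and it is case-sensitive, so "Bayern" in the player table never matches the "BAYERN" column; here that is hidden because both Bayern players have 17 years of experience, beyond the table. A minimal sketch of a data.table update join that adds salary_estimation to the player table in place and upper-cases club names on both sides. The toy tables, the hypothetical player "player_x" and the helper column club_key are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Toy player table; "player_x" is a hypothetical Bayern player with 9 years of experience
players <- data.table(
  name = c("luc", "jean", "player_x"),
  experience = c(2L, 9L, 9L),
  Club = c("FCB", "Real", "Bayern")
)
# Toy wide salary table (values copied from the question's rows 2 and 9)
salary <- data.table(
  experience = c(2L, 9L),
  FCB = c(2.5, 14.5),
  BAYERN = c(2, 17),
  Real = c(5, 20)
)

# Long format, keeping Club as character instead of factor
salary_long <- melt(salary, id.vars = "experience",
                    variable.name = "Club", value.name = "Salary",
                    variable.factor = FALSE)

# Upper-case club names on both sides so "Bayern" matches "BAYERN"
salary_long[, Club := toupper(Club)]
players[, club_key := toupper(Club)]

# Update join: add salary_estimation to players by reference; no match leaves NA
players[salary_long, on = .(experience, club_key = Club), salary_estimation := i.Salary]
players[, club_key := NULL]
players
```

Because := modifies players by reference, the original row order and columns are kept and only salary_estimation is added.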