
Subset a data frame by matching columns from another data frame

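I have tried several ways to subset a data frame (IMD15) by matching it against another data frame or a vector. I'm sure this is straightforward! I searched for a couple of hours before posting, but I'm still fairly new to R and can't seem to find a solution :(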

## Import the English Indices of Deprivation 2015 - LSOA Level
IMD15 <- read.csv(url("https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/467774/File_7_ID_2015_All_ranks__deciles_and_scores_for_the_Indices_of_Deprivation__and_population_denominators.csv"))
LSOA2011 <- read.csv("LSOA2011.csv")

LSOA2011.csv contains the 20 LSOA codes (see below) that I want to use to query the IMD15 data.

("E01019602", "E01019614", "E01019615", "E01019631", "E01019600", "E01019604", "E01019606", "E01019618", "E01019599", "E01019601", "E01019613", "E01019617", "E01019632", "E01019635", "E01019636", "E01019611", "E01019597", "E01019737", "E01029801", "E01029817")

## Format the column names
names(IMD15) [1] <- "LSOA code (2011)"
names(IMD15) [2] <- "LSOA name (2011)"
names(IMD15) [3] <- "Local Authority District code (2013)"
names(IMD15) [4] <- "Local Authority District name (2013)"
names(IMD15) [5] <- "Index of Multiple Deprivation (IMD) Score"
names(IMD15) [6] <- "Index of Multiple Deprivation (IMD) Rank (where 1 is most deprived)"
names(IMD15) [7] <- "Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs)"

## Subset the data columns
IMD15 <- IMD15[ , 1:7] 


> str(IMD15)
'data.frame':   32844 obs. of  7 variables:
 $ LSOA code (2011)                                                                  : Factor w/ 32844 levels "E01000001","E01000002",..: 30557 30558 30559 30560 30578 30582 30546 30547 30548 30573 ...
 $ LSOA name (2011)                                                                  : Factor w/ 32844 levels "Adur 001A","Adur 001B",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Local Authority District code (2013)                                              : Factor w/ 326 levels "E06000001","E06000002",..: 241 241 241 241 241 241 241 241 241 241 ...
 $ Local Authority District name (2013)                                              : Factor w/ 326 levels "Adur","Allerdale",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Index of Multiple Deprivation (IMD) Score                                         : num  12.4 28.6 11.7 16.4 18.3 ...
 $ Index of Multiple Deprivation (IMD) Rank (where 1 is most deprived)               : int  21352 8864 22143 17252 15643 21176 27934 28249 27569 18352 ...
 $ Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs): int  7 3 7 6 5 7 9 9 9 6 ...

> str(LSOA2011)
'data.frame':   20 obs. of  1 variable:
 $ LSOA2011: Factor w/ 20 levels "E01019597","E01019599",..: 5 10 11 14 3 6 7 13 2 4 ...

When I try method (a)

IMD15a <- IMD15[c("E01019602", "E01019614", "E01019615", "E01019631",   "E01019600", "E01019604", "E01019606",
             "E01019618", "E01019599", "E01019601", "E01019613", "E01019617", "E01019632", "E01019635", 
             "E01019636", "E01019611", "E01019597", "E01019737",    "E01029801", "E01029817"), ]

the result is all NA values (20 obs. of 7 variables)

> str(IMD15a)
'data.frame':   20 obs. of  7 variables:
 $ LSOA code (2011)                                                                  : Factor w/ 32844 levels "E01000001","E01000002",..: NA NA NA NA NA NA NA NA NA NA ...
 $ LSOA name (2011)                                                                  : Factor w/ 32844 levels "Adur 001A","Adur 001B",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Local Authority District code (2013)                                              : Factor w/ 326 levels "E06000001","E06000002",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Local Authority District name (2013)                                              : Factor w/ 326 levels "Adur","Allerdale",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Score                                         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Rank (where 1 is most deprived)               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs): int  NA NA NA NA NA NA NA NA NA NA ...

or method (b)

IMD15b <- cbind(IMD15[match(names(LSOA2011), IMD15$`LSOA code (2011)` ),], LSOA2011)

this also results in NA values (20 obs. of 8 variables)

> str(IMD15b)
'data.frame':   20 obs. of  8 variables:
 $ LSOA code (2011)                                                                  : Factor w/ 32844 levels "E01000001","E01000002",..: NA NA NA NA NA NA NA NA NA NA ...
 $ LSOA name (2011)                                                                  : Factor w/ 32844 levels "Adur 001A","Adur 001B",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Local Authority District code (2013)                                              : Factor w/ 326 levels "E06000001","E06000002",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Local Authority District name (2013)                                              : Factor w/ 326 levels "Adur","Allerdale",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Score                                         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Rank (where 1 is most deprived)               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs): int  NA NA NA NA NA NA NA NA NA NA ...
 $ LSOA2011                                                                          : Factor w/ 20 levels "E01019597","E01019599",..: 5 10 11 14 3 6 7 13 2 4 ...
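
For what it's worth, both attempts appear to return NA for a similar reason: neither one compares the codes against the values in the `LSOA code (2011)` column. Indexing rows with a character vector, as in method (a), looks the strings up in the row names, and `names(LSOA2011)` in method (b) returns the column name "LSOA2011" rather than the codes stored in that column. The following is only a minimal sketch on made-up data; `toy` and `wanted` are hypothetical stand-ins for IMD15 and LSOA2011, not the real files.

```r
# Made-up stand-ins for IMD15 and LSOA2011 (not the real data)
toy <- data.frame(code = c("E01", "E02", "E03"),
                  score = c(12.4, 28.6, 11.7),
                  stringsAsFactors = FALSE)
wanted <- data.frame(LSOA2011 = c("E03", "E01"), stringsAsFactors = FALSE)

# Method (a): character row indices are matched against the row names,
# which here are "1", "2", "3", so no row is found and NA rows come back
a <- toy[c("E03", "E01"), ]
all(is.na(a$score))

# Method (b): names() gives the column name, not the column values,
# so match() cannot find it among the codes
names(wanted)
b_idx <- match(names(wanted), toy$code)

# Matching the column values instead does locate the rows
ok_idx <- match(wanted$LSOA2011, toy$code)
toy[ok_idx, ]
```

With the real data, the equivalent of the last step would presumably use `LSOA2011$LSOA2011` as the lookup vector.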

Try this:

LSOA <- c("E01019602", "E01019614", "E01019615", "E01019631", "E01019600", "E01019604", "E01019606", "E01019618", "E01019599", "E01019601", "E01019613", "E01019617", "E01019632", "E01019635", "E01019636", "E01019611", "E01019597", "E01019737", "E01029801", "E01029817")

df <- IMD15[IMD15[['LSOA code (2011)']] %in% LSOA, ]

str(df)

'data.frame':   20 obs. of  7 variables:
 $ LSOA code (2011)                                                                  : Factor w/ 32844 levels "E01000001","E01000002",..: 19057 19069 19070 19086 19055 19059 19061 19073 19054 19056 ...
 $ LSOA name (2011)                                                                  : Factor w/ 32844 levels "Adur 001A","Adur 001B",..: 8566 8567 8568 8569 8570 8571 8572 8573 8574 8575 ...
 $ Local Authority District code (2013)                                              : Factor w/ 326 levels "E06000001","E06000002",..: 75 75 75 75 75 75 75 75 75 75 ...
 $ Local Authority District name (2013)                                              : Factor w/ 326 levels "Adur","Allerdale",..: 77 77 77 77 77 77 77 77 77 77 ...
 $ Index of Multiple Deprivation (IMD) Score                                         : num  10.19 6.55 7.83 8.22 8.62 ...
 $ Index of Multiple Deprivation (IMD) Rank (where 1 is most deprived)               : int  23963 28526 26942 26470 25943 30038 31052 24720 28159 20152 ...
 $ Index of Multiple Deprivation (IMD) Decile (where 1 is most deprived 10% of LSOAs): int  8 9 9 9 8 10 10 8 9 7 ...
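
Note that `%in%` returns the matching rows in the order they occur in IMD15, and the codes do not have to be retyped by hand: the column of the lookup data frame can be used directly. The sketch below uses made-up data (`toy` and `wanted` are hypothetical stand-ins, not the real IMD files) and also shows `match()` as an alternative when the rows should follow the order of the lookup codes instead.

```r
# Made-up stand-ins for IMD15 and LSOA2011 (not the real data)
toy <- data.frame(code = c("E01", "E02", "E03", "E04"),
                  score = c(12.4, 28.6, 11.7, 16.4),
                  stringsAsFactors = FALSE)
wanted <- data.frame(LSOA2011 = c("E03", "E01"), stringsAsFactors = FALSE)

# %in% yields a logical vector, so rows come back in toy's own order
by_in <- toy[toy$code %in% wanted$LSOA2011, ]
by_in$code

# match() yields row positions, so rows follow the order of the lookup codes
by_match <- toy[match(wanted$LSOA2011, toy$code), ]
by_match$code
```

One design difference to keep in mind: `match()` gives NA for any lookup code that is absent from the table, which produces an all-NA row, whereas `%in%` simply leaves such codes out.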
