
How do I translate this subset code: carriers[carriers[, 1] == "NW", ]

carriers is a data frame with 1491 observations of two variables:

> str(carriers)

'data.frame':   1491 obs. of  2 variables:
 $ Code       : Factor w/ 1490 levels "02Q","04Q","05Q",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Description: Factor w/ 1491 levels "40-Mile Air",..: 1328 1331 479 887 620 1296 523 12 876 752 ...

Then we pull out the row whose Description is Northwest Airlines Inc., which corresponds to the level NW of the variable Code, using:

> carriers[carriers[,1] == "NW", ]
    Code             Description
NA  <NA>                    <NA>
921   NW Northwest Airlines Inc.

Just when I thought I had a good grasp of subsetting, I couldn't translate this simple code. I can see what it returns; I am just unclear about how [carriers[,1] == "NW", ] works.

Note:

> carriers[921,2]
[1] Northwest Airlines Inc.
1491 Levels: 40-Mile Air A/S Conair AAA-Action Air Carrier Inc. ... Zuliana De Aviacion

How does carriers[carriers[,1] == "NW", ] say: give me the 2nd column of the row in data frame carriers if the 1st column is "NW"? Does the first part say: all the rows whose 1st column equals "NW"? And then, on the right-hand side, why is there a , after "NW"?

It looks like you have NA values in carriers[,1], and those cause the extra NA row when subsetting. Try adding the condition & !is.na(carriers[,1]):

carriers[carriers[,1] == "NW" & !is.na(carriers[,1]), ]
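To read the expression piece by piece, here is a minimal sketch on a hypothetical two-row data frame (toy, keep, keep_rows and desc_only are names made up for illustration, not the asker's data). The inner comparison builds a logical vector with one element per row; the outer brackets use it as the row index, and the empty slot after the comma means "all columns".

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  Code = c("NW", "SW"),
  Description = c("Northwest Airlines Inc.", "Southwest Airlines Inc.")
)

# Step 1: compare the first column to "NW"; one TRUE/FALSE per row
keep <- toy[, 1] == "NW"

# Step 2: use that vector as the row index.
# The empty slot after the comma keeps every column.
keep_rows <- toy[keep, ]

# Putting 2 in the column slot would return only the second column
desc_only <- toy[keep, 2]
```

So the trailing comma does not select the 2nd column; the whole matching row is returned, and Description simply happens to be one of its columns.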

Using a reproducible example

 carriers <- data.frame(Code = c('NW', NA, 'SW'),
                        Description = c('Northwest Airlines Inc.', '', 'Southwest Airlines Inc.'))
 carriers[carriers[,1] == "NW", ]
 #   Code             Description
 #1    NW Northwest Airlines Inc.
 #NA <NA>                    <NA>

By using the corrected condition

 carriers[carriers[,1] == "NW" & !is.na(carriers[,1]), ]
 #  Code             Description
 #1   NW Northwest Airlines Inc.

Why are we getting an NA row?

We can check the output of the logical condition:

 carriers[,1] == "NW"
 #[1]  TRUE    NA FALSE

Comparing an NA value with == returns NA rather than TRUE or FALSE. During subsetting we get the rows corresponding to the TRUE values of the condition above, and in addition an all-NA row is created for each NA in the condition.
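The same behaviour can be seen on a plain vector, which may make it easier to see that it comes from logical indexing itself and not from data frames. This is a small sketch; x and picked are illustrative names.

```r
x <- c("a", "b", "c")

# TRUE keeps the element, FALSE drops it,
# and NA yields an NA in the result instead of dropping the element
picked <- x[c(TRUE, NA, FALSE)]
# picked is c("a", NA)
```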

The remedy is to look for values that are 'NW' and are not NA.

 carriers[,1] == "NW" & !is.na(carriers[,1])
 #[1]  TRUE FALSE FALSE
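A few other base R idioms also avoid the NA row, shown here as a sketch on the same three-row example (by_which, by_in and by_subset are illustrative names). which() returns only the positions that are TRUE, %in% never returns NA, and subset() drops rows where the condition is NA.

```r
carriers <- data.frame(
  Code = c("NW", NA, "SW"),
  Description = c("Northwest Airlines Inc.", "", "Southwest Airlines Inc.")
)

# which() keeps only the indices where the condition is TRUE
by_which <- carriers[which(carriers[, 1] == "NW"), ]

# %in% returns FALSE, not NA, for missing values
by_in <- carriers[carriers[, 1] %in% "NW", ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(carriers, Code == "NW")
```

Which one to use is mostly a matter of taste; the explicit !is.na() condition has the advantage of stating the intent directly.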


 