
Subsetting a data frame with selected factors

After dabbling for some time, I am a bit confused about how to subset a data frame, wnd, which has a column ORIGIN (class: factor).

a = sort(table(wnd$ORIGIN), decreasing=TRUE)[1:20]
a

ATL    ORD    DFW    DEN    LAX    IAH    PHX    SFO    CLT..
123915  94422  90184  70970  69298  58850  57316  52702  44234..

# a is a table of the 20 factor levels of interest (highest volume).

b = names(a) 
b
[1] "ATL" "ORD" "DFW" "DEN" "LAX" "IAH" "PHX" "SFO" "CLT" "LAS" "DTW" "EWR" "MSP"
[14] "MCO" "SLC" "JFK" "BOS" "BWI" "LGA" "SEA"
# b holds the names of the airports I require in my subset

Then I would like to create a new data frame that contains only the rows whose ORIGIN is one of the values in b (i.e., subsetting). For one thing, the two are not of the same class:

> class(b)
[1] "character"

> class(wnd$ORIGIN)
[1] "factor"

I tried a few different things (as.factor(b), wnd$ORIGIN==b, etc.), but my confusion is only growing, and I would like someone to explain the correct way(s) of thinking about this.

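In R versions before 4.0.0, data.frame turns character strings into factors by default (stringsAsFactors = TRUE); from R 4.0.0 on, the default is FALSE and the strings stay character.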

data.frame(origin=b, count=unname(a))
  origin count
1    DFW     8
2    ATL     6
3    ORD     3

unname removes the names that table attaches to a, so that the counts go into the data frame as plain values.
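
Whether the origin column ends up as a factor depends on the R version, because the default of stringsAsFactors changed from TRUE to FALSE in R 4.0.0. The following is a minimal sketch, assuming the three counts from the output above are typed in by hand (the names counts, df_factor and df_char are only illustrative); it passes the argument explicitly so the result does not depend on the version.

```r
# Values copied by hand from the output above; 'counts' is an illustrative name.
b <- c("DFW", "ATL", "ORD")
counts <- c(8L, 6L, 3L)

# Ask for factors explicitly (the default before R 4.0.0).
df_factor <- data.frame(origin = b, count = counts, stringsAsFactors = TRUE)

# Keep the strings as character (the default from R 4.0.0 on).
df_char <- data.frame(origin = b, count = counts, stringsAsFactors = FALSE)

class(df_factor$origin)
class(df_char$origin)
```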

Data

set.seed(111)
x <- c("ATL", "ORD", "DFW", "DEN", "LAX")  # airport codes to sample from
wnd <- data.frame(ORIGIN=sample(x,20,TRUE), stringsAsFactors=TRUE)
a <- sort(table(wnd$ORIGIN), decreasing=TRUE)[1:3]
b <- names(a)
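
For the subsetting step the question asks about, one common approach is to test membership with %in% and then drop the unused factor levels. The following is a sketch under stated assumptions, not the only way to do it: the small hand-written wnd below is a hypothetical stand-in for the real data, and top is an illustrative name.

```r
# Hypothetical stand-in for the question's data: a factor column of airport codes.
wnd <- data.frame(
  ORIGIN = factor(c("ATL", "ORD", "ATL", "DFW", "DEN",
                    "ATL", "ORD", "LAX", "DFW", "ATL"))
)

# Names of the 3 most frequent airports, as a character vector.
a <- sort(table(wnd$ORIGIN), decreasing = TRUE)[1:3]
b <- names(a)

# %in% checks, row by row, whether ORIGIN is one of the values in b.
# The factor is matched by its labels, so b can stay a character vector.
top <- wnd[wnd$ORIGIN %in% b, , drop = FALSE]

# The subset still carries all original levels; droplevels() removes unused ones.
top <- droplevels(top)

nrow(top)
levels(top$ORIGIN)
```

By contrast, wnd$ORIGIN == b compares the two vectors element by element and recycles b along the column, so it does not test whether each row's value is a member of b.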
