After dabbling for some time, I am a bit confused about how to subset a data frame, wnd, which has a column ORIGIN (class: factor).
a = sort(table(wnd$ORIGIN), decreasing=TRUE)[1:20]
a
   ATL    ORD    DFW    DEN    LAX    IAH    PHX    SFO    CLT ..
123915  94422  90184  70970  69298  58850  57316  52702  44234 ..
# a is a table of the 20 factor levels of interest (highest volume).
b = names(a)
b
[1] "ATL" "ORD" "DFW" "DEN" "LAX" "IAH" "PHX" "SFO" "CLT" "LAS" "DTW" "EWR" "MSP"
[14] "MCO" "SLC" "JFK" "BOS" "BWI" "LGA" "SEA"
# b holds the names of the airports I require in my subset
Then I would like to create a new data frame containing only the rows whose ORIGIN is one of the values in b (i.e. subsetting). For one thing, the two are not of the same class:
> class(b)
[1] "character"
> class(wnd$ORIGIN)
[1] "factor"
I tried a few different things (as.factor(b), wnd$ORIGIN==b, etc.), but my confusion is only growing, and I would like someone to explain the correct way(s) of thinking about this.
data.frame turns character strings into factors by default (in R versions before 4.0.0; from 4.0.0 onward, stringsAsFactors defaults to FALSE).
data.frame(origin=b, count=unname(a))
  origin count
1    DFW     8
2    ATL     6
3    ORD     3
unname removes the names from a, which it carries because it is the output of table.
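To avoid relying on the default, the conversion can be requested explicitly. The following is only a minimal sketch: the values of b and counts are copied from the example output above, and the names counts, df_chr and df_fac are introduced here purely for illustration.

```r
# Values taken from the example output above (illustrative only)
b <- c("DFW", "ATL", "ORD")
counts <- c(8L, 6L, 3L)

# Keep the strings as character
df_chr <- data.frame(origin = b, count = counts, stringsAsFactors = FALSE)

# Ask for the factor conversion explicitly
df_fac <- data.frame(origin = b, count = counts, stringsAsFactors = TRUE)

class(df_chr$origin)
class(df_fac$origin)
```

Either way, the column can later be compared against a character vector, because comparisons between a factor and a character vector are done on the level labels.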
Data
set.seed(111)
x <- c("ATL", "ORD", "DFW", "DEN", "LAX")
wnd <- data.frame(ORIGIN=sample(x,20,TRUE))
a <- sort(table(wnd$ORIGIN), decreasing=TRUE)[1:3]
b <- names(a)
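For the subsetting step the question asks about, one common approach is %in%, sketched below on the toy data. This is an assumption about what is wanted, not part of the original answer. wnd$ORIGIN == b does not work as intended because == compares element by element and recycles b along the column, whereas %in% tests each ORIGIN value for membership in b, and it matches a factor against a character vector by label, so no class conversion is needed. The name wnd_top is introduced here for illustration, and stringsAsFactors = TRUE is set so ORIGIN is a factor regardless of R version.

```r
set.seed(111)
x <- c("ATL", "ORD", "DFW", "DEN", "LAX")
wnd <- data.frame(ORIGIN = sample(x, 20, TRUE), stringsAsFactors = TRUE)

# Names of the three most frequent origins
b <- names(sort(table(wnd$ORIGIN), decreasing = TRUE)[1:3])

# Keep only the rows whose ORIGIN is one of the values in b;
# drop = FALSE keeps the result a data frame even with a single column
wnd_top <- wnd[wnd$ORIGIN %in% b, , drop = FALSE]

# Remove the factor levels that no longer occur in the subset
wnd_top$ORIGIN <- droplevels(wnd_top$ORIGIN)

levels(wnd_top$ORIGIN)
table(wnd_top$ORIGIN)
```

Without droplevels, the unused airports would still appear as levels with a count of zero in later calls to table.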