
R converting continuous variable to categorical

I have a column of continuous numeric values (NO2) which I need to convert into categorical values. Can someone explain how the following code accomplishes that:

cutpoints <- quantile(dataframe%NO2, seq(0,1,length=4),na.rm=TRUE)  
dataframe%newcol <- cut(dataframe%NO2, cutpoints)  
levels(dataframe%newcols) returns (0.3781,1.2] (1.2,1.42] (1.42,2.55]  

I think you meant to use $ instead of % to refer to column names.
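With $ in place of %, the code from the question might look like the sketch below. The data frame and its NO2 values are made up for illustration (the asker's actual data is not shown); newcol is the column name from the question.

```r
# Hypothetical data standing in for the asker's data frame
dataframe <- data.frame(NO2 = c(0.38, 0.9, 1.2, 1.3, 1.42, 2.0, 2.55, NA))

# Tertile boundaries of NO2, ignoring missing values
cutpoints <- quantile(dataframe$NO2, seq(0, 1, length = 4), na.rm = TRUE)

# Assign each NO2 value to the interval it falls into
dataframe$newcol <- cut(dataframe$NO2, cutpoints)
levels(dataframe$newcol)
```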

Running the code step by step will help you understand it.

seq creates a sequence of 4 evenly spaced values from 0 to 1.

seq(0,1,length=4)
#[1] 0.000 0.333 0.667 1.000

quantile returns the values of the vector at the given probabilities (here seq(0,1,length=4), so the minimum, the 33.3% and 66.7% quantiles, and the maximum).

set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0,1,length=4),na.rm=TRUE)
cutpoints
#    0%  33.3%  66.7%   100% 
#0.0456 0.4566 0.7883 0.9405 

These breaks are then used to cut the data.

cut(x, cutpoints)

This divides x into groups: values in (cutpoints[1], cutpoints[2]] form one group, values in (cutpoints[2], cutpoints[3]] another, and so on.
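One detail worth checking when you run this: because the intervals are open on the left by default, the smallest value (which equals cutpoints[1]) falls outside the first interval and comes back as NA. That is also why the first level in the question starts with "(" rather than "[". A minimal sketch, reusing x and cutpoints from above; the labels "low", "mid" and "high" are made up for illustration:

```r
set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0, 1, length = 4), na.rm = TRUE)

# Default: intervals are (a, b], so min(x) is not in any interval
groups_default <- cut(x, cutpoints)
sum(is.na(groups_default))   # counts the unclassified values

# include.lowest = TRUE closes the first interval on the left
groups <- cut(x, cutpoints, include.lowest = TRUE,
              labels = c("low", "mid", "high"))  # hypothetical labels
table(groups)
```

If you want every observation to land in a category, include.lowest = TRUE is probably the option you are after.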

You can also use findInterval instead of cut.
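A rough sketch of the findInterval route, again reusing x and cutpoints from above. Note that it returns integer bin indices rather than a factor, and that its intervals are closed on the left by default, [a, b), which is the opposite of cut, so values sitting exactly on a cutpoint can end up in a different group.

```r
set.seed(123)
x <- runif(10)
cutpoints <- quantile(x, seq(0, 1, length = 4), na.rm = TRUE)

# rightmost.closed = TRUE keeps max(x) in the last bin (3)
# instead of pushing it into a fourth bin of its own
bins <- findInterval(x, cutpoints, rightmost.closed = TRUE)
bins
```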
