Change continuous variable into categorical

I have a year variable that I want to convert into a categorical variable with 3 levels. I am currently doing this with the levels function, which is really painful.

traintest$YearBuilt <- as.factor(traintest$YearBuilt)
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1872,1875,1879,1880,1882,
                                                             1885,1890,1892,1893,1895,
                                                             1896,1898,1900,1901,1902,
                                                             1904,1905,1906,1907,1908,
                                                             1910,1911,1912,1913,1914,
                                                             1915,1916,1917,1918,1919,
                                                             1920,1921,1922,1923,1924,
                                                             1925,1926,1927,1928,1929,
                                                             1930,1931,1932,1934,1935,
                                                             1936,1937,1938,1939,1940,
                                                             1941,1942,1945,1946,1947,
                                                             1948,1949)] <- "Before1950"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1950,1951,1952,1953,1954,
                                                             1955,1956,1957,1958,1959,
                                                             1960,1961,1962,1963,1964,
                                                             1965,1966,1967,1968,1969,
                                                             1970,1971,1972,1973,1974,
                                                             1975,1976,1977,1978,1979,
                                                             1980,1981,1982,1983,1984,
                                                             1985,1986,1987,1988,1989,
                                                             1990,1991,1992,1993,1994,
                                                             1995,1996,1997,1998,1999)] <- "Between1950-2000"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(2000,2001,2002,2003,2004,
                                                             2005,2006,2007,2008,2009,
                                                             2010)] <- "After2000"

I tried using the cut function, but it didn't quite work for me: it put all the values into the first category, and the other two categories ended up with zero counts.

Is there an easier way to do this?

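One option is to index the factor levels with logical conditions (this assumes YearBuilt has already been converted to a factor, as in the question):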

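# Numeric year for each level of the factor
v1 <- as.numeric(levels(traintest$YearBuilt))
# Build all new level names first and assign them once; assigning group by
# group merges levels after the first step and misaligns the later indices
new_lvls <- character(length(v1))
new_lvls[v1 < 1950] <- "Before1950"
new_lvls[v1 >= 1950 & v1 < 2000] <- "Between1950-2000"
new_lvls[v1 >= 2000] <- "After2000"
levels(traintest$YearBuilt) <- new_lvls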
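
The relabeling of levels can be checked on a small toy factor. This is a minimal sketch, not the data from the question: the vector `years` and its values are made up for illustration, and nested `ifelse` is used here only as one compact way to build the full vector of new level names in a single step.

```r
# Toy data standing in for traintest$YearBuilt (hypothetical values)
years <- factor(c(1900, 1949, 1950, 1999, 2000, 2010, 1950))

# Numeric value of each level, in level order
lv <- as.numeric(levels(years))

# One new label per original level; duplicate labels are merged on assignment
levels(years) <- ifelse(lv < 1950, "Before1950",
                 ifelse(lv < 2000, "Between1950-2000", "After2000"))

table(years)
```

Because the new names are assigned in one call, the index positions never shift while the levels are being merged.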

Or, as an alternative (starting again from the original factor of years), use cut on the numeric levels v1:

levels(traintest$YearBuilt) <- cut(v1, breaks = c(-Inf, 1949, 1999, Inf),
       labels = c("Before1950", "Between1950-2000", "After2000"))
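
One plausible reason cut put everything into the first category, which the question does not confirm, is that it was applied to `as.numeric()` of the factor: that returns the integer level codes (1, 2, 3, ...) rather than the years, and all of those fall below 1950. The sketch below illustrates this on a made-up factor `year_f` (hypothetical values) and shows cut applied to the recovered numeric years instead.

```r
# Toy factor standing in for traintest$YearBuilt after as.factor() (hypothetical values)
year_f <- factor(c(1900, 1949, 1950, 1999, 2000, 2010))

# Pitfall: as.numeric() on a factor returns the integer codes, not the years
codes <- as.numeric(year_f)

# Recover the real years by converting to character first
year_num <- as.numeric(as.character(year_f))

# right = TRUE (the default) closes each interval on the right,
# so 1949 falls in the first bin and 1999 in the second
year_cat <- cut(year_num,
                breaks = c(-Inf, 1949, 1999, Inf),
                labels = c("Before1950", "Between1950-2000", "After2000"))

table(year_cat)
```

If the year column is still numeric, cut can be applied to it directly and the conversion to a factor beforehand is not needed.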
