I have a year variable that I want to convert into a categorical variable with 3 levels. I currently do this with the levels function, which is really painful:
traintest$YearBuilt <- as.factor(traintest$YearBuilt)
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1872,1875,1879,1880,1882,
1885,1890,1892,1893,1895,
1896,1898,1900,1901,1902,
1904,1905,1906,1907,1908,
1910,1911,1912,1913,1914,
1915,1916,1917,1918,1919,
1920,1921,1922,1923,1924,
1925,1926,1927,1928,1929,
1930,1931,1932,1934,1935,
1936,1937,1938,1939,1940,
1941,1942,1945,1946,1947,
1948,1949)] <- "Before1950"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1950,1951,1952,1953,1954,
1955,1956,1957,1958,1959,
1960,1961,1962,1963,1964,
1965,1966,1967,1968,1969,
1970,1971,1972,1973,1974,
1975,1976,1977,1978,1979,
1980,1981,1982,1983,1984,
1985,1986,1987,1988,1989,
1990,1991,1992,1993,1994,
1995,1996,1997,1998,1999)] <- "Between1950-2000"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(2000,2001,2002,2003,2004,
2005,2006,2007,2008,2009,
2010)] <- "After2000"
I tried the cut function, but it didn't work for me: it put all the values into the first category and left the other two categories empty.
Is there an easier way to do this?
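One option is to build logical index vectors from the numeric levels and relabel all levels in a single assignment (assigning to levels(...)[i1] first would merge those levels and shift the positions that i2 and i3 refer to):
v1 <- as.numeric(levels(traintest$YearBuilt))
i1 <- v1 < 1950
i2 <- !i1 & v1 < 2000
i3 <- v1 >= 2000
# edit a copy of the level names, then assign once; duplicate names are merged
lev <- levels(traintest$YearBuilt)
lev[i1] <- "Before1950"
lev[i2] <- "Between1950-2000"
lev[i3] <- "After2000"
levels(traintest$YearBuilt) <- lev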
Or use cut on the numeric levels v1 (computed from the original, unmerged levels):
levels(traintest$YearBuilt) <- cut(v1, breaks = c(-Inf, 1949, 1999, Inf),
    labels = c("Before1950", "Between1950-2000", "After2000"))
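If the column is still numeric (that is, before the as.factor call), cut can also be applied to the year values directly. A common reason for everything landing in the first bin is calling as.numeric on a factor, which returns the internal integer codes (1, 2, 3, ...) rather than the years; whether that is what happened in the question is only a guess. A minimal sketch, using a made-up vector named years in place of traintest$YearBuilt:

```r
# Hypothetical sample of build years (not the asker's data)
years <- c(1872, 1949, 1950, 1999, 2000, 2010)

# cut() uses right-closed intervals by default:
# (-Inf, 1949], (1949, 1999], (1999, Inf]
binned <- cut(years,
              breaks = c(-Inf, 1949, 1999, Inf),
              labels = c("Before1950", "Between1950-2000", "After2000"))
table(binned)

# Pitfall: as.numeric() on a factor gives the level codes, not the years
f <- factor(years)
codes <- as.numeric(f)                      # level codes 1 to 6
real_years <- as.numeric(as.character(f))   # the original year values
```

Converting through as.character first recovers the real years, so the same breaks can then be reused on a column that has already been turned into a factor.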