Change continuous variable into categorical

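I have a year variable (YearBuilt), and I want to turn it into a categorical variable with 3 levels. I used the levels function here, and it was really painful.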

traintest$YearBuilt <- as.factor(traintest$YearBuilt)
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1872,1875,1879,1880,1882,
                                                             1885,1890,1892,1893,1895,
                                                             1896,1898,1900,1901,1902,
                                                             1904,1905,1906,1907,1908,
                                                             1910,1911,1912,1913,1914,
                                                             1915,1916,1917,1918,1919,
                                                             1920,1921,1922,1923,1924,
                                                             1925,1926,1927,1928,1929,
                                                             1930,1931,1932,1934,1935,
                                                             1936,1937,1938,1939,1940,
                                                             1941,1942,1945,1946,1947,
                                                             1948,1949)] <- "Before1950"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1950,1951,1952,1953,1954,
                                                             1955,1956,1957,1958,1959,
                                                             1960,1961,1962,1963,1964,
                                                             1965,1966,1967,1968,1969,
                                                             1970,1971,1972,1973,1974,
                                                             1975,1976,1977,1978,1979,
                                                             1980,1981,1982,1983,1984,
                                                             1985,1986,1987,1988,1989,
                                                             1990,1991,1992,1993,1994,
                                                             1995,1996,1997,1998,1999)] <- "Between1950-2000"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(2000,2001,2002,2003,2004,
                                                             2005,2006,2007,2008,2009,
                                                             2010)] <- "After2000"

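I tried the cut function, but it did not work well for me: it basically put every value into the first category and left the other two categories empty.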

Is there an easier way to do this?

One option is to create logical vectors:

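# Year value of each factor level (YearBuilt is already a factor here)
v1 <- as.numeric(levels(traintest$YearBuilt))
i1 <- v1 < 1950
i2 <- !i1 & v1 < 2000
i3 <- v1 >= 2000
# Build the full replacement vector first: assigning levels one group at a
# time merges duplicates, so i2 and i3 would no longer line up
new_levels <- levels(traintest$YearBuilt)
new_levels[i1] <- "Before1950"
new_levels[i2] <- "Between1950-2000"
new_levels[i3] <- "After2000"
levels(traintest$YearBuilt) <- new_levels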

Or use cut:

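# Alternative to the above: v1 must come from the original year levels
v1 <- as.numeric(levels(traintest$YearBuilt))
levels(traintest$YearBuilt) <- as.character(cut(v1, breaks = c(-Inf, 1949, 1999, 
       Inf), labels = c("Before1950", "Between1950-2000", "After2000")))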
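
If the column is still numeric, or can be converted back, cut can also be applied to the values directly instead of to the levels. The sketch below assumes a small made-up data frame in place of traintest and a hypothetical new column name, YearGroup. It also shows one possible reason cut appeared to put everything into the first class: calling as.numeric on a factor returns the integer level codes (1, 2, 3, ...), which are all far below 1950, so the conversion has to go through as.character first.

```r
# Hypothetical stand-in for traintest; only YearBuilt matters here.
traintest <- data.frame(YearBuilt = c(1872, 1949, 1950, 1999, 2000, 2010))

# If YearBuilt is already a factor, convert via as.character() first:
# as.numeric() on a factor gives the level codes, not the years.
years <- as.numeric(as.character(traintest$YearBuilt))

# right = FALSE makes the intervals [-Inf, 1950), [1950, 2000), [2000, Inf)
traintest$YearGroup <- cut(years,
                           breaks = c(-Inf, 1950, 2000, Inf),
                           labels = c("Before1950", "Between1950-2000", "After2000"),
                           right = FALSE)

# Counts per category, to check that no class is unexpectedly empty
table(traintest$YearGroup)
```

Writing the result to a separate column keeps the original years available, which makes it easy to check the grouping or change the break points later.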
