
Grouping column values based on conditions in R

I have a data set in which one column has 142 unique values. As part of building a predictive model, I want to create dummy variables for that column. But instead of creating 142 dummy variables, I first want to group the values that behave similarly with respect to the response variable. The code I used is below:

round(tapply(train_data$Price,train_data$Suburb,mean),0)

This gives me an array of 142 elements, which is time-consuming to go through manually to find similar values. A snippet of my output is pasted below:

round(tapply(train_data$Price,train_data$Suburb,mean),0)
        Abbotsford         Aberfeldie       Airport West 
           1057934            1235150             707542 
       Albert Park             Albion         Alphington 
           1919014             547711            1188880 
            Altona       Altona North           Armadale 
            757866             728127            1542430 
        Ascot Vale          Ashburton            Ashwood 
            968702            1595275            1049184 
  Avondale Heights          Balaclava             Balwyn 
            792321             675133            1912896 
      Balwyn North          Bellfield          Bentleigh 
           1769984             798778            1282869 
    Bentleigh East           Box Hill          Braybrook 
           1038886            1138650             646845 
          Brighton      Brighton East           Brooklyn 
           1864928            1607299             542182 
         Brunswick     Brunswick East     Brunswick West 
            952350             874927             744986 
           Bulleen            Burnley            Burwood 
           1142944            1150902            1167023 
        Camberwell      Campbellfield         Canterbury 
           1761263             447600            2284188 
           Carlton      Carlton North           Carnegie 
           1062721            1436615             915587 
         Caulfield     Caulfield East    Caulfield North 
            981417            1099000            1055575 
   Caulfield South          Chadstone       Clifton Hill 
           1119571            1007909            1049742 
            Coburg       Coburg North        Collingwood 
            851215             770902             858415 
          Cremorne          Docklands          Doncaster 
            943731             937500            1210059 
         Eaglemont     East Melbourne        Elsternwick 

How can I write code that groups all the values by a condition, for example suburbs whose mean falls in 600000-699999, 700000-799999, and so on?

I found code that completely served my purpose:

# Keep only the suburbs whose mean Price falls in [600000, 700000)
subset(aggregate(Price ~ Suburb, train_data,
                 function(x) as.integer(mean(x) >= 600000 & mean(x) < 700000)),
       Price == 1)
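
The call above picks out one band at a time. To cover the "and so on" part of the question in a single pass, one option is to bin the per-suburb means with `cut()`. This is only a sketch under assumptions: the toy `train_data` below uses made-up suburbs and prices, the 100,000-wide bands come from the ranges named in the question, and `suburb_mean`, `suburb_band` and `PriceBand` are hypothetical names chosen for illustration.

```r
# Toy stand-in for train_data (hypothetical values, not from the original data set)
train_data <- data.frame(
  Suburb = c("A", "A", "B", "B", "C", "C", "D"),
  Price  = c(610000, 650000, 720000, 760000, 640000, 680000, 1200000)
)

# Mean price per suburb, as in the question
suburb_mean <- tapply(train_data$Price, train_data$Suburb, mean)

# Band edges every 100,000, wide enough to cover every mean
breaks <- seq(floor(min(suburb_mean) / 1e5) * 1e5,
              floor(max(suburb_mean) / 1e5) * 1e5 + 1e5,
              by = 1e5)

# right = FALSE makes the bands left-closed: [600000, 700000), [700000, 800000), ...
# dig.lab = 10 keeps the labels out of scientific notation
suburb_band <- cut(suburb_mean, breaks = breaks, right = FALSE, dig.lab = 10)

# Suburbs listed under the band their mean falls into (empty bands dropped)
split(names(suburb_mean), suburb_band, drop = TRUE)

# Attach the band to every row; cut() drops names, so look suburbs up with match()
train_data$PriceBand <- suburb_band[match(train_data$Suburb, names(suburb_mean))]
```

Dummy variables can then be built from `PriceBand` instead of `Suburb`, which gives one dummy per price band rather than one per suburb.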


 