Grouping column values based on conditions in R
I have a data set in which one column has 142 unique values. As part of building a predictive model, I want to create dummy variables for that column. But instead of creating 142 dummy variables, I first want to club together the values that behave similarly with respect to the response variable. The code I used is shown below:
round(tapply(train_data$Price,train_data$Suburb,mean),0)
This gives me an array of 142 elements, and going through it manually to find similar values is time-consuming. A snippet of my output is pasted below:
round(tapply(train_data$Price,train_data$Suburb,mean),0)
       Abbotsford       Aberfeldie     Airport West
          1057934          1235150           707542
      Albert Park           Albion       Alphington
          1919014           547711          1188880
           Altona     Altona North         Armadale
           757866           728127          1542430
       Ascot Vale        Ashburton          Ashwood
           968702          1595275          1049184
 Avondale Heights        Balaclava           Balwyn
           792321           675133          1912896
     Balwyn North        Bellfield        Bentleigh
          1769984           798778          1282869
   Bentleigh East         Box Hill        Braybrook
          1038886          1138650           646845
         Brighton    Brighton East         Brooklyn
          1864928          1607299           542182
        Brunswick   Brunswick East   Brunswick West
           952350           874927           744986
          Bulleen          Burnley          Burwood
          1142944          1150902          1167023
       Camberwell    Campbellfield       Canterbury
          1761263           447600          2284188
          Carlton    Carlton North         Carnegie
          1062721          1436615           915587
        Caulfield   Caulfield East  Caulfield North
           981417          1099000          1055575
  Caulfield South        Chadstone     Clifton Hill
          1119571          1007909          1049742
           Coburg     Coburg North      Collingwood
           851215           770902           858415
         Cremorne        Docklands        Doncaster
           943731           937500          1210059
        Eaglemont   East Melbourne      Elsternwick
How can I write code that groups the values by a condition on the mean, for example all suburbs whose mean price falls in 600000-699999, 700000-799999, and so on?
I found code that completely served my purpose:
subset(
  aggregate(Price ~ Suburb, train_data,
            # flag suburbs whose mean price lies in 600000-699999
            function(x) ifelse(mean(x) >= 600000 & mean(x) < 700000, 1, 0)),
  Price == 1)
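The code above checks one band at a time, so it would have to be rerun for each range. One possible way to assign every suburb to a band in a single pass is base R's cut() applied to the per-suburb means. The following is only a minimal sketch: the toy train_data values and the names band_width, breaks, price_band, bands and PriceBand are made up for illustration and are not from the original data.

```r
# Toy stand-in for train_data (hypothetical values, not the real data set)
train_data <- data.frame(
  Suburb = c("A", "A", "B", "B", "C", "C", "D"),
  Price  = c(610000, 650000, 720000, 760000, 640000, 690000, 1200000)
)

# Mean price per suburb, same idea as the tapply() call in the question
suburb_mean <- tapply(train_data$Price, train_data$Suburb, mean)

# Band edges every 100,000, wide enough to cover all the means
band_width <- 100000
breaks <- seq(floor(min(suburb_mean) / band_width) * band_width,
              (floor(max(suburb_mean) / band_width) + 1) * band_width,
              by = band_width)

# right = FALSE closes each band on the left, e.g. [600000, 700000);
# dig.lab = 10 keeps the labels out of scientific notation
price_band <- cut(suburb_mean, breaks = breaks, right = FALSE, dig.lab = 10)

# Suburbs that fall in each non-empty band
bands <- split(names(suburb_mean), price_band, drop = TRUE)
print(bands)

# Attach the band to every row, so dummy variables can be built
# from PriceBand instead of from the 142 suburbs
train_data$PriceBand <- price_band[match(train_data$Suburb, names(suburb_mean))]
```

With the real data, the number of dummy variables then equals the number of non-empty bands. Whether equal-width 100,000 bands are the right grouping for the model is a separate judgment call; band_width can be changed to try coarser or finer groups.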