Trying to combine two datasets of different lengths (using combine() in R)

These are the two datasets in question:

> head(Housing_Training)
  Id MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt YearRemodAdd MasVnrArea TotalBsmtSF GrLivArea FullBath
1  1         60          65    8450           7           5      2003         2003        196         856      1710        2
2  2         20          80    9600           6           8      1976         1976          0        1262      1262        2
3  3         60          68   11250           7           5      2001         2002        162         920      1786        2
4  4         70          60    9550           7           5      1915         1970          0         756      1717        1
5  5         60          84   14260           8           5      2000         2000        350        1145      2198        2
6  6         50          85   14115           5           5      1993         1995          0         796      1362        1
  HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF MoSold YrSold
1        1            3            1            8          0        2003          2        548          0          61      2   2008
2        0            3            1            6          1        1976          2        460        298           0      5   2007
3        1            3            1            6          1        2001          2        608          0          42      9   2008
4        0            3            1            7          1        1998          3        642          0          35      2   2006
5        1            4            1            9          1        2000          3        836        192          84     12   2008
6        1            1            1            5          0        1993          2        480         40          30     10   2009
  SalePrice
1    208500
2    181500
3    223500
4    140000
5    250000
6    143000
> head(Housing_Testing)
  ï..Id MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt YearRemodAdd MasVnrArea TotalBsmtSF GrLivArea FullBath
1  1001         20          74   10206           3           3      1952         1952          0           0       944        1
2  1002         30          60    5400           5           6      1920         1950          0         691       691        1
3  1003         20          75   11957           8           5      2006         2006         53        1574      1574        2
4  1004         90          NA   11500           5           6      1976         1976        164        1680      1680        2
5  1005        120          43    3182           7           5      2005         2006         16        1346      1504        2
6  1006         80          65    8385           5           8      1977         1977        220         985       985        2
  HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF MoSold YrSold
1        0            2            1            4          0        1956          2        528          0           0      7   2009
2        0            2            1            4          0        1920          1        216          0          20      1   2007
3        0            3            1            7          1        2006          3        824        144         104      7   2008
4        0            4            2            8          0        1976          2        528          0           0      6   2007
5        0            1            1            7          1        2005          2        457        156           0      5   2009
6        0            3            1            6          0        1977          1        328        210           0     11   2008
  SalePrice
1     82000
2     86000
3    232000
4    136905
5    181000
6    149900

I am trying to combine them.

The issue is that the training dataset has 1000 rows and the testing dataset has 460.

But really, I just want to stack them into one dataset with 1460 rows.

The assignment says to use the combine() function.

When I just call combine() on the two datasets, I get this:

> combine(Housing_Training,Housing_Testing)
Error in `$<-.data.frame`(`*tmp*`, "layout", value = list(l = integer(0),  : 
  replacement has 1001 rows, data has 1000
In addition: Warning message:
'combine' is deprecated.
Use 'gtable_combine' instead.
See help("Deprecated") 

So then I tried:

> gtable_combine(Housing_Training,Housing_Testing)
Error in `$<-.data.frame`(`*tmp*`, "layout", value = list(l = integer(0),  : 
  replacement has 1001 rows, data has 1000
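The deprecation warning redirects to gtable_combine(). That suggests the combine() being called here is the deprecated one from the gridExtra package, which merges gtable layouts (consistent with the error about a "layout" column), rather than dplyr::combine(). When two attached packages export the same name, the one attached later masks the other. A quick base-R check of which attached packages define a name (a sketch; the variable names are illustrative):

```r
# utils::find() lists the attached packages, in search-path order, that
# define the given name; a bare call like combine() resolves to the first one.
combine_sources <- utils::find("combine")
print(combine_sources)  # if gridExtra was attached after dplyr, it is listed first

# rbind() is part of base R, so package:base is always among the results.
rbind_sources <- utils::find("rbind")
print(rbind_sources)
```

Calling a function with its namespace prefix (for example dplyr::bind_rows()) sidesteps masking entirely.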

According to ?dplyr::combine (checked with dplyr 1.0.0):

combine() is deprecated in favour of vctrs::vec_c(). combine() attempted to automatically guess whether you wanted c() or unlist(), but could fail in surprising ways. We now believe it's better to be explicit.

So combine() is meant for combining vectors, not data frames. Assuming what you need is to bind the rows of the two data.frames, one option is dplyr's bind_rows():

library(dplyr)
bind_rows(Housing_Training, Housing_Testing, .id = 'grp')
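bind_rows() stacks data frames by column name, and `.id = 'grp'` adds a column recording which input each row came from ("1" for the first data frame, "2" for the second). The same bookkeeping can be approximated in base R by tagging each frame before calling rbind(). The sketch below uses small made-up stand-ins (train_demo and test_demo are hypothetical, not the real housing data):

```r
# Hypothetical stand-ins for Housing_Training / Housing_Testing
train_demo <- data.frame(Id = 1:3, SalePrice = c(208500, 181500, 223500))
test_demo  <- data.frame(Id = 1001:1002, SalePrice = c(82000, 86000))

# Tag each source, roughly mimicking bind_rows(..., .id = "grp")
train_demo$grp <- "1"
test_demo$grp  <- "2"

# Stack the rows: 3 + 2 = 5 rows
combined <- rbind(train_demo, test_demo)
nrow(combined)
table(combined$grp)  # number of rows per source

# The tag makes it easy to split the data back apart later
train_again <- combined[combined$grp == "1", ]
```

Two differences from bind_rows(): here grp ends up as the last column instead of the first, and rbind() requires both frames to have the same column names, whereas bind_rows() fills columns missing from one input with NA.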

Teacher said:

You may want to use bind_rows() or rbind() functions instead. Please try and let me know. Thanks.

rbind() worked like a charm.
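One thing to watch with rbind(): it matches data-frame columns by name, and it stops with "names do not match previous names" if they differ. In the output above, the first column is printed as `Id` in Housing_Training but `ï..Id` in Housing_Testing. That prefix looks like a UTF-8 byte-order mark read with the wrong encoding (common with read.csv() on Windows). Re-reading the file with `fileEncoding = "UTF-8-BOM"` typically avoids it. Renaming the column also works, as in this sketch with hypothetical stand-in data:

```r
# Hypothetical stand-ins; the testing frame's first column gets the
# BOM-garbled name seen in the question ("\u00ef" is the character ï)
train_demo <- data.frame(Id = 1:2, SalePrice = c(208500, 181500))
test_demo  <- data.frame(Id = 1001:1002, SalePrice = c(82000, 86000))
names(test_demo)[1] <- "\u00ef..Id"

# rbind() matches columns by name, so the mismatch is an error
res <- tryCatch(rbind(train_demo, test_demo),
                error = function(e) conditionMessage(e))
print(res)  # the error message, not a data frame

# Rename the garbled column to match, then stack the rows
names(test_demo)[names(test_demo) == "\u00ef..Id"] <- "Id"
combined <- rbind(train_demo, test_demo)
print(nrow(combined))
```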

facepalm

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM