Weird things with Automatically generate new variable names using dplyr mutate

Question

OK, this is going to be a long post. I am fairly new to R (I am currently using the MR free 3.5, with no checkpoint), but I am trying to work with the tidyverse, which I find very elegant for writing code and often a lot simpler.

I decided to replicate an exercise from guru99 here . It is a simple k-means exercise. However, because I always want to write "generalizable" code, I was trying to automatically give the new variables created in mutate new names. So I searched SO and found this solution here , which is very nice.

First, what works fine.

library(tidyverse)

link <- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/computers.csv"
df <- read.csv(link)

rescaled <- df %>% discard(is.factor) %>%
  select(-X) %>% 
  mutate_all(
    funs("scaled" = scale) 
  )

When you download the data with read.csv, df is a plain data.frame and everything works.

And now the weird things start. With read_csv (or if you convert the data to a tibble at any point afterwards), the first variable X is named X1, and you need to change is.factor to is.character, because strings are read as character rather than factor unless you explicitly ask for factors (a note for future me and others). If you then run the code

df1 <- read_csv(link)

df1 %>% discard(is.character) %>%
  select(-X1) %>% 
  mutate_all(
    funs("scaled" = scale) 
  )

the new variables are named price_scaled[,1] speed_scaled[,1] hd_scaled[,1] ram_scaled[,1] etc. when you look at the output in the console, even if you call print().

BUT if you call view() on it, you see the output with the names you expect, which are price_scaled speed_scaled hd_scaled etc. ALSO, I am using an Rmarkdown document for the code, and when I change the chunk output to inline it displays the names correctly, as hd_scaled etc.

Does anyone have any idea how to get the names printed in the console as price_scaled etc.?
Why is this happening?

Thought that this would be interesting to ask.

Answer 1

scale() returns a one-column matrix, and dplyr/tibble doesn't automatically coerce it to a vector, so the console print shows the matrix column suffix [,1] after each name. By changing your mutate_all() call to the below, we can have it return a vector. I identified that this is what was happening by calling class() on one of the new columns (e.g. speed_scaled) and seeing the result "matrix".

library(tidyverse)
link <- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/computers.csv"
df <- read_csv(link)
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   price = col_double(),
#>   speed = col_double(),
#>   hd = col_double(),
#>   ram = col_double(),
#>   screen = col_double(),
#>   cd = col_character(),
#>   multi = col_character(),
#>   premium = col_character(),
#>   ads = col_double(),
#>   trend = col_double()
#> )

df %>% discard(is.character) %>%
  select(-X1) %>% 
  mutate_all(
    # scale() returns a one-column matrix; [, 1] extracts that column as a plain vector
    list("scaled" = function(x) scale(x)[, 1]) 
  ) 
#> # A tibble: 6,259 x 14
#>    price speed    hd   ram screen   ads trend price_scaled speed_scaled
#>    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>        <dbl>        <dbl>
#>  1  1499    25    80     4     14    94     1        -1.24        -1.28
#>  (rows 2 to 10 not shown)
#> # ... with 6,249 more rows, and 5 more variables: hd_scaled <dbl>,
#> #   ram_scaled <dbl>, screen_scaled <dbl>, ads_scaled <dbl>,
#> #   trend_scaled <dbl>
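
To see the matrix issue without downloading anything, here is a minimal sketch on a small made-up tibble. The object names toy and scaled and the values in toy are my own for illustration, not from the question. It also assumes dplyr 1.0.0 or later, where across() with .names can stand in for mutate_all(), and it uses as.vector() as one possible way to drop the matrix structure.

```r
library(dplyr)

# Hypothetical toy data standing in for the numeric columns of the real data set
toy <- tibble(price = c(1499, 1795, 1595, 1849),
              speed = c(25, 33, 25, 25))

# scale() returns a one-column matrix, which is why tibble prints "[,1]"
is.matrix(scale(toy$price))
dim(scale(toy$price))

# as.vector() drops the dim attribute (and the scaled:center / scaled:scale
# attributes), leaving a plain numeric vector for each new column
scaled <- toy %>%
  mutate(across(everything(),
                ~ as.vector(scale(.x)),
                .names = "{.col}_scaled"))

names(scaled)
sapply(scaled, is.matrix)
```

If you need the centering and scaling values later (for example to back-transform cluster centres), keep the result of scale() somewhere first, since as.vector() discards those attributes.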

solution1
0 ACCEPTED 2019-11-16 17:35:22
