R dummies package weird column names when knitted via .Rmd

I've just noticed some very odd behavior in the `dummies` package for R when a document is knitted from .Rmd. Here's a reproducible example.

---
title: "Dummies Package Behavior"
author: "Kim"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
---

Load the libraries

```{r}
library(tidyverse)
library(dummies)
```

Main data wrangling

```{r}
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
```

View output

```{r}
df
```

When I view `df`, I expect to see 0/1 columns named `year2016`, `year2017`, and `year2018`, which is the normal behavior of the dummies package.

When I knit this R Markdown document in RStudio, the columns are instead named `C:/Users/Kim/Desktop/dummies.Rmd2016`, `C:/Users/Kim/Desktop/dummies.Rmd2017`, and `C:/Users/Kim/Desktop/dummies.Rmd2018`. That is, the full path of the document is used as the column-name prefix.

I don't understand why this happens. I want the columns to be named `year2016`, `year2017`, and `year2018`.

The problem is not related to dplyr, because it can be reproduced with `data.frame()`. Apparently `dummy()` assigns column labels incorrectly when it is executed as part of an R Markdown document. As noted in Luke's answer, one workaround is to use `dummy.data.frame()`. Another is to rename the columns with `colnames()` after binding the year and dummy variables with `cbind()`, which also enables a dplyr-based solution.

This should probably be submitted as a bug report for the dummies package.
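One possible explanation, offered as an assumption rather than something confirmed here: if `dummy()` builds its column prefix from the outermost call on the stack (for example via `sys.call(1)`) instead of from its own argument, then during knitting the outermost call is the render call, and its first argument is the document path. The base R sketch below imitates that situation. The helpers `prefix_from_stack`, `prefix_from_argument`, and `render_like` are hypothetical names made up for illustration and are not part of the dummies package.

```r
# Hypothetical stand-in for dummy(): takes its prefix from the text of the
# OUTERMOST call on the stack (assumed behaviour, not the package source).
prefix_from_stack <- function(x) {
  outer_arg <- as.character(sys.call(1))[2]
  sub("^(.*\\$)", "", outer_arg)  # drop a leading "df$" if present
}

# Safer variant: look only at the expression passed as this function's argument.
prefix_from_argument <- function(x) {
  sub("^(.*\\$)", "", deparse(substitute(x)))
}

# Hypothetical stand-in for a renderer: it is called with a file path and
# runs the chunk code further down the call stack.
render_like <- function(input, chunk) chunk()

df <- data.frame(year = c(2016, 2017, 2018))

from_stack <- render_like("C:/Users/Kim/Desktop/dummies.Rmd",
                          function() prefix_from_stack(df$year))
from_argument <- render_like("C:/Users/Kim/Desktop/dummies.Rmd",
                             function() prefix_from_argument(df$year))

print(from_stack)     # text taken from the outermost call, not "year"
print(from_argument)  # "year"
```

If the package really does work this way, that would also explain why renaming the columns afterwards is a reliable workaround: the 0/1 values are fine and only the prefix is wrong.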

---
title: "Behavior of dummies package"
author: "anAuthor"
date: "12/26/2017"
output:
  html_document: default
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# first, reproduce error with data.frame()

```{r}
library(dummies)
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy(df$year)
dummyCols <- as.data.frame(dummyCols)
dummyCols
```

# data.frame() approach to fix the error

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy.data.frame(data=df,dummy.classes="ALL")
dummyCols
df <- cbind(df, dummyCols)
df
```

...and the output, first reproducing the error.

[Screenshot of the knitted output]

...second, using `dummy.data.frame()` to avoid the error.

[Screenshot of the knitted output]

The dplyr correction works as follows.

# dplyr approach 

```{r}
library(tidyverse)
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
# paste0() is vectorised, so no lapply()/unlist() is needed
colnames(df) <- c("year", paste0("year", 2016:2018))
df
```

[Screenshot of the knitted output]
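The rename above hard-codes the years. A variant that derives the new names from the bad ones might look like the sketch below. It runs without the dummies package: `diag(3L)` is only a stand-in for the 0/1 matrix that `dummy()` returns, the column names are copied from the question, and the pattern assumes the unwanted prefix ends in `.Rmd`.

```r
# Column names as reported in the question.
bad_names <- c("C:/Users/Kim/Desktop/dummies.Rmd2016",
               "C:/Users/Kim/Desktop/dummies.Rmd2017",
               "C:/Users/Kim/Desktop/dummies.Rmd2018")

# Stand-in for the matrix returned by dummy(df$year) when knitted.
temp <- diag(3L)
colnames(temp) <- bad_names

# Replace everything up to and including ".Rmd" with the intended prefix.
colnames(temp) <- sub("^.*\\.Rmd", "year", colnames(temp))

df <- cbind(data.frame(year = c(2016, 2017, 2018)), temp)
names(df)
```

This keeps working if more years are added, as long as the file-path prefix is the only thing wrong with the names.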

I'm not sure why that interaction is happening, but this slight modification seems to get around it:

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df <- data.frame(df, dummy.data.frame(data = df, dummy.classes = "ALL"))
```

[Screenshot of the knitted output]

Note that using `data.frame` from base rather than `data_frame` from dplyr seems to make a difference.
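If avoiding the naming issue altogether is acceptable, the indicator columns can also be built in base R with the names set explicitly. This is a minimal sketch, not something taken from the answers above; `levels_seen` and `indicators` are illustrative names.

```r
df <- data.frame(year = c(2016, 2017, 2018))

# One column per distinct year; outer() compares every row with every level.
levels_seen <- sort(unique(df$year))
indicators <- outer(df$year, levels_seen, "==") * 1L  # logical -> 0/1 integers

# Name the columns explicitly, so nothing depends on how the code is run.
colnames(indicators) <- paste0("year", levels_seen)

df <- cbind(df, indicators)
df
```

Because the prefix is written out in the code, the result should be the same whether the chunk is run interactively or knitted.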
