
Data wrangling in R with dplyr

I have an issue with restructuring a data set.

Here is my dummy data frame:

Date      <- c("2015-01-01", "2016-01-01", "2017-01-01", "2018-01-01", "2019-01-01")
Stock1_P  <- c(1, 1.5, 2, 1.5, 2)
Stock1_MV <- c(10, 15, 20, 15, 20) 
Stock2_P  <- c(NA, NA, 5, 6, 7)
Stock2_MV <- c(NA, NA, 50, 60, 70) 
Stock3_P  <- c(NA, 2, 3, 4, 3)
Stock3_MV <- c(NA, 20, 30, 40, 30) 

dataset <- data.frame(Date, Stock1_P, Stock1_MV, Stock2_P, Stock2_MV, Stock3_P, Stock3_MV)

The initial dataset has the following structure, which is dictated by the source from which I downloaded the data. For each stock there is a price column (P) and a market value column (MV).

[Image: dataset]

The goal is to bring the dataset into the long, tidy format. See the following code.

library(dplyr)
library(tidyr)

dataset.tbl  <- as_tibble(dataset)  # as.tibble() is deprecated in favour of as_tibble()
dataset.tbl2 <- dataset.tbl %>% mutate(Date = as.Date(Date, format = "%Y-%m-%d"))
data.final   <- dataset.tbl2 %>% pivot_longer(-Date, names_to = "Stock", values_to = "close")

After this step the dataset has the following structure. Now I want to generate two additional columns (P and MV).

[Image: data.final]

The final dataset should look as follows. However, so far I haven't found a solution for that.

[Image: output data]

I hope somebody has already dealt with this kind of issue.

Thx! :)

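Use ".value" in names_to. The order of the names passed to names_to matters: it has to match the structure of the column names and the required output. Here every column name has the form Stock1_P, so we pass "Stock" first and ".value" second. The part before the underscore goes into the Stock column, and the part after it (P or MV) becomes the name of an output column.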

library(tidyr) #v1.0.0
pivot_longer(dataset, -Date, names_to = c("Stock",".value"), names_sep = "_")

# A tibble: 15 x 4
   Date       Stock      P    MV
   <fct>      <chr>  <dbl> <dbl>
 1 2015-01-01 Stock1   1      10
 2 2015-01-01 Stock2  NA      NA
 3 2015-01-01 Stock3  NA      NA
 4 2016-01-01 Stock1   1.5    15
 5 2016-01-01 Stock2  NA      NA
 ...
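
The <fct> under Date in the output above shows that the date is still a factor there, since the answer starts from the raw data frame. As a minimal sketch, assuming dplyr and tidyr >= 1.0.0 are installed, the question's date conversion can be combined with the ".value" call in one pipeline. The object name long is only illustrative and does not come from the original post.

```r
library(dplyr)
library(tidyr)  # pivot_longer() requires tidyr >= 1.0.0

# Dummy data from the question
dataset <- data.frame(
  Date      = c("2015-01-01", "2016-01-01", "2017-01-01", "2018-01-01", "2019-01-01"),
  Stock1_P  = c(1, 1.5, 2, 1.5, 2),
  Stock1_MV = c(10, 15, 20, 15, 20),
  Stock2_P  = c(NA, NA, 5, 6, 7),
  Stock2_MV = c(NA, NA, 50, 60, 70),
  Stock3_P  = c(NA, 2, 3, 4, 3),
  Stock3_MV = c(NA, 20, 30, 40, 30)
)

long <- dataset %>%
  # Convert the date first so the long table carries a proper Date column
  mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  pivot_longer(
    -Date,
    names_to  = c("Stock", ".value"),  # "Stock1_P" -> Stock = "Stock1", value column "P"
    names_sep = "_"
  )
```

Here long has one row per date and stock, with the columns Date, Stock, P and MV.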
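
An alternative for tidyr versions before 1.0.0, using gather(), separate() and spread():

library(tidyverse)

dataset %>%
  gather(Stock, close, -Date) %>%          # long format: one row per Date and original column
  separate(Stock, c('Stock', 'PMV')) %>%   # split "Stock1_P" into "Stock1" and "P"
  spread(PMV, close)                       # one column each for MV and P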
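
For comparison, the same three steps can be written with the newer pivot functions. This is only a sketch, assuming dplyr and tidyr >= 1.0.0. The data frame is a trimmed copy of the question's data, and the name stepwise is illustrative rather than taken from the post.

```r
library(dplyr)
library(tidyr)  # pivot_longer() and pivot_wider() require tidyr >= 1.0.0

# Trimmed copy of the question's data, for illustration only
dataset <- data.frame(
  Date      = c("2015-01-01", "2016-01-01", "2017-01-01"),
  Stock1_P  = c(1, 1.5, 2),
  Stock1_MV = c(10, 15, 20),
  Stock2_P  = c(NA, NA, 5),
  Stock2_MV = c(NA, NA, 50)
)

stepwise <- dataset %>%
  # Step 1 (replaces gather): stack all non-Date columns
  pivot_longer(-Date, names_to = "Stock", values_to = "close") %>%
  # Step 2: split "Stock1_P" into the stock name and the measure
  separate(Stock, into = c("Stock", "PMV"), sep = "_") %>%
  # Step 3 (replaces spread): one column per measure
  pivot_wider(names_from = PMV, values_from = close)
```

The result has the columns Date, Stock, P and MV, the same layout the single pivot_longer() call with ".value" produces.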
