
How to spread a two-column data frame by creating a unique identifier?

I'm trying to spread two-column data into a wide format in which some values will be NA.

Data frame:

df <- data.frame(Names = c("TXT","LSL","TXT","TXT","TXT","USL","LSL"),
                 Values = c("apple",-2,"orange","banana","pear",10,-1),
                 stringsAsFactors = FALSE)


If a row is a TXT row, the LSL or USL rows that follow it (up to the next TXT row) belong to that row.

For example:

  • In the first row, Names is TXT and Values is apple. The next row is LSL, so -2 is apple's LSL. There is no USL before the next TXT, so apple's USL is NA.

  • If a TXT row is followed directly by another TXT row, the LSL and USL values for the first one are NA.
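The rule above boils down to tagging every row with the TXT row it belongs to. A running count of TXT rows does exactly that: each TXT increments the counter, and each LSL/USL row inherits the current value. A small base-R check (the name grp is just an illustrative choice):

```r
df <- data.frame(Names  = c("TXT", "LSL", "TXT", "TXT", "TXT", "USL", "LSL"),
                 Values = c("apple", -2, "orange", "banana", "pear", 10, -1),
                 stringsAsFactors = FALSE)

# Every TXT row bumps the counter, so each LSL/USL row gets the number
# of the most recent TXT row above it.
grp <- cumsum(df$Names == "TXT")
print(data.frame(grp, df))
```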

I'm trying to create this:

TXT     LSL   USL
apple   -2    NA
orange  NA    NA
banana  NA    NA
pear    -1    10

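I tried using spread with row numbers as a unique identifier, but that pairs the n-th TXT with the n-th LSL and the n-th USL (so apple gets USL 10 and orange gets LSL -1), which is not what I want: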

library(dplyr); library(tidyr)
df %>% group_by(Names) %>% mutate(row = row_number()) %>% spread(key = Names, value = Values)

I guess I need to first build the full long table, with an NA row for every missing LSL/USL, and then spread it, but I couldn't figure out how.
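The full long table described above can be built in base R without any packages: enumerate every (group, name) pair with expand.grid() and left-join the observed rows with merge(). This is a sketch assuming the only names are TXT, LSL and USL; grp, all_combos and full are illustrative names, not part of the original question:

```r
df <- data.frame(Names  = c("TXT", "LSL", "TXT", "TXT", "TXT", "USL", "LSL"),
                 Values = c("apple", -2, "orange", "banana", "pear", 10, -1),
                 stringsAsFactors = FALSE)
df$grp <- cumsum(df$Names == "TXT")   # block index: one block per TXT row

# Every (block, name) combination the full table should contain
all_combos <- expand.grid(grp = unique(df$grp),
                          Names = c("TXT", "LSL", "USL"),
                          stringsAsFactors = FALSE)

# Left join: combinations with no observed row get Values = NA
full <- merge(all_combos, df, by = c("grp", "Names"), all.x = TRUE)

# Sort by block, then TXT/LSL/USL within each block
full <- full[order(full$grp, match(full$Names, c("TXT", "LSL", "USL"))), ]
rownames(full) <- NULL
full
```

Spreading this table then gives one row per block.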


We can expand the dataset with complete() after creating a grouping index from the running count of 'TXT' rows:

library(dplyr)
library(tidyr)
df %>% 
     group_by(grp = cumsum(Names == 'TXT')) %>%   # each TXT row starts a new group
     complete(Names = unique(.$Names)) %>%         # add NA rows for missing LSL/USL
     ungroup %>% 
     spread(Names, Values) %>%                     # one row per group
     select(TXT, LSL, USL)
# A tibble: 4 x 3
#  TXT    LSL   USL  
#  <chr>  <chr> <chr>
#1 apple  -2    <NA> 
#2 orange <NA>  <NA> 
#3 banana <NA>  <NA> 
#4 pear   -1    10   
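In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). pivot_wider() (like spread()) fills missing combinations with NA by default, so the complete() step can be dropped. A sketch of the same reshape, assuming a current tidyr; wide is an illustrative name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(Names  = c("TXT", "LSL", "TXT", "TXT", "TXT", "USL", "LSL"),
                 Values = c("apple", -2, "orange", "banana", "pear", 10, -1),
                 stringsAsFactors = FALSE)

wide <- df %>%
  mutate(grp = cumsum(Names == "TXT")) %>%                  # block index per TXT
  pivot_wider(names_from = Names, values_from = Values) %>% # missing cells -> NA
  select(TXT, LSL, USL)                                     # drop grp, fix order
wide
```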

In data.table, we can use dcast, putting the grouping expression directly in the formula:

library(data.table)

dcast(setDT(df), cumsum(Names == 'TXT')~Names, value.var = 'Values')[, -1]

#    LSL    TXT  USL
#1:   -2  apple <NA>
#2: <NA> orange <NA>
#3: <NA> banana <NA>
#4:   -1   pear   10
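Without any packages, the same reshape can be sketched in base R by pre-allocating an NA matrix and writing each value into its (block, column) cell with matrix indexing. This assumes the data starts with a TXT row, contains only TXT, LSL and USL, and has at most one LSL and one USL per block (a duplicate would silently overwrite the earlier value); cols, grp and out are illustrative names:

```r
df <- data.frame(Names  = c("TXT", "LSL", "TXT", "TXT", "TXT", "USL", "LSL"),
                 Values = c("apple", -2, "orange", "banana", "pear", 10, -1),
                 stringsAsFactors = FALSE)

cols <- c("TXT", "LSL", "USL")
grp  <- cumsum(df$Names == "TXT")   # block index: one block per TXT row

# One row per block, one column per name, pre-filled with NA
out <- matrix(NA_character_, nrow = max(grp), ncol = length(cols),
              dimnames = list(NULL, cols))

# Write each value into its (block, column) cell; untouched cells stay NA
out[cbind(grp, match(df$Names, cols))] <- df$Values
out <- as.data.frame(out, stringsAsFactors = FALSE)
out
```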
