简体   繁体   中英

R Data wrangling

I am new to R, I have a csv file that contains values:

A, , ,
,B, ,
, ,C1,
, , ,D1
, , ,D2
, ,C2,
, , ,D3
, , ,D4

Loading the data into a data frame:

dat = read.csv("~/RData/test.csv", header = FALSE)
dat
#   V1 V2 V3 V4
# 1  A         
# 2     B      
# 3       C1   
# 4          D1
# 5          D2
# 6       C2   
# 7          D3
# 8          D4

I need to wrangle this to a data frame format:

A,B,C1,D1
A,B,C1,D2
A,B,C2,D3
A,B,C2,D4

Thanks in advance!

By using zoo

library(zoo)
df[df==' '] <- NA
df[1:3] <- lapply(df[1:3], na.locf0, fromLast = FALSE)
df <- df[!is.na(df$V4),]
df

giving:

  V1 V2 V3 V4
4  A  B C1 D1
5  A  B C1 D2
7  A  B C2 D3
8  A  B C2 D4

or by using magrittr too we can write the above code in terms of this pipeline:

library(magrittr)
library(zoo)

df %>% 
   replace(. == ' ', NA) %>%
   replace(1:3, lapply(.[1:3], na.locf0, fromLast = FALSE)) %>%     
   subset(!is.na(V4))

A solution using dplyr and tidyr . This solution follows the link in Gregor's comments. But instead of using zoo package, here I show the use of fill function from tidyr , na.omit from base R, and distinct function from dplyr .

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  fill(everything(), .direction = "down") %>%
  na.omit() %>%
  distinct(V4, .keep_all = TRUE)
dt2
  V1 V2 V3 V4
1  A  B C1 D1
2  A  B C1 D2
3  A  B C2 D3
4  A  B C2 D4

DATA

dt <- read.table(text = "V1 V2 V3 V4
1  A NA NA NA         
                 2  NA  B NA NA      
                 3  NA  NA  C1 NA   
                 4  NA  NA  NA D1
                 5  NA  NA  NA D2
                 6  NA  NA  C2 NA   
                 7  NA  NA  NA D3
                 8  NA  NA  NA D4",
                 header = TRUE, stringsAsFactors = FALSE)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM