I am new to R. I have a CSV file that contains these values:
A, , ,
,B, ,
, ,C1,
, , ,D1
, , ,D2
, ,C2,
, , ,D3
, , ,D4
Loading the data into a data frame:
dat = read.csv("~/RData/test.csv", header = FALSE)
dat
#   V1 V2 V3 V4
# 1  A
# 2     B
# 3       C1
# 4          D1
# 5          D2
# 6       C2
# 7          D3
# 8          D4
I need to wrangle this into a data frame of the following form:
A,B,C1,D1
A,B,C1,D2
A,B,C2,D3
A,B,C2,D4
Thanks in advance!
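Using zoo (here df is the data frame read from the file, with blank cells still stored as strings):
library(zoo)
df[df == ' ' | df == ''] <- NA                          # blank cells (a space or an empty string) become NA
df[1:3] <- lapply(df[1:3], na.locf0, fromLast = FALSE)  # carry the last seen value forward in V1..V3
df <- df[!is.na(df$V4), ]                               # keep only the rows that have a value in V4
df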
giving:
  V1 V2 V3 V4
4  A  B C1 D1
5  A  B C1 D2
7  A  B C2 D3
8  A  B C2 D4
Alternatively, with magrittr the same steps can be written as a pipeline:
library(magrittr)
library(zoo)
df %>%
  replace(. == ' ' | . == '', NA) %>%                              # blank cells become NA
  replace(1:3, lapply(.[1:3], na.locf0, fromLast = FALSE)) %>%     # fill V1..V3 forward
  subset(!is.na(V4))                                               # keep rows that have a V4 value
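If you would rather not depend on zoo, the same carry-forward idea can be sketched in base R. This is only an illustration under a few assumptions: the helper name fill_down is made up here and is not part of any package, the CSV text from the question is inlined instead of read from a file, and blank cells are detected by trimming whitespace.

```r
# The question's CSV, inlined so the example is self-contained.
csv_text <- "A, , ,
,B, ,
, ,C1,
, , ,D1
, , ,D2
, ,C2,
, , ,D3
, , ,D4"

dat <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Trim whitespace in every column, then mark empty cells as missing.
dat[] <- lapply(dat, trimws)
dat[dat == ""] <- NA

# Hypothetical helper (not from zoo): carry the last non-NA value forward.
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 is NA, used for leading gaps
}

dat[1:3] <- lapply(dat[1:3], fill_down)  # fill V1..V3 only
result <- dat[!is.na(dat$V4), ]          # keep rows that carry a V4 value
result
```

Only the first three columns are filled, so V4 still marks which rows are real records.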
A solution using dplyr and tidyr. This solution follows the link in Gregor's comments. But instead of using the zoo package, here I show the use of the fill function from tidyr, na.omit from base R, and the distinct function from dplyr.
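library(dplyr)
library(tidyr)
dt2 <- dt %>%
  fill(everything(), .direction = "down") %>%  # carry values down every column, V4 included
  na.omit() %>%                                # drop the leading rows that still contain NA
  distinct(V4, .keep_all = TRUE)               # V4 was filled too, so keep the first row per V4 value
dt2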
  V1 V2 V3 V4
1  A  B C1 D1
2  A  B C1 D2
3  A  B C2 D3
4  A  B C2 D4
DATA
dt <- read.table(text = "V1 V2 V3 V4
1 A NA NA NA
2 NA B NA NA
3 NA NA C1 NA
4 NA NA NA D1
5 NA NA NA D2
6 NA NA C2 NA
7 NA NA NA D3
8 NA NA NA D4",
header = TRUE, stringsAsFactors = FALSE)
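One thing to keep in mind with the pipeline above: fill(everything()) also fills V4, so the C2 row temporarily becomes A, B, C2, D2, and distinct(V4, .keep_all = TRUE) is what removes it. That relies on the V4 values being unique. A possible variant, sketched here as an assumption rather than as part of the original answer, fills only V1 to V3 and then filters on V4, so repeated V4 values would survive. The name dt3 is chosen just for this example.

```r
library(dplyr)
library(tidyr)

# Same example data as in the DATA section.
dt <- read.table(text = "V1 V2 V3 V4
1 A NA NA NA
2 NA B NA NA
3 NA NA C1 NA
4 NA NA NA D1
5 NA NA NA D2
6 NA NA C2 NA
7 NA NA NA D3
8 NA NA NA D4",
                 header = TRUE, stringsAsFactors = FALSE)

dt3 <- dt %>%
  fill(V1:V3, .direction = "down") %>%  # fill the grouping columns only
  filter(!is.na(V4))                    # keep rows that have their own V4 value
dt3
```

For the data shown, this should give the same four rows as dt2.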