I have multiple CSV files with a two-column structure. Once a file is read into R, it looks like this:
# A tibble: 18 x 3
# Groups: group [2]
V1 V2 group
<chr> <chr> <int>
1 Sample File "C:\\Data\\CPC\\COALA_CPC3776_20200129.xls" 0
2 Model "3776" 0
3 Sample # "1" 1
4 Start Date "01/29/20" 1
5 Start Time "03:06:08" 1
6 Sample Length "04:58" 1
7 Averaging Interval (secs) "1.0" 1
8 Title "" 1
9 Instrument ID "3776 70634317 2.7" 1
10 Instrument Errors "None" 1
11 Mean "4687.93" 1
12 Min "4215" 1
13 Max "5095" 1
14 Std. Dev. "208.445" 1
15 Time "Concentration (#/cm³)" 1
16 03:06:09 "4581" 1
17 03:06:10 "4673" 1
18 03:06:11 "4657" 1
This format repeats every 5 minutes. I want to move the date and sample number into new columns and then remove all the other rows in V1, from Sample File to Std. Dev., to get something like this:
time concentration date sample
1 02:02:02 1200 01/01/01 2
2 02:02:03 1300 01/01/01 2
3 02:03:03 4000 01/01/01 2
I can group the data by Sample #, but then I don't know how to proceed. This is my code so far:
library(dplyr)

# 'pattern' is a regular expression, not a glob
cpc_files <- list.files(pattern = '\\.xls$', path = 'input/CPC/')
cpc_raw <- do.call("rbind", ## Bind the per-file data frames together
  lapply(cpc_files, ## Loop over the file names
    function(x) ## Read one file
      read.table(paste("input/CPC/", x, sep=''), sep=',', fill = TRUE, header = FALSE,
        stringsAsFactors = FALSE, comment.char = "",
        col.names = paste0("V", seq_len(max(count.fields("input/CPC/COALA_CPC3776_20200129.xls", sep = ','))))))) ## Read all the files, filling the blanks with NAs
cpc_fix <- cpc_raw %>% select(V1, V2) %>%
  group_by(group = cumsum(V1 == "Sample #"))
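The reading step above takes its column count from one hard-coded file. A minimal base R sketch of a per-file alternative follows; `read_cpc` is a hypothetical helper name, the two temporary files are made-up stand-ins for the real exports, and keeping only V1 and V2 before binding is an assumption that lets files of different widths be combined.

```r
# Hypothetical helper: read one export, sizing the columns from that file itself.
read_cpc <- function(path) {
  n_cols <- max(count.fields(path, sep = ",", comment.char = ""), na.rm = TRUE)
  read.table(path, sep = ",", fill = TRUE, header = FALSE,
             stringsAsFactors = FALSE, comment.char = "",
             col.names = paste0("V", seq_len(n_cols)))
}

# Made-up demo files written to a temporary directory (not the real data).
tmp_dir <- tempfile("cpc_demo")
dir.create(tmp_dir)
writeLines(c("Sample #,1", "Start Date,01/29/20",
             "Time,Concentration (#/cm3)", "03:06:09,4581"),
           file.path(tmp_dir, "a.xls"))
writeLines(c("Sample #,2", "Start Date,01/29/20",
             "Time,Concentration (#/cm3)", "03:11:09,4700,extra"),
           file.path(tmp_dir, "b.xls"))

# full.names = TRUE returns complete paths, so no paste() is needed.
cpc_files <- list.files(tmp_dir, pattern = "\\.xls$", full.names = TRUE)

# Keep only V1 and V2 so files with different column counts can be row-bound.
cpc_raw <- do.call(rbind,
                   lapply(cpc_files, function(p) read_cpc(p)[, c("V1", "V2")]))
```

Sizing each file separately avoids depending on one particular file being the widest.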
I simplified your input into 2 columns but this should be a good start.
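x <- read.csv(file = '~/file.csv', stringsAsFactors = FALSE)
time_row <- which(x$V1 == 'Time')  # row holding the 'Time' header
# metadata values (as one row) bound to the sample rows
df <- cbind(t(x$V2[1:(time_row - 1)]),
            x[(time_row + 1):nrow(x), ], stringsAsFactors = FALSE)
# metadata names followed by the 'Time' header row
colnames(df) <- unlist(c(x$V1[1:(time_row - 1)],
                         x[time_row, ]))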
The first argument to cbind is the metadata (row 1 up to where it finds 'Time') and the second is the samples (everything after 'Time'). The same logic sets the column names. You can also store the names as a row if you want.
df2 <- rbind(colnames(df), df)
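The approach above handles a single block with one 'Time' row. As a rough sketch, the same split at 'Time' could be applied to each repeating block. It assumes a single file in which every block starts at "Sample #" and contains at least one sample row; the toy data and the helper name `reshape_block` are made up for illustration.

```r
# Toy input mimicking the V1/V2 layout from the question (values are invented).
x <- data.frame(
  V1 = c("Sample File", "Model",
         "Sample #", "Start Date", "Time", "03:06:09", "03:06:10",
         "Sample #", "Start Date", "Time", "03:11:09"),
  V2 = c("demo.xls", "3776",
         "1", "01/29/20", "Concentration (#/cm3)", "4581", "4673",
         "2", "01/29/20", "Concentration (#/cm3)", "4700"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: metadata rows sit above 'Time', sample rows below it.
reshape_block <- function(block) {
  time_row <- which(block$V1 == "Time")[1]
  meta     <- block[seq_len(time_row - 1), ]
  samples  <- block[-seq_len(time_row), ]
  data.frame(
    time          = samples$V1,
    concentration = samples$V2,
    date          = meta$V2[meta$V1 == "Start Date"][1],
    sample        = meta$V2[meta$V1 == "Sample #"][1],
    stringsAsFactors = FALSE
  )
}

# Group 0 holds the file header (Sample File, Model), so it is dropped.
group  <- cumsum(x$V1 == "Sample #")
blocks <- split(x[group > 0, ], group[group > 0])

result <- do.call(rbind, lapply(blocks, reshape_block))
rownames(result) <- NULL
```

Here `result` has one row per measurement, with the date and sample number repeated for every row of the block they came from.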
I solved it by dividing the process into two parts:
Move the date and sample number to new columns:
cpc_fix <- cpc_raw %>%
  select(V1, V2) %>%
  group_by(group = cumsum(V1 == "Sample #")) %>%
  mutate(date = V2[V1 == 'Start Date'][1],
         sample = V2[V1 == 'Sample #'][1]) %>%
  ungroup()
Remove every row whose V1 is not a time value, following Alexander's suggestion:
cpc_clean <- cpc_fix[grep(pattern="[0-9][0-9]:[0-9][0-9]:[0-9][0-9]", cpc_fix$V1, perl=TRUE), ]
colnames(cpc_clean) <- c('time','concentration','group','date','sample')
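After these two steps every value is still text, because V1 and V2 were read as character. A minimal sketch of a possible type conversion follows. It assumes the dates are month/day/two-digit year (as "01/29/20" suggests) and uses UTC as a placeholder time zone; the toy `cpc_clean` and the names `cpc_typed` and `datetime` are illustrative only.

```r
# Toy stand-in for cpc_clean with the same column names (values are invented).
cpc_clean <- data.frame(
  time          = c("03:06:09", "03:06:10"),
  concentration = c("4581", "4673"),
  group         = c(1L, 1L),
  date          = c("01/29/20", "01/29/20"),
  sample        = c("1", "1"),
  stringsAsFactors = FALSE
)

cpc_typed <- cpc_clean
cpc_typed$concentration <- as.numeric(cpc_typed$concentration)
cpc_typed$sample        <- as.integer(cpc_typed$sample)

# Assumption: month/day/two-digit year, and UTC as a placeholder time zone.
cpc_typed$datetime <- as.POSIXct(paste(cpc_typed$date, cpc_typed$time),
                                 format = "%m/%d/%y %H:%M:%S", tz = "UTC")
```

If the instrument clock uses a local time zone, the `tz` argument would need to change accordingly.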