
Reading specific positions in a group to create new columns in R

I have multiple CSV files with a two-column structure. Once a file is read into R, it looks like this:

# A tibble: 18 x 3
# Groups:   group [2]
   V1                        V2                                          group
   <chr>                     <chr>                                       <int>
 1 Sample File               "C:\\Data\\CPC\\COALA_CPC3776_20200129.xls"     0
 2 Model                     "3776"                                          0
 3 Sample #                  "1"                                             1
 4 Start Date                "01/29/20"                                      1
 5 Start Time                "03:06:08"                                      1
 6 Sample Length             "04:58"                                         1
 7 Averaging Interval (secs) "1.0"                                           1
 8 Title                     ""                                              1
 9 Instrument ID             "3776  70634317 2.7"                            1
10 Instrument Errors         "None"                                          1
11 Mean                      "4687.93"                                       1
12 Min                       "4215"                                          1
13 Max                       "5095"                                          1
14 Std. Dev.                 "208.445"                                       1
15 Time                      "Concentration (#/cm³)"                         1
16 03:06:09                  "4581"                                          1
17 03:06:10                  "4673"                                          1
18 03:06:11                  "4657"                                          1

This format repeats every 5 minutes. I want to move the date and sample # into new columns and then remove all the other lines between Sample File and Std. Dev. in V1, to get something like this:

   time concentration     date sample
1 02:02:02          1200 01/01/01      2
2 02:02:03          1300 01/01/01      2
3 02:03:03          4000 01/01/01      2

I can group the data by Sample #, but then I don't know how to proceed. This is my code so far:

## list.files() expects a regular expression, not a glob, so match the extension explicitly
cpc_files <- list.files(pattern = '\\.xls$', path = 'input/CPC/')

cpc_raw <- do.call("rbind",  ## Row-bind the data frames read from each file
        lapply(cpc_files, ## Loop over the file names
               function(x)  ## Read one file (comma-separated despite the .xls extension)
                 read.table(paste("input/CPC/", x, sep = ''), sep = ',', fill = TRUE, header = FALSE,
                          stringsAsFactors = FALSE, comment.char = "",
                          ## Column names are sized from one reference file; blanks are filled with NAs
                          col.names = paste0("V", seq_len(max(count.fields("input/CPC/COALA_CPC3776_20200129.xls", sep = ',')))))))

library(dplyr)

cpc_fix <- cpc_raw %>% select(V1, V2) %>%
          group_by(group = cumsum(V1 == "Sample #"))

I simplified your input into 2 columns but this should be a good start.

x <- read.csv(file = '~/file.csv', stringsAsFactors = FALSE)

# Row index of the 'Time' header line that separates metadata from samples
time_row <- which(x$V1 == 'Time')

df <- cbind(t(x$V2[1:(time_row - 1)]),
            x[(time_row + 1):nrow(x), ], stringsAsFactors = FALSE)

colnames(df) <- unlist(c(x$V1[1:(time_row - 1)],
                         x[time_row, ]))

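The first argument to cbind is the metadata (row 1 up to the row before 'Time'), transposed into a single row so that it is repeated alongside every sample. The second argument is the samples (everything after 'Time'). The column names are built with the same logic: the metadata labels first, then the 'Time' row itself. You can also store the names as a row if you want.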

df2 <- rbind(colnames(df), df)
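
To make the cbind step easier to follow, here is a minimal sketch of the same idea on a small in-memory data frame instead of '~/file.csv'. The toy values and the helper names time_row, meta and samples are assumptions for illustration, and the sketch only handles a single sample block with exactly one 'Time' row.

```r
# Toy stand-in for the simplified two-column file (values are illustrative)
x <- data.frame(
  V1 = c("Sample #", "Start Date", "Time", "03:06:09", "03:06:10"),
  V2 = c("1", "01/29/20", "Concentration", "4581", "4673"),
  stringsAsFactors = FALSE
)

# Row index of the 'Time' header that separates metadata from samples
time_row <- which(x$V1 == "Time")

# One-row metadata matrix; cbind() repeats it for every sample row
meta <- t(x$V2[1:(time_row - 1)])
samples <- x[(time_row + 1):nrow(x), ]
df <- cbind(meta, samples, stringsAsFactors = FALSE)

# Metadata labels first, then the 'Time' header row for the sample columns
colnames(df) <- unlist(c(x$V1[1:(time_row - 1)], x[time_row, ]))

print(df)
```

Each sample row now carries its own copy of the metadata, so columns such as df[["Start Date"]] can be used directly. A file with several sample blocks would need this applied per block.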

I solved it by dividing the process into two parts:

  1. Moving date and sample to a new column:

    cpc_fix <- cpc_raw %>% select(V1, V2) %>%
      group_by(group = cumsum(V1 == "Sample #")) %>%
      mutate(date = V2[V1 == 'Start Date'][1], sample = V2[V1 == 'Sample #'][1]) %>%
      ungroup()

  2. Remove every row whose V1 is not a time stamp, following Alexander's suggestion:

    cpc_clean <- cpc_fix[grep(pattern="[0-9][0-9]:[0-9][0-9]:[0-9][0-9]", cpc_fix$V1, perl=TRUE), ]

    colnames(cpc_clean) <- c('time','concentration','group','date','sample')
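
The two steps can be checked end to end on a toy data frame. The sketch below is an illustration under assumptions, not the exact code above: the toy cpc_raw values are made up, the regular expression is anchored with ^ and $ (the original pattern is unanchored), the helper group column is dropped by select() rather than renamed, and the concentration is converted to numeric as an extra step.

```r
library(dplyr)

# Toy stand-in for cpc_raw: a file header line plus two sample blocks (made-up values)
cpc_raw <- data.frame(
  V1 = c("Sample File", "Sample #", "Start Date", "Time", "03:06:09", "03:06:10",
         "Sample #", "Start Date", "Time", "03:11:09"),
  V2 = c("example.xls", "1", "01/29/20", "Concentration", "4581", "4673",
         "2", "01/29/20", "Concentration", "4702"),
  stringsAsFactors = FALSE
)

# Step 1: start a new group at every "Sample #" row, then copy that block's
# date and sample number onto all of its rows
cpc_fix <- cpc_raw %>%
  select(V1, V2) %>%
  group_by(group = cumsum(V1 == "Sample #")) %>%
  mutate(date = V2[V1 == "Start Date"][1],
         sample = V2[V1 == "Sample #"][1]) %>%
  ungroup()

# Step 2: keep only rows whose V1 looks like HH:MM:SS, then rename the columns
cpc_clean <- cpc_fix %>%
  filter(grepl("^[0-9]{2}:[0-9]{2}:[0-9]{2}$", V1)) %>%
  select(time = V1, concentration = V2, date, sample) %>%
  mutate(concentration = as.numeric(concentration))

print(cpc_clean)
```

Rows before the first "Sample #" fall into group 0 and get NA for date and sample, but they are removed in step 2 because their V1 is not a time stamp.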
