
How to insert a placeholder in a table where pieces of data are missing, using R?

I have a set of data from an experiment that I need to analyse. Because the files also contain a lot of data that is not relevant to me, I want to tidy them up with R, as doing it manually is too much work. The data in these .csv files comes from time-course experiments, so the order of the measurements matters and the values have to stay in a specific order.

So far, I have managed to select all the columns I need and to sort them by the different conditions using the following code:

library(dplyr)  # provides select(), filter(), %>% and as_tibble()

used_columns <- select(df,
                       ImageNumber,
                       FrameNumber,
                       Treatment,
                       Intensity1,
                       Intensity2)

# as.tibble() is deprecated; as_tibble() is the current name
used_columns.t <- as_tibble(used_columns)

df_sorted <- used_columns.t %>%
  filter(Treatment == "B2") %>%
  .[order(as.integer(.$FrameNumber), decreasing = FALSE), ]

Using this code, df_sorted yields a data frame that looks like this:

ImageNumber FrameNumber Treatment   Intensity1  Intensity2
1           1           B2          1598,45         0,14
2           1           B2          930,40          0,11
3           1           B2          107,86          0,04
4           1           B2          881,09          0,11
7           1           B2          2201,98         0,15
8           1           B2          161,30          0,04
9           1           B2          1208,14         0,17
4           2           B2          831,75          0,12
5           2           B2          1027,41         0,14
7           2           B2          2052,16         0,15
8           2           B2          159,63          0,05
9           2           B2          1111,49         0,16
10          2           B2          1312,15         0,12
1           3           B2          863,79          0,10
2           3           B2          104,06          0,04
3           3           B2          816,02          0,11
4           3           B2          1053,02         0,14
5           3           B2          132,32          0,03
6           3           B2          2059,03         0,14
7           3           B2          153,49          0,04
8           3           B2          1118,69         0,15
9           3           B2          1632,66         0,18
10          3           B2          1302,15         0,12

However, I would like to have a table like this, where the missing values are indicated as NA (or some other placeholder):

ImageNumber FrameNumber Treatment   Intensity1  Intensity2
1           1           B2          1598,45         0,14
2           1           B2          930,40          0,11
3           1           B2          107,86          0,04
4           1           B2          881,09          0,11
5           NA          NA          NA              NA
6           NA          NA          NA              NA
7           1           B2          2201,98         0,15
8           1           B2          161,30          0,04
9           1           B2          1208,14         0,17
10          NA          NA          NA              NA
1           NA          NA          NA              NA
2           NA          NA          NA              NA
3           NA          NA          NA              NA
4           2           B2          831,75          0,12
5           2           B2          1027,41         0,14
6           NA          NA          NA              NA
7           2           B2          2052,16         0,15
8           2           B2          159,63          0,05
9           2           B2          1111,49         0,16
10          2           B2          1312,15         0,12
1           3           B2          863,79          0,10
2           3           B2          104,06          0,04
3           3           B2          816,02          0,11
4           3           B2          1053,02         0,14
5           3           B2          132,32          0,03
6           3           B2          2059,03         0,14
7           3           B2          153,49          0,04
8           3           B2          1118,69         0,15
9           3           B2          1632,66         0,18
10          3           B2          1302,15         0,12

This is just a very short extract of my table; in reality, depending on the condition, the ImageNumber may go up to 1441. Is there a way to solve this problem?

I would be very grateful for any help!

Here is a split-apply-combine approach in base R:

out <- do.call(rbind,
               by(
                 data = df1,
                 INDICES = df1$FrameNumber,
                 FUN = merge,
                 y = data.frame(ImageNumber = seq(min(df1$ImageNumber), max(df1$ImageNumber))),
                 all.y = TRUE
               ))
out
#     ImageNumber FrameNumber Treatment Intensity1 Intensity2
#1.1            1           1        B2    1598,45       0,14
#1.2            2           1        B2     930,40       0,11
#1.3            3           1        B2     107,86       0,04
#1.4            4           1        B2     881,09       0,11
#1.5            5          NA      <NA>       <NA>       <NA>
#1.6            6          NA      <NA>       <NA>       <NA>
#1.7            7           1        B2    2201,98       0,15
#1.8            8           1        B2     161,30       0,04
#1.9            9           1        B2    1208,14       0,17
#1.10          10          NA      <NA>       <NA>       <NA>
#2.1            1          NA      <NA>       <NA>       <NA>
#2.2            2          NA      <NA>       <NA>       <NA>
#2.3            3          NA      <NA>       <NA>       <NA>
#2.4            4           2        B2     831,75       0,12
# ...

We split your data (called df1 here) by FrameNumber and merge each piece with a data frame that contains a single column called ImageNumber. That column holds the values from min(df1$ImageNumber) to max(df1$ImageNumber), that is, from 1 to 10 in your example. The argument all.y = TRUE, which is passed on to merge, turns implicit missing values into explicit ones: every ImageNumber that has no match in a piece gets a row filled with NA.
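To see what all.y = TRUE does in isolation, here is a minimal sketch for a single frame. The data frames one_frame and all_images and their values are made up for illustration; they are not taken from your files.

```r
# Hypothetical toy data for one frame; image 2 was not recorded.
one_frame <- data.frame(
  ImageNumber = c(1L, 3L),
  FrameNumber = c(1L, 1L),
  Intensity1  = c(10.5, 7.2)
)

# The complete set of image numbers we expect to see.
all_images <- data.frame(ImageNumber = 1:3)

# all.y = TRUE keeps every row of all_images (the y argument);
# image numbers without a match in one_frame are filled with NA.
filled <- merge(one_frame, all_images, by = "ImageNumber", all.y = TRUE)
filled
```

The result has one row per value in all_images, and the row for ImageNumber 2 contains NA in the other columns.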

Finally, we combine the list back into a data frame with do.call(rbind, ...).
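If you want to try the whole thing without your own files, the sketch below runs the same split, merge and combine steps on a small made-up data frame. The values in df1 and the intermediate names all_images and pieces are assumptions for illustration only.

```r
# Hypothetical toy data: two frames, image numbers 1 to 4, some missing.
df1 <- data.frame(
  ImageNumber = c(1L, 2L, 4L, 2L, 3L),
  FrameNumber = c(1L, 1L, 1L, 2L, 2L),
  Treatment   = "B2",
  Intensity1  = c(1598.45, 930.40, 881.09, 831.75, 1027.41)
)

# Every image number that should appear in each frame.
all_images <- data.frame(
  ImageNumber = seq(min(df1$ImageNumber), max(df1$ImageNumber))
)

# Split by frame and merge each piece against the full image list.
pieces <- by(df1, df1$FrameNumber, merge, y = all_images, all.y = TRUE)

# Stack the per-frame pieces back into one data frame.
out <- do.call(rbind, pieces)
out
```

Each frame ends up with one row per image number, so out has as many rows as the number of frames times nrow(all_images), provided an image number occurs at most once per frame.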
