简体   繁体   English

R - 根据单个列的多个条件从 dataframe 中删除行

[英]R - Removing rows from dataframe based on multiple conditions for a single column

I have the following example dataframe in R:我在 R 中有以下示例 dataframe:

SampleID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "D", "E", "E", "E", "E", "F", "F")
Analyte <- c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2", "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2")
Fraction <- c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total", "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total", "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended", "Dissolved", "Dissolved")
Concentration <- c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6, 6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4)
MyData <- data.frame(SampleID, Analyte, Fraction, Concentration)

MyData

   SampleID Analyte  Fraction Concentration
1         A      A1 Dissolved           4.2
2         A      A1     Total           5.6
3         A      A2 Dissolved           8.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
7         C      C1 Dissolved          15.6
8         C      C1 Suspended          28.7
9         C      C1     Total          42.3
10        C      C2 Dissolved          18.3
11        C      C2 Suspended          23.2
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1 Dissolved           9.1
16        E      E1 Suspended          32.5
17        E      E2 Dissolved          36.4
18        E      E2 Suspended          24.5
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4

I would like to do the following:我想做以下事情:

  1. For each SampleID , if an Analyte has a "Total" Fraction reported, retain only that row for the Analyte and remove rows with any other Fraction value (ie, Dissolved, Suspended) for that Analyte .对于每个SampleID ,如果Analyte报告了“Total” Fraction ,则仅保留Analyte的该行并删除该Analyte具有任何其他Fraction值(即 Dissolved、Suspended)的行。

  2. If an Analyte for a SampleID includes both Dissolved and Suspended in the Fraction column (and no other values for Fraction ), sum the concentrations for Dissolved and Suspended and add a row for that Analyte with the Fraction column labeled Total and the Concentration column listing the sum.如果SampleIDAnalyteFraction列中包括 Dissolved 和 Suspended(并且Fraction没有其他值),将 Dissolved 和 Suspended 的浓度相加,并为该Analyte添加一行,其中Fraction列标记为 Total, Concentration列列出和。 Remove the original rows for Dissolved and Suspended for that Analyte .删除该Analyte的 Dissolved 和 Suspended 的原始行。

So for the dataframe above, the two Analytes of SampleID "A" have Dissolved and Total, so I would want to remove the rows with the Dissolved Fraction .因此,对于上面的 dataframe, SampleID "A" 的两个Analytes已经 Dissolved 和 Total,所以我想删除带有 Dissolved Fraction的行。 For SampleID "C", I would want to remove the Dissolved and Suspended Fractions of both Analytes and just keep the rows with Total.对于SampleID “C”,我想删除两种Analytes的溶解和悬浮Fractions ,只保留总计的行。 And lastly, for SampleID "E", the Dissolved and Suspended Fractions for each of the two Analytes would be summed together and the result would be a new row for each Analyte that represents the sum (relabeled as Total), and the rows associated with the Dissolved and Suspended Fractions would be removed.最后,对于SampleID “E”,将两种Analytes中每一种的溶解和悬浮Fractions相加,结果将是每个Analyte的新行,表示总和(重新标记为总计),以及与溶解和悬浮的Fractions将被删除。

The output of the above dataframe MyData would be the following:上述 dataframe MyData的 output 将如下:

   SampleID Analyte  Fraction Concentration
2         A      A1     Total           5.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
9         C      C1     Total          42.3
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1     Total          41.6 
17        E      E2     Total          60.9
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4

Note that the example I have provided is just a small subset of a much larger dataset that includes hundreds of SampleIDs , but the Fraction column can only equal the values listed in the original dataframe above (ie, Dissolved, Suspended, Total, or Unknown).请注意,我提供的示例只是包含数百个SampleIDs的更大数据集的一小部分,但Fraction列只能等于上面原始 dataframe 中列出的值(即 Dissolved、Suspended、Total 或 Unknown) .

Thank you!谢谢!

This could be done as:这可以这样做:

library(tidyverse)
MyData %>%
  pivot_wider(c(SampleID, Analyte),Fraction, values_from = Concentration) %>%
  mutate(Total = coalesce(Total, Dissolved + Suspended), 
         Dissolved = ifelse(is.na(Total)&is.na(Suspended), Dissolved, NA),
         Suspended = ifelse(is.na(Total)&is.na(Dissolved), Suspended, NA)) %>%
  pivot_longer(-c(SampleID, Analyte), values_drop_na = TRUE)

# A tibble: 12 x 4
   SampleID Analyte name      value
   <chr>    <chr>   <chr>     <dbl>
 1 A        A1      Total       5.6
 2 A        A2      Total      11.2
 3 B        B1      Total       2.1
 4 B        B2      Total       9.6
 5 C        C1      Total      42.3
 6 C        C2      Total      48.6
 7 D        D1      Unknown     6.4
 8 D        D2      Unknown    28.8
 9 E        E1      Total      41.6
10 E        E2      Total      60.9
11 F        F1      Dissolved  10.7
12 F        F2      Dissolved   3.4
  

You can also use the following solution.您也可以使用以下解决方案。 It may sound a bit verbose but will also get the job done:这可能听起来有点冗长,但也可以完成工作:

library(dplyr)
library(purrr)


MyData %>%
  group_split(SampleID, Analyte) %>%
  map(~ if("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")} else {
      .x
    }) %>%
  map(~ if(all(c("Dissolved", "Suspended") %in% .x$Fraction)) {
    add_row(.x, SampleID = .x$SampleID[1], Analyte = .x$Analyte[1], 
            Fraction = "Total", Concentration = sum(.x$Concentration))
  } else {
    .x
  }) %>%
  map_dfr(~ if("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")} else {
      .x
    })


# A tibble: 12 x 4
   SampleID Analyte Fraction  Concentration
   <chr>    <chr>   <chr>             <dbl>
 1 A        A1      Total               5.6
 2 A        A2      Total              11.2
 3 B        B1      Total               2.1
 4 B        B2      Total               9.6
 5 C        C1      Total              42.3
 6 C        C2      Total              48.6
 7 D        D1      Unknown             6.4
 8 D        D2      Unknown            28.8
 9 E        E1      Total              41.6
10 E        E2      Total              60.9
11 F        F1      Dissolved          10.7
12 F        F2      Dissolved           3.4

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM