
Aggregate columns across a table using a wildcard in R


I have an R data frame like this:

TS                          Wafer(1)Radius(38)  Wafer(1)Radius(49)  Wafer(2)Radius(06)  Wafer(2)Radius(38)
2022-06-29T02:33:34.827582  699.565925              699.726346          700.526022          700.820165
2022-06-29T02:33:42.277582  700.053016              700.081032          700.558847          700.367924
2022-06-29T02:33:49.727582  699.707056              699.761278          700.485359          700.958339
2022-06-29T02:33:57.169972  699.909119              700.04703           700.419578          700.565945
2022-06-29T02:34:04.617582  699.74137               699.698093          700.397555          700.533587
2022-06-29T02:34:12.067582  700.106121              699.868698          700.573662          700.611651

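I want to get the min/max/mean of each wafer, computed separately for every row. For example, for the first row/TS, the max of Wafer(1) is 699.726346 and the max of Wafer(2) is 700.820165. The same applies to every other TS row.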

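I know there are `pmax()`/`pmin()` functions that find the row-wise max/min across multiple columns, but how do I group them by Wafer 1, Wafer 2, etc.? I would also like to apply other statistics, such as the mean and median. Is this possible?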

The expected result looks like this:

TS                          Wafer(1)Radius(38)      Wafer(1)Radius(49)      Wafer(2)Radius(06)      Wafer(2)Radius(38)          Wafer1_Max          wafer2_max      wafer1_min      wafer2_min      wafer1_mean     wafer2_mean
2022-06-29T02:33:34.827582      699.565925                  699.726346      700.526022                  700.820165              699.726346          700.820165      699.565925      700.526022      699.6461355     700.6730935
2022-06-29T02:33:42.277582      700.053016                  700.081032      700.558847                  700.367924              700.081032          700.558847      700.053016      700.367924      700.067024      700.4633855
2022-06-29T02:33:49.727582      699.707056                  699.761278      700.485359                  700.958339              699.761278          700.958339      699.707056      700.485359      699.734167      700.721849
2022-06-29T02:33:57.169972      699.909119                  700.04703       700.419578                  700.565945              700.04703           700.565945      699.909119      700.419578      699.9780745     700.4927615
2022-06-29T02:34:04.617582      699.74137                   699.698093      700.397555                  700.533587              699.74137           700.533587      699.698093      700.397555      699.7197315     700.465571
2022-06-29T02:34:12.067582      700.106121                  699.868698      700.573662                  700.611651              700.106121          700.611651      699.868698      700.573662      699.9874095     700.5926565
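For reference, the `pmax()` route mentioned in the question does work once each wafer's columns are picked out by matching the column names against a pattern, which is the "wildcard" part. The sketch below uses base R only and assumes the names look like `Wafer(1)Radius(38)` (kept intact with `check.names = FALSE`; the pattern also tolerates the mangled `Wafer.1.Radius.38.` form). The `wafer<N>_<stat>` output names and the `row_stats` helper list are illustrative choices, not part of any package:

```r
# Two rows of the question's data; check.names = FALSE keeps names such as
# "Wafer(1)Radius(38)" instead of converting them to "Wafer.1.Radius.38.".
df <- data.frame(
  TS = c("2022-06-29T02:33:34.827582", "2022-06-29T02:33:42.277582"),
  `Wafer(1)Radius(38)` = c(699.565925, 700.053016),
  `Wafer(1)Radius(49)` = c(699.726346, 700.081032),
  `Wafer(2)Radius(06)` = c(700.526022, 700.558847),
  `Wafer(2)Radius(38)` = c(700.820165, 700.367924),
  check.names = FALSE
)

# Group the measurement columns by wafer number: "1" -> its two columns, etc.
value_cols <- setdiff(names(df), "TS")
wafer_id <- sub("^Wafer[^0-9]*([0-9]+).*$", "\\1", value_cols)
cols_by_wafer <- split(value_cols, wafer_id)

# One row-wise function per statistic (hypothetical helper list).
row_stats <- list(
  max    = function(block) do.call(pmax, block),
  min    = function(block) do.call(pmin, block),
  mean   = function(block) rowMeans(block),
  median = function(block) apply(block, 1, median)
)

# Add one column per statistic and wafer: wafer1_max, wafer2_max, wafer1_min, ...
for (stat in names(row_stats)) {
  for (w in names(cols_by_wafer)) {
    block <- df[cols_by_wafer[[w]]]
    df[[paste0("wafer", w, "_", stat)]] <- row_stats[[stat]](block)
  }
}

print(df)
```

Each statistic needs its own row-wise function (`pmax`/`pmin`, `rowMeans`, or `apply(..., 1, median)`), which is why reshaping to long format is usually less fiddly.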

This becomes much easier if you pivot your data to long format, because then you can use the standard grouping functions:

library(tidyr)
library(dplyr)
df %>%
  pivot_longer(
    -TS,
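    # Capture wafer and radius separately: "Wafer(1)Radius(38)" -> "1", "38"
    names_pattern = "Wafer[^0-9]*([0-9]+)[^0-9]*Radius[^0-9]*([0-9]+)",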
    names_to = c("Wafer", "Radius"),
    values_to = "value"
  ) %>%
  group_by(TS, Wafer) %>%
  summarize(
    max = max(value),
    min = min(value),
    mean = mean(value),
    median = median(value),
    .groups = "drop"
  ) %>%
  as.data.frame
#                            TS Wafer      max      min     mean   median
# 1  2022-06-29T02:33:34.827582     1 699.7263 699.5659 699.6461 699.6461
# 2  2022-06-29T02:33:34.827582     2 700.8202 700.5260 700.6731 700.6731
# 3  2022-06-29T02:33:42.277582     1 700.0810 700.0530 700.0670 700.0670
# 4  2022-06-29T02:33:42.277582     2 700.5588 700.3679 700.4634 700.4634
# 5  2022-06-29T02:33:49.727582     1 699.7613 699.7071 699.7342 699.7342
# 6  2022-06-29T02:33:49.727582     2 700.9583 700.4854 700.7218 700.7218
# 7  2022-06-29T02:33:57.169972     1 700.0470 699.9091 699.9781 699.9781
# 8  2022-06-29T02:33:57.169972     2 700.5659 700.4196 700.4928 700.4928
# 9  2022-06-29T02:34:04.617582     1 699.7414 699.6981 699.7197 699.7197
# 10 2022-06-29T02:34:04.617582     2 700.5336 700.3976 700.4656 700.4656
# 11 2022-06-29T02:34:12.067582     1 700.1061 699.8687 699.9874 699.9874
# 12 2022-06-29T02:34:12.067582     2 700.6117 700.5737 700.5927 700.5927
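The `names_pattern` regex decides what ends up in the `Wafer` and `Radius` columns, even though only `Wafer` is used for grouping here. The base-R check below uses `perl = TRUE` as a stand-in for the backtracking regex engine tidyr relies on (an assumption worth verifying on your setup). With the loose pattern `([0-9]+).*([0-9]+)`, the greedy `.*` leaves only the last digit of the radius for the second group, while anchoring each group to its label keeps the full number. The `capture_groups()` helper is hypothetical, written just for this check:

```r
cols <- c("Wafer(1)Radius(38)", "Wafer(1)Radius(49)",
          "Wafer(2)Radius(06)", "Wafer(2)Radius(38)")

# Hypothetical helper: return the two capture groups of `pattern` for each name,
# one row per name (column 1 = first group, column 2 = second group).
capture_groups <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  t(vapply(m, function(g) g[2:3], character(2)))
}

# Loose pattern: greedy .* swallows all but the last digit of the radius.
loose <- capture_groups("([0-9]+).*([0-9]+)", cols)
# Anchored pattern: each group is tied to its "Wafer"/"Radius" label.
strict <- capture_groups("Wafer[^0-9]*([0-9]+)[^0-9]*Radius[^0-9]*([0-9]+)", cols)

print(loose)
print(strict)
```

This only matters if you later need the `Radius` column; the per-wafer summaries are the same either way.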

To get back to wide format, append the following to the pipeline above (in place of `...`):

... %>%
  pivot_wider(
    names_from = "Wafer",
    values_from = c("max", "min", "mean", "median"),
    names_glue = "Wafer({Wafer})_{.value}"
  )
# # A tibble: 6 × 9
#   TS                         `Wafer(1)_max` Wafer(2)_ma…¹ Wafer…² Wafer…³ Wafer…⁴ Wafer…⁵ Wafer…⁶ Wafer…⁷
#   <chr>                               <dbl>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1 2022-06-29T02:33:34.827582           700.          701.    700.    701.    700.    701.    700.    701.
# 2 2022-06-29T02:33:42.277582           700.          701.    700.    700.    700.    700.    700.    700.
# 3 2022-06-29T02:33:49.727582           700.          701.    700.    700.    700.    701.    700.    701.
# 4 2022-06-29T02:33:57.169972           700.          701.    700.    700.    700.    700.    700.    700.
# 5 2022-06-29T02:34:04.617582           700.          701.    700.    700.    700.    700.    700.    700.
# 6 2022-06-29T02:34:12.067582           700.          701.    700.    701.    700.    701.    700.    701.
# # … with abbreviated variable names ¹​`Wafer(2)_max`, ²​`Wafer(1)_min`, ³​`Wafer(2)_min`, ⁴​`Wafer(1)_mean`,
# #   ⁵​`Wafer(2)_mean`, ⁶​`Wafer(1)_median`, ⁷​`Wafer(2)_median`
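The expected result in the question also keeps the original measurement columns next to the summaries. One way to get that layout (a sketch, not part of the original answer) is to widen the summary and join it back onto `df` by `TS`. The lowercase `wafer{Wafer}_{.value}` naming and the two-row sample data are assumptions chosen to mirror the question:

```r
library(dplyr)
library(tidyr)

# Two rows of the question's data; check.names = FALSE keeps the original names.
df <- data.frame(
  TS = c("2022-06-29T02:33:34.827582", "2022-06-29T02:33:42.277582"),
  `Wafer(1)Radius(38)` = c(699.565925, 700.053016),
  `Wafer(1)Radius(49)` = c(699.726346, 700.081032),
  `Wafer(2)Radius(06)` = c(700.526022, 700.558847),
  `Wafer(2)Radius(38)` = c(700.820165, 700.367924),
  check.names = FALSE
)

# Long format -> per-TS, per-wafer statistics -> one column per wafer/statistic.
summary_wide <- df %>%
  pivot_longer(
    -TS,
    names_pattern = "Wafer[^0-9]*([0-9]+)[^0-9]*Radius[^0-9]*([0-9]+)",
    names_to = c("Wafer", "Radius"),
    values_to = "value"
  ) %>%
  group_by(TS, Wafer) %>%
  summarize(
    max = max(value),
    min = min(value),
    mean = mean(value),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = "Wafer",
    values_from = c("max", "min", "mean"),
    names_glue = "wafer{Wafer}_{.value}"
  )

# Attach the summaries to the original rows, keeping df's row order.
result <- left_join(df, summary_wide, by = "TS")
print(result)
```

Because `names_from` varies fastest, the new columns come out as `wafer1_max`, `wafer2_max`, `wafer1_min`, and so on, matching the order in the question.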
