R: Extract parameter from list

I'm working with R and have a fairly deeply nested list of data, from which I would like to extract the same variable from every data frame. Here is an example of one imported .csv file (simplified from the original; I hope it is not too confusing):

Temp(A);    Density(B); Velocity(C)
21,54;      0,7;        1486,46
20,87;      0,76;       1484,42
20,34;      0,81;       1482,8
19,61;      0,81;       1480,5

# .csv files imported with:

data_files <- list.files("D:\\My\\data\\pathway") 

The code I used to create a list from the 19 data frames is as follows:

lst1 <- map(data_files, ~ {
  data1 <- read.csv2(paste0("D:\\My\\data\\pathway\\", .x))
  df.sum <- data1 %>%
    select(Temperature(A), Density(B), Velocity(C)) %>% 
    summarise_each(funs(min = min, # in the example Min(1)
                        q25 = quantile(., 0.25), # Max(2)
                        median = median, # Mean(3)
                        q75 = quantile(., 0.75), # St.Dev.(4)
                        max = max,
                        mean = mean, 
                        sd = sd))
  df.stats.tidy <- df.sum %>% gather(stat, val) %>%
    separate(stat, into = c("var", "stat"), sep = "_") %>%
    spread(stat, val) %>%
    select(var, min, q25, median, q75, max, mean, sd) 
  return(df.stats.tidy)
})
lst1

The output list looks like this:

[Image: the data as listed in my list]

This is how it is displayed when I open the whole list. When I open the table of a single dataset, the table is transposed:

[Image: the single table of a single dataset]

How can I extract, for example, the temperature for every dataset in order to create a plot or run statistical tests?

I tried a few simple methods and was able to extract single values from a single dataset. For example, I can extract the mean value of every parameter of dataset 2. However, this is not quite what I need: I need the same value for the same parameter across all the different datasets. Does anyone know a simple way to work out the structure of this list? I can't figure out how exactly the parameters are defined.

PS: here are the dput() results:

> dput(lst1[1:2])
list(structure(list(var = c("Conduct.mS.cm.", "Depth.m.", "Salinity.psu.", 
"Sound.Velocity.m.sec.", "Temp.C."), min = c(0, -1.19, 0, 1402.98, 
-1.48), q25 = c(0.01, -0.91, 0.01, 1412.835, -0.51), median = c(9.225, 
-0.78, 9.885, 1421.785, 0.85), q75 = c(25.575, 39.9725, 31.0825, 
1440.7175, 2.09), max = c(26.28, 143.76, 32.02, 1453.52, 11.81
), mean = c(11.6531756756757, 23.0201351351351, 13.9187162162162, 
1426.98621621622, 1.26290540540541), sd = c(11.8954355870503, 
38.217076230762, 14.4467518784427, 14.8016328574063, 2.53744347569587
)), class = "data.frame", row.names = c(NA, -5L)), structure(list(
    var = c("Conduct.mS.cm.", "Depth.m.", "Salinity.psu.", "Sound.Velocity.m.sec.", 
    "Temp.C."), min = c(0, -2.17, 0, 1401.46, -1.44), q25 = c(0, 
    -1.14, 0, 1404.25, 0.0125), median = c(0.13, -1.08, 0.115, 
    1413.215, 0.49), q75 = c(25.035, 6.3225, 30.3525, 1440.2625, 
    1.53), max = c(26.35, 129.54, 32.11, 1486.46, 21.54), mean = c(7.78810344827586, 
    17.3289655172414, 9.34528735632184, 1424.01396551724, 2.13511494252874
    ), sd = c(11.6263191741139, 36.9663620576755, 14.0549552563496, 
    22.6029377552219, 5.01839273011273)), class = "data.frame", row.names = c(NA, 
-5L)))

Nested lists can get out of hand pretty quickly and are not ideal for analysis, since most R analysis functions expect data frames (which are themselves lists with extra structure). However, you are lucky, because the data frames in your list look homogeneous (each has dimensions 5 x 8), so you can bind them together into a single data frame.


## unlisting (mylist is the list from the question, here lst1[1:2]):
library(tidyverse)
my_df <- purrr::map_df(mylist, ~ as.data.frame(.x), .id = "List")

You now have a column "List" that specifies which list element each row came from.
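In recent purrr releases `map_df()` is marked as superseded, so a possible alternative is `dplyr::bind_rows()`, which accepts a list of data frames directly and can add the same id column. The sketch below is only an illustration: `toy_list` is a made-up, shortened stand-in for `lst1[1:2]` (two variables, two statistics, values copied from the `dput()` output in the question).

```r
library(dplyr)

# Shortened stand-in for lst1[1:2]: only two variables and two statistics,
# values copied from the dput() output in the question.
toy_list <- list(
  data.frame(var = c("Depth.m.", "Temp.C."),
             min = c(-1.19, -1.48),
             max = c(143.76, 11.81)),
  data.frame(var = c("Depth.m.", "Temp.C."),
             min = c(-2.17, -1.44),
             max = c(129.54, 21.54))
)

# Stack all list elements into one data frame. Because the list has no
# names, .id records the position of each element (as a character string).
my_df <- bind_rows(toy_list, .id = "List")
my_df
```

If the list elements were named (for example by file name), the "List" column would carry those names instead of positions.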

This data frame can now be used in calculations grouped by "var":


## summarizing mean of variable min and max across both lists 
my_df %>% group_by(var) %>% summarise_at(c("min","max"),~mean(.x))

# A tibble: 5 × 3
  var                       min    max
  <chr>                   <dbl>  <dbl>
1 Conduct.mS.cm.           0      26.3
2 Depth.m.                -1.68  137. 
3 Salinity.psu.            0      32.1
4 Sound.Velocity.m.sec. 1402.   1470. 
5 Temp.C.                 -1.46   16.7
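For the original question (the same parameter from every dataset), filtering the combined data frame on `var` is probably enough. The following is a minimal sketch under assumptions: the small `my_df` built here is a hypothetical stand-in for the combined data frame, with values copied from the `dput()` output and restricted to two variables and three statistics.

```r
library(dplyr)

# Hypothetical stand-in for the combined data frame (my_df in the answer);
# values are copied from the dput() output in the question.
my_df <- data.frame(
  List = c("1", "1", "2", "2"),
  var  = c("Depth.m.", "Temp.C.", "Depth.m.", "Temp.C."),
  min  = c(-1.19, -1.48, -2.17, -1.44),
  max  = c(143.76, 11.81, 129.54, 21.54),
  mean = c(23.0201351351351, 1.26290540540541,
           17.3289655172414, 2.13511494252874)
)

# Keep only the temperature rows: one row per dataset
temp_df <- my_df %>% filter(var == "Temp.C.")
temp_df

# Pull a single statistic out as a plain numeric vector,
# e.g. as input for a plot or a statistical test
temp_means <- temp_df %>% pull(mean)
temp_means
```

The resulting `temp_df` has one row per dataset, so the "List" column can serve as the x-axis or grouping variable in a plot.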

You could also go one step further and pivot the data to long format:


## Option 2 making a long df 
my_df_long<-my_df %>% tidyr::pivot_longer(min:sd,names_to = "metric")

> my_df_long
# A tibble: 70 × 4
   List  var            metric value
   <chr> <chr>          <chr>  <dbl>
 1 1     Conduct.mS.cm. min     0   
 2 1     Conduct.mS.cm. q25     0.01
 3 1     Conduct.mS.cm. median  9.22
 4 1     Conduct.mS.cm. q75    25.6 
 5 1     Conduct.mS.cm. max    26.3 
 6 1     Conduct.mS.cm. mean   11.7 
 7 1     Conduct.mS.cm. sd     11.9 
 8 1     Depth.m.       min    -1.19
 9 1     Depth.m.       q25    -0.91
10 1     Depth.m.       median -0.78
# … with 60 more rows

The same summary could now look something like this:


## summarizing mean of variable min and max across both lists 
my_df_long %>%
  group_by(var,metric) %>% 
  filter(metric %in% c("min","max")) %>% 
  summarise(mean(value))


# A tibble: 10 × 3
# Groups:   var [5]
   var                   metric `mean(value)`
   <chr>                 <chr>          <dbl>
 1 Conduct.mS.cm.        max            26.3 
 2 Conduct.mS.cm.        min             0   
 3 Depth.m.              max           137.  
 4 Depth.m.              min            -1.68
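A small variation that may be easier to work with downstream: naming the summary column and dropping the grouping gives a plain result without the backticked `mean(value)` header. This is only a sketch; the `my_df_long` built here is a hypothetical stand-in restricted to temperature, with values copied from the `dput()` output, and `mean_value` is an arbitrary column name.

```r
library(dplyr)

# Hypothetical long-format stand-in (my_df_long in the answer), restricted
# to temperature; values are copied from the dput() output in the question.
my_df_long <- data.frame(
  List   = c("1", "1", "1", "2", "2", "2"),
  var    = "Temp.C.",
  metric = c("min", "max", "sd", "min", "max", "sd"),
  value  = c(-1.48, 11.81, 2.53744347569587,
             -1.44, 21.54, 5.01839273011273)
)

res <- my_df_long %>%
  filter(metric %in% c("min", "max")) %>%   # drop unwanted metrics first
  group_by(var, metric) %>%
  summarise(mean_value = mean(value), .groups = "drop")  # named, ungrouped
res
```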

However, this is a matter of personal preference.

Maybe you can find a way to avoid creating the list in the first place; that would save you the extra step.
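One way to read this suggestion is to bind the files into a single data frame while reading them, and only then compute the statistics per file and variable. The sketch below rests on several assumptions: the two demo files, their names (`cast_01.csv`, `cast_02.csv`), the folder `csv_demo`, and the column names are all made up for illustration, and only three statistics are computed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Two small demo files in a temporary folder stand in for the real
# directory; file names and values are made up for illustration.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv2(data.frame(Temp = c(21.54, 20.87), Density = c(0.7, 0.76)),
           file.path(demo_dir, "cast_01.csv"), row.names = FALSE)
write.csv2(data.frame(Temp = c(20.34, 19.61), Density = c(0.81, 0.81)),
           file.path(demo_dir, "cast_02.csv"), row.names = FALSE)

data_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

all_stats <- data_files %>%
  set_names(basename(data_files)) %>%        # names become the id column
  map(read.csv2) %>%                         # read every file
  bind_rows(.id = "file") %>%                # bind at once, no list is stored
  pivot_longer(-file, names_to = "var") %>%  # one row per measurement
  group_by(file, var) %>%
  summarise(min = min(value), mean = mean(value), max = max(value),
            .groups = "drop")
all_stats
```

Because the file name travels along as an ordinary column, extracting one parameter across all files becomes a single `filter(var == "Temp")` on `all_stats`.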
