
R: How to automate calculating a data frame then producing a chart based on the results for multiple data frames?

How can I build a dynamic "downflow pipeline" in R that pushes data frames through and automatically applies formulas to them? I have a data frame called autocalc, which has blank columns set aside for equations. For example, I need to apply equations such as autocalc$PPH <- Tokyo$P / Tokyo$PH. PPH is already a column (vector) in autocalc.

autocalc <- data.frame("INCOME" = c("$0 to $15,000","$15,000 to $29,999","$30,000 to $39,999","$40,000 to $49,999","$50,000 to $69,999","$70,000 to $99,999","$100,000 to $149,999"),
                       "PPH" = c(0,0,0,0,0,0,0),
                       "PTS" = c(0,0,0,0,0,0,0))

autocalc$PPH <- Tokyo$P / Tokyo$PH
autocalc$PTS <- autocalc$PPH * .05

The results in this data frame will be used to build a chart with ggplot2.

As you can see from the sample equation, there is a data frame called Tokyo. I have three city data frames (NewYork, Paris, Tokyo). Each data frame has the city's prisoners (vector P) and population (vector PH). The data is further categorized by income group.

Tokyo <- data.frame("INCOME" = c("$0 to $15,000","$15,000 to $29,999","$30,000 to $39,999","$40,000 to $49,999","$50,000 to $69,999","$70,000 to $99,999","$100,000 to $149,999"),
                    "P" = c(1844,1062,1036,448,770,364,395),
                    "PH" = c(84900,721007,80800,380004,675000,32900,39500))

I want to apply each city to the autocalc data frame and produce three separate charts. I could copy/paste three versions of autocalc, one per city, but that seems like bad code: if I have to fix something in autocalc, I have to redo the code for each city.

I looked at the following leads, but I feel like this shouldn't be that complex of an issue:

  1. Use object-oriented programming with the R6 library. Create an autocalc class and apply a copy to the three cities? There aren't many R6 tutorials that make sense for a beginner, so I feel like there's an easier way?

  2. Use lapply(). I found a tutorial that produces something similar (three separate charts based on three separate data frames), but it doesn't explain how to swap autocalc$PPH <- Tokyo$P / Tokyo$PH for autocalc$PPH <- Paris$P / Paris$PH while preserving autocalc's calculations for each city, so that I can prepare three charts. Should I instead get rid of the autocalc data frame and add rows to each city's data frame?

  3. My friend, who is a programmer but not well versed in R, recommends looking into collection methods: writing a script and using methods to perform repeated operations. However, I can't find a tutorial on doing this with R. I think this requires me to use object-oriented programming with the R6 library? I'm thinking maybe my task is better suited to Python, then? I'm being told that R is more for analysis and not for building something as dynamic as this.

I think option 2 is the easiest and most straightforward one. You can put the 3 data frames in a list and use lapply. You can pass an anonymous function to lapply and refer to each city's data frame inside that function.

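# Collect the three city data frames in a list
list_dfs <- list(Tokyo, Paris, NewYork)

list_plots <- lapply(list_dfs, function(x) {
  # x is one city's data frame; autocalc is only modified inside this function
  autocalc$PPH <- x$P / x$PH
  autocalc$PTS <- autocalc$PPH * .05
  # ggplot2 code here; make the plot the last expression so that lapply() returns it
})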

After you add the ggplot2 code inside lapply, it will return a list of 3 plots, one per city, which you can access with list_plots[[1]], list_plots[[2]] and list_plots[[3]].
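For illustration, here is a minimal end-to-end sketch of that idea. A few things in it are assumptions rather than part of the question: the Paris and NewYork numbers are made-up placeholders (only Tokyo's data was posted), calc_city is a hypothetical helper name, and the horizontal bar chart of PPH is just one possible chart, since the question doesn't say what the plot should look like. Instead of filling a shared autocalc, the helper builds a fresh result data frame per city, so the formulas live in exactly one place.

```r
library(ggplot2)

income_levels <- c("$0 to $15,000", "$15,000 to $29,999", "$30,000 to $39,999",
                   "$40,000 to $49,999", "$50,000 to $69,999",
                   "$70,000 to $99,999", "$100,000 to $149,999")

# Tokyo is the data frame from the question
Tokyo <- data.frame(INCOME = income_levels,
                    P  = c(1844, 1062, 1036, 448, 770, 364, 395),
                    PH = c(84900, 721007, 80800, 380004, 675000, 32900, 39500))

# ASSUMPTION: Paris and NewYork were not posted; these values are
# made-up placeholders so that the example runs
Paris <- data.frame(INCOME = income_levels,
                    P  = c(900, 800, 700, 600, 500, 400, 300),
                    PH = c(50000, 60000, 70000, 80000, 90000, 100000, 110000))
NewYork <- data.frame(INCOME = income_levels,
                      P  = c(2000, 1500, 1200, 900, 700, 500, 300),
                      PH = c(90000, 95000, 100000, 105000, 110000, 115000, 120000))

# Hypothetical helper: applies the autocalc formulas to one city's data frame
calc_city <- function(city_df) {
  # Keep the income groups in their original order on the plot axis
  out <- data.frame(INCOME = factor(city_df$INCOME,
                                    levels = unique(city_df$INCOME)))
  out$PPH <- city_df$P / city_df$PH
  out$PTS <- out$PPH * 0.05
  out
}

# A named list keeps track of which result belongs to which city
cities  <- list(Tokyo = Tokyo, Paris = Paris, NewYork = NewYork)
results <- lapply(cities, calc_city)

# One plot per city; the ggplot object is the function's return value
plots <- lapply(names(results), function(city) {
  ggplot(results[[city]], aes(x = INCOME, y = PPH)) +
    geom_col() +
    coord_flip() +
    labs(title = city, x = "Income group", y = "PPH (P / PH)")
})
names(plots) <- names(results)
```

With a named list you can pull out a chart by city, e.g. plots$Tokyo or plots[["Paris"]], and a change to a formula only has to be made once, inside calc_city.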


