
Create a list of dataframes that are generated by applying a function to subsets of an original dataframe

I'm trying to create a list of dataframes, each produced by applying the same function to a subset of my original dataframe.

Here is some sample data:

Data <- data.frame("Country" = c("UK", "UK", "US", "US", "US", "France", "France", "Japan", 
"Japan", "Japan", "India", "India"), "Outcome" = c("Y", "N", "Y", "Y", "Y", "N", "N", "Y",
"N", "Y", "N", "Y"))

I'm subsetting by one of my variables (Country) and applying the same function to each subset to create a new dataframe:

Data.UK <- subset(Data, Country == "UK")
UK <- as.data.frame(table(Data.UK$Outcome))
Data.US <- subset(Data, Country == "US")
US <- as.data.frame(table(Data.US$Outcome))
Data.France <- subset(Data, Country == "France")
France <- as.data.frame(table(Data.France$Outcome))
Data.Japan <- subset(Data, Country == "Japan")
Japan <- as.data.frame(table(Data.Japan$Outcome))
Data.India <- subset(Data, Country == "India")
India <- as.data.frame(table(Data.India$Outcome))

I then combine these dataframes into a list:

Countries <- list(UK, US, France, Japan, India)

I'm sure there is an easier way to do this, especially for a much larger dataset with many more subsets (in my case I need to subset by every country in the world). I think I could subset based on a character vector of the unique values of the variable I'm subsetting by, but I'm not sure how to go about it. Any help is greatly appreciated!
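The idea described above (looping over a character vector of the unique values) can be sketched in base R roughly as follows. This is only one possible approach, not necessarily the best one; the name country_names is made up for illustration, and the column Var1 is simply what as.data.frame() produces for an unnamed table.

```r
# Sample data from the question
Data <- data.frame(
  Country = c("UK", "UK", "US", "US", "US", "France", "France", "Japan",
              "Japan", "Japan", "India", "India"),
  Outcome = c("Y", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y", "N", "Y")
)

# Character vector of the unique values of the subsetting variable
# (country_names is a hypothetical name used only in this sketch)
country_names <- unique(as.character(Data$Country))

# For each country, keep its Outcome values and tabulate them
Countries <- lapply(country_names, function(cn) {
  as.data.frame(table(Data$Outcome[Data$Country == cn]))
})

# Name the list elements so each dataframe can be looked up by country
names(Countries) <- country_names
```

With the names set, a single result can be retrieved with Countries[["Japan"]] instead of by position.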

One option is to use dplyr:

library(dplyr)

result_by_country <- group_by(Data, Country) %>% 
  summarise(outcome_table = list(table(Outcome))) 

You can then extract the list (each element is a table object, one per country):

Countries <- result_by_country$outcome_table
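Because the question asks for dataframes rather than table objects, one extra conversion step may be wanted. The sketch below assumes dplyr is installed and repeats the sample data so that it runs on its own; naming the list by country is an addition for convenience, not part of the original answer.

```r
library(dplyr)

# Sample data from the question
Data <- data.frame(
  Country = c("UK", "UK", "US", "US", "US", "France", "France", "Japan",
              "Japan", "Japan", "India", "India"),
  Outcome = c("Y", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y", "N", "Y")
)

# One row per country, with the frequency table stored in a list column
result_by_country <- Data %>%
  group_by(Country) %>%
  summarise(outcome_table = list(table(Outcome)))

# Convert each table to a dataframe and label the list by country
Countries <- lapply(result_by_country$outcome_table, as.data.frame)
names(Countries) <- result_by_country$Country
```

Note that group_by() orders the groups alphabetically, so the list comes back as France, India, Japan, UK, US rather than in the order the countries first appear.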

Although the dplyr package does look very helpful here, missuse gave me the idea of starting with the "split" function, so I ended up doing this:

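# Split the dataframe into a named list with one dataframe per country
List_country <- split(Data, Data$Country)
# Keep only the second column (Outcome) of each piece
Countries_outcome <- lapply(List_country, function(x) x[2])
# Tabulate the outcomes and convert each table to a dataframe
Countries <- lapply(Countries_outcome, function(x) as.data.frame(table(x)))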
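The three steps above can probably be collapsed into one by splitting the Outcome vector directly instead of the whole dataframe. This is a minimal base R sketch of that variant, not the poster's own code; naming the table argument Outcome is a choice made here so that the resulting column is labelled Outcome.

```r
# Sample data from the question
Data <- data.frame(
  Country = c("UK", "UK", "US", "US", "US", "France", "France", "Japan",
              "Japan", "Japan", "India", "India"),
  Outcome = c("Y", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y", "N", "Y")
)

# split() returns a named list of Outcome vectors, one per country;
# each vector is then tabulated and converted to a dataframe
Countries <- lapply(
  split(Data$Outcome, Data$Country),
  function(x) as.data.frame(table(Outcome = x))
)
```

The list is named by country automatically, so Countries$UK or Countries[["Japan"]] retrieves a single result, and the same call scales to any number of countries without further changes.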

Thank you all for the input!
