Nested loops to create data frames in R

I am trying to create and store data.frames using a nested for loop.

I have some data on countries in a variable called countries (USA, UK, Germany, etc.), which I have labeled 1, 2, 3 respectively.

I also have data on specific industries in a variable called industries (for example textiles, retail, other). Again, I have labeled these industries 1, 2, 3.

What I am trying to do is create a new data.frame for each of the following combinations:

country 1, industry 1
country 1, industry 2
country 1, industry 3

country 2, industry 1
country 2, industry 2
country 2, industry 3

country 3, industry 1
country 3, industry 2
country 3, industry 3

etc.

I am hoping to carry out analysis on each data.frame.

What I am currently working with is the following:

m <- 3 # m countries
k <- 3 # k industries

for(i in 1:length(m)){
  country.ID <- m[i]
  for(j in 1:length(k)){
    sector.ID <- k[j]
    S1 <- which(DF$COUNTRY.id == country.ID)
    S2 <- which(DF$INDUSTRY.id == sector.ID)
    rows.2.consider <- intersect(S1, S2)

    # Here is where I am trying to save the data.frames for analysis

  }
}

If I have gone wrong at any point, please point it out. I am trying to create one data.frame for each combination of country and industry, i.e. 3 countries * 3 industries in this example would give 9 data.frames.

Here is some sample data (I am actually using regional data rather than country data, but the same principle still applies):

# #
 ratios <- structure(list(IDVar = 1:40, Major.sectors = structure(c(5L, 9L, 3L, 15L, 11L, 7L, 18L, 18L, 18L, 3L, 3L, 3L, 3L, 17L, 3L, 11L, 7L, 17L, 3L, 11L, 3L, 18L, 3L, 17L, 9L, 18L, 9L, 19L, 3L, 11L, 11L, 2L, 5L, 3L, 18L, 17L, 4L, 2L, 3L, 3L), .Label = c("Banks", "Chemicals, rubber, plastics, non-metallic products", "Construction", "Education, Health", "Food, beverages, tobacco", "Gas, Water, Electricity", "Hotels & restaurants", "Insurance companies", "Machinery, equipment, furniture, recycling", "Metals & metal products", "Other services", "Post & telecommunications", "Primary sector", "Public administration & defense", "Publishing, printing", "Textiles, wearing apparel, leather", "Transport", "Wholesale & retail trade", "Wood, cork, paper"), class = "factor"), Region.in.country = structure(c(15L, 8L, 8L, 8L, 10L, 15L, 19L, 10L, 8L, 10L, 3L, 18L, 4L, 12L, 4L, 15L, 13L, 4L, 15L, 15L, 7L, 15L, 12L, 1L, 7L, 10L, 15L, 8L, 13L, 15L, 12L, 8L, 7L, 15L, 15L, 10L, 8L, 10L, 10L, 15L), .Label = c("Andalucia", "Aragon", "Asturias", "Canary Islands", "Cantabria", "Castilla-La Mancha", "Castilla y Leon", "Cataluna", "Ceuta", "Comunidad Valenciana", "Extremadura", "Galicia", "Islas Baleares", "La Rioja", "Madrid", "Melilla", "Murcia", "Navarra", "Pais Vasco"), class = "factor"), EBIT.TA = c(-0.234432635519391, -0.884337466274593, -0.00446559204081373, 0.11109107677028, -0.137203773525798, -0.582114677880617, 0.0190497663203189, -3.04252763094666, 0.113157822682219, -0.0255533180037229, 0.281767142199724, 0.0326641697396841, -0.00879974750993553, 0.0542074697816672, -0.112104697294392, -0.191945591325174, -0.00380586115226597, -0.0363239884169068, -0.273949107908537, 0.435398668004486, -0.00563436099927988, -2.75971618056051, -0.1047327709263, 0.151283793741506, -0.0373197549569126, 0.00912639083178201, -0.0386627754065697, -0.018235399636112, -0.0118104711362467, -0.701299939137125, NA, 0.0191819361175666, -0.0104887983706721, -0.801677105519484, -0.402194475974272, 
-0.124125227730062, 0.143020458476649, -0.601186271451194, 0.0163269364787831, 5.09955167591238), EBIT.TA_l1 = c(-0.443687074746458, -0.561864166134075, -0.0345769510044604, 0.0282541797531804, -0.0181173929170762, 0.0147211350970115, 0.0588534950162799, -1.14097109926961, 0.060100343733096, -0.0386426338471025, 0.049684095221329, 0.0558174150334904, 0.00214962169435867, 0.0399960114646072, 0.0402934579830171, -0.612359147433149, -0.0115916125659674, 0.00739473610413031, 0.0174576615247567, 0.68624861825246, 0.0305807338940829, -3.88006243913616, 0.0410122725022661, -0.089491343996377, -0.215219123182103, 0.00967853324842811, -0.0336715197882038, 0.362424791356667, 0.221203934329637, -0.654387857513823, 0.0656934439915892, 0.0652005453654772, 0.0339559014267185, 0.0259085077216708, -0.303606048856146, 0.0280113794301873, 0.109307291990628, -0.470048555841697, -0.00157699300508027, -0.350519090107081 ), EBIT.TA_l2 = c(-0.351308186716873, 0.00159428805074234, -0.00604587147802615, 0.0761894448922952, -0.00348378141492824, NA, 0.0346370866793768, -0.552226781084599, 0.00220031803369861, -0.0285840972149053, 0.065316579236306, 0.4090851643341, -0.0188362202518351, 0.0403848986306371, 0.091146090480032, -0.0154168449752466, -0.0694803621032671, 0.0511978643139393, -0.452924037757731, -0.0091835704914724, 0.0119918914092344, 0.0858960833880717, NA, 0.104901526886479, -0.23096183545392, -0.0163058345980967, 0.100643431561465, 0.0527859573541712, 0.250207316117438, NA, 0.00193240515291123, 0.0624210741756767, 0.0178136227732972, -0.0321294913646274, -0.0699629484084657, -0.00417176180400133, 0.209612573099415, 0.0285645570852926, 0.0551624216079071, 0.0172738293439595), Major.sectors.id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 8L, 3L, 5L, 6L, 8L, 3L, 5L, 3L, 7L, 3L, 8L, 2L, 7L, 2L, 9L, 3L, 5L, 5L, 10L, 1L, 3L, 7L, 8L, 11L, 10L, 3L, 3L), Region.in.country.id = c(1L, 2L, 2L, 2L, 3L, 1L, 4L, 3L, 2L, 3L, 5L, 6L, 7L, 8L, 7L, 1L, 9L, 7L, 1L, 1L, 10L, 1L, 8L, 11L, 
10L, 3L, 1L, 2L, 9L, 1L, 8L, 2L, 10L, 1L, 1L, 3L, 2L, 3L, 3L, 1L)), .Names = c("IDVar", "Major.sectors", "Region.in.country", "EBIT.TA", "EBIT.TA_l1", "EBIT.TA_l2", "Major.sectors.id", "Region.in.country.id"), row.names = c(NA, 40L), class = "data.frame")

You can do

m <- 3 # m countries
k <- 3 # k industries
d <- data.frame(country=rep(1:m, each=k), industry=rep(1:k, m) )

for a single data.frame.

You can split that into 9 data.frames:

split(d,d)
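
The same idea carries over to real data: split() accepts a list of grouping columns and returns one data.frame per combination. This is only a minimal sketch, assuming a small made-up data frame `toy` that stands in for the question's data (the column names and values are invented for illustration). `drop = TRUE` discards combinations that have no rows.

```r
# Hypothetical stand-in for the question's data; values are invented.
toy <- data.frame(
  country.id  = c(1, 1, 2, 2, 3, 3, 3),
  industry.id = c(1, 2, 1, 1, 2, 3, 3),
  value       = c(0.5, 0.1, -0.2, 0.3, 0.0, 0.7, 0.9)
)

# One data.frame per (country, industry) combination that actually occurs.
# List names take the form "<country>.<industry>", e.g. "2.1".
pieces <- split(toy, list(toy$country.id, toy$industry.id), drop = TRUE)

names(pieces)     # the combinations that are present
pieces[["2.1"]]   # rows for country 2, industry 1
```

Because the result is an ordinary list, an analysis function can be applied to every piece with lapply(pieces, ...).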

One option is to use expand.grid. Prepare a data.frame with the desired countries and industries, then pass it to expand.grid to generate all possible combinations.

df <- data.frame(c= c("country1","country2", "country3"), 
            i = c("industry1", "industry2","industry3"))

library(dplyr)
expand.grid(df) %>% arrange(c)

         c         i
1 country1 industry1
2 country1 industry2
3 country1 industry3
4 country2 industry1
5 country2 industry2
6 country2 industry3
7 country3 industry1
8 country3 industry2
9 country3 industry3
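
The grid of combinations can also drive the subsetting the question asks for. Note that in the question's loop, `1:length(m)` with `m <- 3` evaluates to `1:1`, so each loop body runs only once; iterating over the rows of the grid avoids that. The following is a minimal sketch, assuming a hypothetical data frame `toy` with columns `country.id` and `industry.id` (names and values invented for illustration), that stores each subset in a named list:

```r
# Hypothetical data; column names and values are invented for illustration.
toy <- data.frame(
  country.id  = c(1, 1, 2, 2, 3, 3, 3),
  industry.id = c(1, 2, 1, 1, 2, 3, 3),
  value       = c(0.5, 0.1, -0.2, 0.3, 0.0, 0.7, 0.9)
)

# All country/industry combinations (the two vectors need not have equal length).
grid <- expand.grid(country = 1:3, industry = 1:3)

# Collect one data.frame per combination in a list instead of
# creating nine separate variables.
subsets <- vector("list", nrow(grid))
for (r in seq_len(nrow(grid))) {
  keep <- toy$country.id == grid$country[r] &
          toy$industry.id == grid$industry[r]
  subsets[[r]] <- toy[keep, , drop = FALSE]
}
names(subsets) <- paste0("c", grid$country, "_i", grid$industry)

nrow(subsets[["c3_i3"]])  # number of rows for country 3, industry 3
```

Combinations with no matching rows are kept as zero-row data.frames, so the list always has nrow(grid) elements.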

You don't actually need to split the data or create indexes. You can run the analysis for each industry and country like this:

YourAnalysis <- function(x) mean(x$EBIT.TA)

by(data = ratios, INDICES = list(ratios$Region.in.country, ratios$Major.sectors), FUN = YourAnalysis)
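
Two practical points about this approach: mean() returns NA for any group that contains a missing value (the sample data has an NA in EBIT.TA), and by() returns an array-like object rather than a data.frame. The following is a minimal sketch on a hypothetical data frame `toy` (values invented for illustration), showing `na.rm = TRUE` and aggregate() as one possible way to get the group means as a flat table:

```r
# Hypothetical data; values are invented for illustration.
toy <- data.frame(
  region  = c("Madrid", "Madrid", "Galicia", "Galicia", "Galicia"),
  sector  = c("Construction", "Construction", "Transport",
              "Transport", "Construction"),
  EBIT.TA = c(0.2, NA, 0.1, 0.3, -0.4)
)

# Analysis function that ignores missing values.
group_mean <- function(x) mean(x$EBIT.TA, na.rm = TRUE)

# One result per region/sector combination; empty combinations give NA.
res_by <- by(data = toy,
             INDICES = list(toy$region, toy$sector),
             FUN = group_mean)

# The same group means as a data.frame, one row per non-empty group.
res_df <- aggregate(EBIT.TA ~ region + sector, data = toy,
                    FUN = mean, na.rm = TRUE)
res_df
```

aggregate() only suits analyses that reduce a single column; for anything that needs the whole data.frame of a group, by() or a list of subsets remains the more general route.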
