From long to wide with unbalanced dataset and one row per observation

I've tried many ways of going from long to wide, but I cannot get one row per observation. The result contains many NA values because my data is unbalanced (I cannot just shift all values one row up, etc.).

This is a part of my data:

test <- structure(list(employees = c(384, 432, 624, 334, 356, 338, 348, 
1122, 1110, 1492), profit_margin = c(-0.14684, -0.85298, -0.58792, 
-0.38872, -1.30312, -0.86866, -0.6363, -1.925, 0.567, 3.984), 
    RD_expenses = c(8946.414554, 9977.75638, 43326.90616, 48870.14658, 
    35022.10866, 39584.25952, 32259.2173, 6303.95, 6812.46, 14993.39
    ), RD_intensity = c(7.10910850621956, 8.98811378416267, 15.6492601635234, 
    17.6773777378817, 13.1744528168514, 14.3544852219875, 11.2624231565094, 
    0.500071500320608, 0.559723756230354, 1.36999818636439), 
    sales = c(125844.3945, 111010.5704, 276862.329, 276455.8596, 
    265833.4972, 275762.3064, 286432.2966, 1260609.732, 1217111.106, 
    1094409.478), treated = c("1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1"), year = c(2013L, 2014L, 2015L, 2016L, 2017L, 
    2018L, 2019L, 2015L, 2016L, 2017L), id = c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L), company = c("ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC."
    )), row.names = c(NA, -10L), class = c("data.table", "data.frame"))

I've tried this:

test %>%
  group_by(id, company) %>%
  dplyr::mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = year,
                     values_from = c("employees", "profit_margin", "RD_expenses",
                                     "RD_intensity", "sales", "treated"))

But this gives many NA values and not one row per observation, like this:

id  company                          row  employees_2013  employees_2014  employees_2015  employees_2016  employees_2017
1   ALLERGAN PUBLIC LIMITED COMPANY  1    384             NA              NA              NA              NA
1   ALLERGAN PUBLIC LIMITED COMPANY  2    NA              432             NA              NA              NA
1   ALLERGAN PUBLIC LIMITED COMPANY  3    NA              NA              624             NA              NA
1   ALLERGAN PUBLIC LIMITED COMPANY  4    NA              NA              NA              334             NA
1   ALLERGAN PUBLIC LIMITED COMPANY  5    NA              NA              NA              NA              356
1   ALLERGAN PUBLIC LIMITED COMPANY  6    NA              NA              NA              NA              NA
1   ALLERGAN PUBLIC LIMITED COMPANY  7    NA              NA              NA              NA              NA
2   ALPINE ELECTRONICS, INC.         1    NA              NA              1122            NA              NA
2   ALPINE ELECTRONICS, INC.         2    NA              NA              NA              1110            NA
2   ALPINE ELECTRONICS, INC.         3    NA              NA              NA              NA              1492

Also, I do not have exactly 7 observations per company, which makes it a bit harder.

I have also tried this:

test %>%
  group_by(id) %>%
  dplyr::mutate(Visit = 1:n()) %>%
  gather("employees", "profit_margin", "RD_expenses", "RD_intensity", "sales", "treated", "year", key = variable, value = number) %>%
  unite(combi, variable, Visit) %>%
  spread(combi, number)

But that gives even stranger results, with columns up to _31, even though the maximum number of observations for one company (or id) is 7.

Any ideas? I need it in order to use matching!

Thank you

You can just use the reshape() function in base R.

reshape(d, direction = "wide", timevar = "year", idvar = c("id", "company"))

There will be NAs for any years in which the firm doesn't have data. Include any time-fixed variables (e.g., country or strategy, if measured) in idvar.
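To make the NA behaviour concrete, here is a minimal sketch on a made-up toy panel (the values and the short company names "A" and "B" are illustrative, not the asker's data). The as.data.frame() call is my own precaution, on the assumption that the input might be a data.table as in the question; it is not part of the answer above.

```r
# Toy long-format panel (made-up values); firm 2 has no rows for 2013 and 2014.
d <- data.frame(
  id        = c(1L, 1L, 1L, 2L),
  company   = c("A", "A", "A", "B"),
  year      = c(2013L, 2014L, 2015L, 2015L),
  employees = c(384, 432, 624, 1122),
  sales     = c(125844, 111011, 276862, 1260610)
)

# One row per id/company pair; every other column is spread out by year.
# as.data.frame() is a precaution in case the input is a data.table.
wide <- reshape(as.data.frame(d), direction = "wide",
                timevar = "year", idvar = c("id", "company"))

print(wide)
```

The wide columns are named with a dot separator (employees.2013, sales.2013, and so on), and firm 2 gets NA in the 2013 and 2014 columns because it has no rows for those years.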

I think you can skip creating the row column altogether.

tidyr::pivot_wider(df, names_from = year, 
                  values_from = c(employees, profit_margin, RD_expenses, RD_intensity, sales, treated)) 
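Why dropping row helps: pivot_wider() treats every column that is not named in names_from or values_from as an identifier, so a per-group row counter makes each input row its own output row. A small sketch on made-up toy data (the names long, with_row and without_row are illustrative, and names_prefix is only used here to get readable column names with a single value column):

```r
library(dplyr)
library(tidyr)

# Toy long-format panel (made-up values); company B is observed only in 2015.
long <- tibble(
  id        = c(1L, 1L, 1L, 2L),
  company   = c("A", "A", "A", "B"),
  year      = c(2013L, 2014L, 2015L, 2015L),
  employees = c(384, 432, 624, 1122)
)

# With the row counter, (id, company, row) is unique for every input row,
# so nothing is collapsed and the result is a staircase of NAs.
with_row <- long %>%
  group_by(id, company) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = year, values_from = employees,
              names_prefix = "employees_")

# Without it, only (id, company) identifies a row: one row per company.
without_row <- long %>%
  pivot_wider(names_from = year, values_from = employees,
              names_prefix = "employees_")

print(with_row)
print(without_row)
```

In this sketch with_row has four rows and without_row has two, with NA only where company B has no data for a year.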


 