How to rbind, arrange and format data in a list of matrices resulting from a group split

I have a list of matrices holding the results of a descriptive analysis, produced after a previous group_split() by a factor.

What I'd like to do is stack corresponding matrices with rbind(), using a functional solution that iterates over the corresponding matrices, binds them, and formats them (i.e. sets rownames, colnames, and an individual row order). The final step is to print the matrices containing the descriptive results using kableExtra.

My problem: using rbind() inside a for loop to iterate over the corresponding matrix triplets produces the desired output only for the last triplet, not for all of them. Maybe one of you has an idea of where I'm going wrong. I have consulted similar questions here but have not found a solution to my problem.

Here is an example using the tidyverse and kableExtra packages:

library(tidyverse)
library(kableExtra)

# Some random data for an initial df
city <- factor(rep(1:3, each = 4)) # this is the splitting variable
gender <- factor(rep(c("m", "f"), times = 6)) # this is a factor for a later subgrouping analysis
age <- c(32, 54, 67, 35, 19, 84, 34, 46, 67, 41, 20, 75)
working_yrs <- c(16, 27, 39, 16, 2, 50, 16, 23, 48, 21, 0, 57)
income <- working_yrs * 50

df <- data.frame(city, gender, age, working_yrs, income)

cities <- levels(city) # vector needed later for a for loop


# Group splits by city (df -> lists of data frames, one per city)
# Note: the `keep` argument of group_split() is deprecated; use `.keep`.
df1 <- df %>%
  select(-gender) %>%
  group_split(city, .keep = FALSE)

df2 <- df %>%
  filter(gender == "m") %>%
  select(city, age, working_yrs) %>%
  group_split(city, .keep = FALSE)

df3 <- df %>%
  filter(gender == "f") %>%
  select(city, age, working_yrs) %>%
  group_split(city, .keep = FALSE)

LOL <- c(df1, df2, df3) # one flat list: 3 cities (all), 3 cities (male), 3 cities (female)


# Define function for descriptive analysis (list of lists -> list of matrices)
fun_descr <- function(x) {
  c(n=sum(!is.na(x)),
    Percent=((sum(!is.na(x)))/(sum(!is.na(x)) + sum(is.na(x)))*100),
    Mean=mean(x, na.rm = TRUE),
    SD=sd(x, na.rm = TRUE),
    Median=median(x, na.rm = TRUE),
    Quantile=quantile(x, 0.25, na.rm = TRUE),
    Quantile=quantile(x, 0.75, na.rm = TRUE))
}

LOM <- lapply (LOL, function (x) {
  t(apply(x, 2, fun_descr)) %>% round(digits = 1)
})
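A side note on the column names produced by fun_descr: the two `Quantile=` entries do not clash, because c() joins the tag with the name that quantile() already attaches to its result. The following is a minimal base-R sketch of that behaviour, using the four ages of city 1 from the data above; the variable names `x` and `q` are only illustrative.

```r
x <- c(32, 54, 67, 35)           # ages of city 1 from the example data

q <- quantile(x, 0.25)           # quantile() returns a named vector
names(q)                         # the name is the probability as a percentage

# c() combines the tag "Quantile" with the existing name using a dot
labelled <- c(Quantile = q)
names(labelled)
```

This is why the resulting matrices carry the column names Quantile.25% and Quantile.75% until they are overwritten with colnames().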

So far so good, now here's the problem. My approach to rbind() the corresponding matrix triplets belonging to the same city returns proper results for the last city only.


for (i in 1:length(cities)) {
  bindcity <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

bindcity 

If the for loop or an lapply solution worked correctly and returned a list of row-bound matrices, I would expect to format the rows and columns of the resulting list of matrices as follows. Unfortunately, since the previous step doesn't work as expected, I haven't been able to test this yet. I'm also still struggling to find a first line for this function that sorts each matrix's rows into the row order 1, 4, 6, 2, 5, 7, 3, so that the data match the rownames shown below.

nicematrices <- lapply (bindcity, function (x) {
  rownames(x) <- paste(list("Age", "Working years", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)", "Income"))
  colnames(x) <- paste(list("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile"))
  return(x)
})

Final step: Print matrices using kableExtra

for (i in 1:length(nicematrices)) {
print(
  kable(nicematrices[[i]], caption = "Title") %>%
    column_spec(1, bold = T) %>%
    kable_styling("striped", bootstrap_options = "hover", full_width = TRUE)
)}

I don't know if I understand correctly, but have you tried indexing bindcity with i?

for (i in 1:length(cities)) {
  bindcity[[i]] <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

What could be your problem here is that your loop does go through all the iterations, but because bindcity is overwritten on each pass, only the last result is kept unless you store the output for every i. You will also need to initialize bindcity before the loop if you go this way. Overall:

bindcity <- vector("list", length(cities)) # preallocate one slot per city

for (i in 1:length(cities)) {
  bindcity[[i]] <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

Here's what the above returns:

> bindcity

[[1]]
            n Percent   Mean    SD Median Quantile.25% Quantile.75%
age         4     100   47.0  16.5   44.5         34.2         57.2
working_yrs 4     100   24.5  11.0   21.5         16.0         30.0
income      4     100 1225.0 548.5 1075.0        800.0       1500.0
age         2     100   49.5  24.7   49.5         40.8         58.2
working_yrs 2     100   27.5  16.3   27.5         21.8         33.2
age         2     100   44.5  13.4   44.5         39.8         49.2
working_yrs 2     100   21.5   7.8   21.5         18.8         24.2

[[2]]
            n Percent   Mean     SD Median Quantile.25% Quantile.75%
age         4     100   45.8   27.8   40.0         30.2         55.5
working_yrs 4     100   22.8   20.2   19.5         12.5         29.8
income      4     100 1137.5 1007.8  975.0        625.0       1487.5
age         2     100   26.5   10.6   26.5         22.8         30.2
working_yrs 2     100    9.0    9.9    9.0          5.5         12.5
age         2     100   65.0   26.9   65.0         55.5         74.5
working_yrs 2     100   36.5   19.1   36.5         29.8         43.2

[[3]]
            n Percent   Mean     SD Median Quantile.25% Quantile.75%
age         4     100   50.8   25.1   54.0         35.8         69.0
working_yrs 4     100   31.5   26.0   34.5         15.8         50.2
income      4     100 1575.0 1299.0 1725.0        787.5       2512.5
age         2     100   43.5   33.2   43.5         31.8         55.2
working_yrs 2     100   24.0   33.9   24.0         12.0         36.0
age         2     100   58.0   24.0   58.0         49.5         66.5
working_yrs 2     100   39.0   25.5   39.0         30.0         48.0
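To see the difference between the two loops in isolation, here is a small base-R sketch on toy data. It assumes the same layout as LOM (all cities for the first block, then all cities for the second, then the third); the names `toy_lom`, `n_cities`, `overwritten` and `collected` are hypothetical and not part of the question.

```r
# Toy stand-in for LOM: nine 1x2 matrices, three "cities" times three blocks
n_cities <- 3
toy_lom <- lapply(1:9, function(k) matrix(c(k, k * 10), nrow = 1))

# Assigning to a plain name replaces the value on every pass,
# so only the result of the last iteration survives
for (i in seq_len(n_cities)) {
  overwritten <- rbind(toy_lom[[i]],
                       toy_lom[[i + n_cities]],
                       toy_lom[[i + 2 * n_cities]])
}

# Assigning to element i of a preallocated list keeps every result
collected <- vector("list", n_cities)
for (i in seq_len(n_cities)) {
  collected[[i]] <- rbind(toy_lom[[i]],
                          toy_lom[[i + n_cities]],
                          toy_lom[[i + 2 * n_cities]])
}

length(collected)     # one stacked matrix per city
collected[[1]][, 1]   # first column of city 1 comes from blocks 1, 4 and 7
```

In this sketch `overwritten` ends up identical to `collected[[3]]`, which mirrors the "last city only" symptom described in the question.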

The following uses lapply loops to get the desired bound matrices and the Kable output.

bindcity <- lapply(seq_along(cities), function(i){
  rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
})

nicematrices <- lapply(bindcity, function (x) {
  rownames(x) <- c("Age", "Working years", "Income", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)")
  colnames(x) <- c("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile")
  x
})
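The question also asks for the rows in the order 1, 4, 6, 2, 5, 7, 3. A minimal base-R sketch of that step on a single toy matrix follows; the values are made up, and `toy`, `row_order` and `reordered` are illustrative names. The idea is to label the rows in stacked order first and reorder afterwards, since row names travel with their rows.

```r
# Toy 7-row matrix standing in for one stacked city matrix (made-up values)
toy <- matrix(1:14, nrow = 7,
              dimnames = list(c("Age", "Working years", "Income",
                                "Age (male)", "Working years (male)",
                                "Age (female)", "Working years (female)"),
                              c("n (valid)", "Mean")))

# Row order requested in the question, applied by integer indexing;
# drop = FALSE keeps the result a matrix in all cases
row_order <- c(1, 4, 6, 2, 5, 7, 3)
reordered <- toy[row_order, , drop = FALSE]

rownames(reordered)
```

Inside the anonymous function, the same indexing could be applied to x as the last step before returning it.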

The two loops above can be simplified into one. Note, however, that the following lapply loop does not create the bindcity list. This only matters if that list is used later, which is not clear from the question; it is not needed to create the Kable tables.

nicematrices <- lapply(seq_along(cities), function (i) {
  x <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
  rownames(x) <- c("Age", "Working years", "Income", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)")
  colnames(x) <- c("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile")
  x
})
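As a possible alternative to the explicit index arithmetic, the flat list can be grouped by city with split() and each group stacked with do.call(rbind, ...). This is only a sketch on toy data, assuming the same ordering as LOM (all cities for the first block, then the second, then the third); `toy_lom`, `n_cities`, `city_id`, `by_city` and `stacked` are hypothetical names.

```r
n_cities <- 3

# Nine toy 1x1 matrices laid out like LOM
toy_lom <- lapply(1:9, function(k) matrix(k, nrow = 1))

# City label for each list element: 1, 2, 3, 1, 2, 3, 1, 2, 3
city_id <- rep(seq_len(n_cities), times = length(toy_lom) / n_cities)

# Group the list elements by city, then stack each group row-wise
by_city <- split(toy_lom, city_id)
stacked <- lapply(by_city, function(m) do.call(rbind, m))

names(stacked)   # split() names the groups after the city labels
```

One design consequence is that this version does not hard-code the number of blocks per city, so it keeps working if a fourth subgroup is appended to the list in the same order.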

Now for the Kable tables.

library(kableExtra)

kbl_list <- lapply(nicematrices, function(x){
  kbl <- kable(x, caption = "Title") %>%
    column_spec(1, bold = TRUE) %>%
    kable_styling("striped", 
                  bootstrap_options = "hover",
                  full_width = TRUE)
  print(kbl)
})
