
How to rbind, arrange and format data in a list of matrices resulting from a group split

I have a list of matrices showing the results of a descriptive analysis resulting from a previous group_split() by a factor.

What I'd like to do is stack corresponding matrices with rbind(), using a functional solution that iterates over the corresponding matrices, binds them, and formats them (i.e. sets rownames, colnames, and an individual row order). The final step is to print the matrices containing the descriptive results using kableExtra.

My problem: using rbind() within a for loop to iterate over the corresponding matrix triplets only produces the desired output for the last triplet, not for all of them. Maybe someone has an idea of where I'm going wrong. I have consulted similar questions here but have not found a solution to my problem.

Here is an example using the tidyverse and kableExtra packages:

library(tidyverse)
library(kableExtra)

# Some random data for an initial df
city <- factor(rep(1:3, each = 4)) # this is the splitting variable
gender <- factor(rep(c("m", "f"), times = 6)) # this is a factor for a later subgrouping analysis
age <- c(32, 54, 67, 35, 19, 84, 34, 46, 67, 41, 20, 75)
working_yrs <- c(16, 27, 39, 16, 2, 50, 16, 23, 48, 21, 0, 57)
income <- working_yrs * 50

df <- data.frame(city, gender, age, working_yrs, income)

cities <- levels(city) # vector needed later for a for loop


# Group splits by city (dfs -> list of lists)
# Note: recent dplyr versions name this argument .keep; keep is deprecated
df1 <- select(df, -gender) %>%
  group_split(city, .keep = FALSE)

df2 <- select(df, -income) %>%
  filter(str_detect(gender, "m")) %>%
  select(city, age, working_yrs) %>%
  group_split(city, .keep = FALSE)

df3 <- select(df, -income) %>%
  filter(str_detect(gender, "f")) %>%
  select(city, age, working_yrs) %>%
  group_split(city, .keep = FALSE)

LOL <- c(df1, df2, df3) # list of lists


# Define function for descriptive analysis (list of lists -> list of matrices)
fun_descr <- function(x) {
  c(n=sum(!is.na(x)),
    Percent=((sum(!is.na(x)))/(sum(!is.na(x)) + sum(is.na(x)))*100),
    Mean=mean(x, na.rm = TRUE),
    SD=sd(x, na.rm = TRUE),
    Median=median(x, na.rm = TRUE),
    Quantile=quantile(x, 0.25, na.rm = TRUE),
    Quantile=quantile(x, 0.75, na.rm = TRUE))
}

LOM <- lapply(LOL, function(x) {
  t(apply(x, 2, fun_descr)) %>% round(digits = 1)
})

So far so good. Now here's the problem: my approach to rbind() the corresponding matrix triplets belonging to the same city returns proper results for the last city only.


for (i in 1:length(cities)) {
  bindcity <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

bindcity 

If the for loop or an lapply solution worked correctly and returned a list of row-bound matrices, I would expect to format the rows and columns of the resulting list of matrices as follows. Unfortunately, since the previous step doesn't work as expected, I haven't been able to test it yet. I'm also still struggling to find a first line for this function that sorts each matrix's rows into the row order 1,4,6,2,5,7,3, so that the data match the rownames shown below.

nicematrices <- lapply (bindcity, function (x) {
  rownames(x) <- paste(list("Age", "Working years", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)", "Income"))
  colnames(x) <- paste(list("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile"))
  return(x)
})

Final step: Print matrices using kableExtra

for (i in 1:length(nicematrices)) {
print(
  kable(nicematrices[[i]], caption = "Title") %>%
    column_spec(1, bold = T) %>%
    kable_styling("striped", bootstrap_options = "hover", full_width = TRUE)
)}

I'm not sure I understand correctly, but have you tried indexing bindcity with i?

for (i in 1:length(cities)) {
  bindcity[[i]] <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

The likely problem is that your loop does go through all the iterations, but each iteration overwrites bindcity, so only the last result survives. You need to save the output separately for every i. If you go this way, you will also need to initialise bindcity before the loop. Overall:

# Pre-allocate a list with one slot per city
bindcity <- vector("list", length(cities))

for (i in seq_along(cities)) {
  bindcity[[i]] <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
}

Here's what the above returns:

> bindcity

[[1]]
            n Percent   Mean    SD Median Quantile.25% Quantile.75%
age         4     100   47.0  16.5   44.5         34.2         57.2
working_yrs 4     100   24.5  11.0   21.5         16.0         30.0
income      4     100 1225.0 548.5 1075.0        800.0       1500.0
age         2     100   49.5  24.7   49.5         40.8         58.2
working_yrs 2     100   27.5  16.3   27.5         21.8         33.2
age         2     100   44.5  13.4   44.5         39.8         49.2
working_yrs 2     100   21.5   7.8   21.5         18.8         24.2

[[2]]
            n Percent   Mean     SD Median Quantile.25% Quantile.75%
age         4     100   45.8   27.8   40.0         30.2         55.5
working_yrs 4     100   22.8   20.2   19.5         12.5         29.8
income      4     100 1137.5 1007.8  975.0        625.0       1487.5
age         2     100   26.5   10.6   26.5         22.8         30.2
working_yrs 2     100    9.0    9.9    9.0          5.5         12.5
age         2     100   65.0   26.9   65.0         55.5         74.5
working_yrs 2     100   36.5   19.1   36.5         29.8         43.2

[[3]]
            n Percent   Mean     SD Median Quantile.25% Quantile.75%
age         4     100   50.8   25.1   54.0         35.8         69.0
working_yrs 4     100   31.5   26.0   34.5         15.8         50.2
income      4     100 1575.0 1299.0 1725.0        787.5       2512.5
age         2     100   43.5   33.2   43.5         31.8         55.2
working_yrs 2     100   24.0   33.9   24.0         12.0         36.0
age         2     100   58.0   24.0   58.0         49.5         66.5
working_yrs 2     100   39.0   25.5   39.0         30.0         48.0
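
The overwriting behaviour described in this answer can be seen in isolation with a minimal sketch. The toy list and the names last_only and all_results are made up for illustration and stand in for LOM and bindcity; they are not part of the question's data.

```r
# Toy stand-ins for LOM: nine 1x2 matrices (hypothetical data)
toy <- lapply(1:9, function(k) matrix(k, nrow = 1, ncol = 2))
n <- 3  # number of groups, standing in for length(cities)

# Plain assignment: each iteration replaces the previous result
for (i in seq_len(n)) {
  last_only <- rbind(toy[[i]], toy[[i + n]], toy[[i + 2 * n]])
}

# Indexed assignment into a pre-allocated list: every result is kept
all_results <- vector("list", n)
for (i in seq_len(n)) {
  all_results[[i]] <- rbind(toy[[i]], toy[[i + n]], toy[[i + 2 * n]])
}

is.matrix(last_only)  # a single matrix, built in the final iteration only
length(all_results)   # one matrix per group
```

After the first loop, last_only holds only the triplet from the final iteration, which matches the symptom in the question.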

The following uses lapply loops to get the desired bound matrices and the kable output.

bindcity <- lapply(seq_along(cities), function(i){
  rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
})

nicematrices <- lapply(bindcity, function (x) {
  rownames(x) <- c("Age", "Working years", "Income", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)")
  colnames(x) <- c("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile")
  x
})

The two loops above can be combined into one. Note, however, that the following lapply loop does not create the bindcity list. This only matters if that list is used afterwards, which is not clear from the question. It is not needed to create the kable tables.

nicematrices <- lapply(seq_along(cities), function (i) {
  x <- rbind(LOM[[i]], LOM[[i+length(cities)]], LOM[[i+(length(cities)*2)]])
  rownames(x) <- c("Age", "Working years", "Income", "Age (male)", "Working years (male)", "Age (female)", "Working years (female)")
  colnames(x) <- c("n (valid)", "% (valid)", "Mean", "SD", "Median", "25% Quantile", "75% Quantile")
  x
})
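
The question also asks for the rows to be sorted into the order 1,4,6,2,5,7,3. One possible sketch, assuming the rows are first named in binding order (overall, male, female) and then reordered by index. The matrix x and the name row_order below are hypothetical stand-ins, not the question's data:

```r
# Hypothetical 7-row matrix in binding order (overall, male, female)
x <- matrix(seq_len(14), nrow = 7, ncol = 2)
rownames(x) <- c("Age", "Working years", "Income",
                 "Age (male)", "Working years (male)",
                 "Age (female)", "Working years (female)")

# Reorder after naming, so each label travels with its own data
row_order <- c(1, 4, 6, 2, 5, 7, 3)
x_sorted <- x[row_order, , drop = FALSE]

rownames(x_sorted)
```

Inside the lapply function above, the same indexing step would go just before the final x, for example x <- x[row_order, , drop = FALSE].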

Now for the kable tables.

library(kableExtra)

kbl_list <- lapply(nicematrices, function(x){
  kbl <- kable(x, caption = "Title") %>%
    column_spec(1, bold = TRUE) %>%
    kable_styling("striped", 
                  bootstrap_options = "hover",
                  full_width = TRUE)
  print(kbl)
})
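
As an aside on the index arithmetic used throughout (i, i+length(cities), i+length(cities)*2): a possible alternative is to label each list element with its group and let split() collect the triplets. This is only a sketch on toy matrices. The names toy, group and bound are made up, and it assumes the list is ordered as in the question, with all groups for the first block, then all groups for the second, and so on.

```r
# Toy stand-ins for LOM: nine 1x2 matrices (hypothetical data)
toy <- lapply(1:9, function(k) matrix(k, nrow = 1, ncol = 2))
n <- 3  # number of groups, standing in for length(cities)

# Group label for each list element: 1, 2, 3, 1, 2, 3, 1, 2, 3
group <- rep(seq_len(n), times = length(toy) / n)

# split() gathers the elements of each group; do.call(rbind, ...) stacks them
bound <- lapply(split(toy, group), function(ms) do.call(rbind, ms))

length(bound)  # one stacked matrix per group
```

This keeps working if a fourth block of matrices is appended later, because no offsets are hard-coded.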
