如何在 R 中将 10 列收集到一个列中，将其他 10 列收集到另一个列中，计数和频率仅使用 tidyverse

Question

I am having a trouble in doing double gathering on several column that are related to Comorbidities, and other columns that are related to Symptoms.我在与合并症相关的几列以及与症状相关的其他列上进行双重收集时遇到了麻烦。 The purpose is to get a count and frequency per groups of comorbidities and symptoms.目的是获得每组合并症和症状的计数和频率。

This is the type of data I have.这是我拥有的数据类型。

 test <- structure(
  list(
    ID = c("1",
           "2", "3",
           "4", "5",
           "6"),
    Chills = c("No", "Mild", "No", "Mild", "No", "No"),
    Cough = c("No", "Severe", "No", "Mild", "Mild", "No"),
    Diarrhoea = c("No", "Mild", "No", "No", "No", "No"),
    Fatigue = c("No", "Moderate", "Mild", "Mild", "Mild", "Mild"),
    Headcahe = c("No", "No", "No", "Mild", "No", "No"),
    `Loss of smell and taste` = c("No", "No", "No", "No", "No", "No"),
    `Muscle Ache` = c("No", "Moderate", "No", "Moderate", "Mild", "Mild"),
    `Nasal Congestion` = c("No", "No", "No", "No", "Mild", "No"),
    `Nausea and Vomiting` = c("No", "No",
                              "No", "No", "No", "No"),
    `Shortness of Breath` = c("No",
                              "Mild", "No", "No", "No", "Mild"),
    `Sore Throat` = c("No",
                      "No", "No", "No", "Mild", "No"),
    Sputum = c("No", "Mild",
               "No", "Mild", "Mild", "No"),
    Temperature = c("No", "No",
                    "No", "No", "No", "37.5-38"),
    Comorbidity_one = c(
      "Asthma (managed with an inhaler)",
      "None",
      "Obesity",
      "High Blood Pressure (hypertension)",
      "None",
      "None"
    ),
    Comorbidity_two = c("Diabetes Type 2", NA,
                        NA, "Obesity", NA, NA),
    Comorbidity_three = c(
      "Asthma (managed with an inhaler)",
      "None",
      "Obesity",
      "High Blood Pressure (hypertension)",
      "None",
      NA_character_
    ),
    Comorbidity_four = c(
      "Asthma (managed with an inhaler)",
      "None",
      "High Blood Pressure (hypertension)",
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_five = c(
      "Asthma (managed with an inhaler)",
      "None",
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_six = c(
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_seven = c(
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_eight = c(
      "High Blood Pressure (hypertension)",
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_nine = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "High Blood Pressure (hypertension)",
      NA_character_,
      "High Blood Pressure (hypertension)"
    )
  ),
  row.names = c(NA,-6L),
  class = c("tbl_df",
            "tbl", "data.frame")
)

But the outputted object, just a sample, should look like this:但是输出的 object 只是一个示例，应该如下所示：

 structure(list(Comorbidities = c("Asthma", "Asthma", "Asthma", 
"Diabetes", "Diabetes", "Diabetes", "High blood Pressure", "High blood Pressure", 
"High blood Pressure"), Symptoms = c("Cough", "Cough", "Loss of smell and taste", 
"Cough", "Chills mild", "Loss of smell and taste", "Cough", "Chills", 
"Loss of smell and taste"), Group = c("Mild", "Moderate", "Severe", 
"Mild", "Moderate", "Severe", "Mild", "Moderate", "Severe"), 
    Count = c(112, 10, 10, 123, 132, 153, 897, 98, 10), Percentage = c(0.23, 
    0.3, 0.1, 0.6, 0.5, 0.3, 0.8, 0.9, 0.5)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

I want to achieve this only with tidyverse, in R.我只想在 R 中使用 tidyverse 来实现这一点。

Answer 1

Maybe this is what you're after.也许这就是你所追求的。 I pivot longer separately for symptoms first and then co-morbidities, omitting records if they have none. I pivot 分别针对症状和合并症，如果没有则省略记录。 Percentages are the number of symptoms among each morbidity.百分比是每种发病率中的症状数量。 If that's not what you want, you can easily change it.如果这不是您想要的，您可以轻松更改它。

library(tidyr); library(dplyr)

pivot_longer(test, cols=2:14, names_to="symptom", values_to="severity") %>%
  filter(severity!="No") %>%
  pivot_longer(cols=starts_with("Comorbidity"), values_to="morbidity") %>%
  filter(morbidity != "None") %>%
  group_by(morbidity, symptom, severity) %>%
  summarise(Count=n()) %>%
  group_by(morbidity) %>%
  mutate(Percentage=Count/sum(Count))
___
# A tibble: 15 x 5
# Groups:   morbidity [2]
   morbidity                          symptom             severity Count Percentage
   <chr>                              <chr>               <chr>    <int>      <dbl>
 1 High Blood Pressure (hypertension) Chills              Mild         3     0.130 
 2 High Blood Pressure (hypertension) Cough               Mild         3     0.130 
 3 High Blood Pressure (hypertension) Fatigue             Mild         5     0.217 
 4 High Blood Pressure (hypertension) Headcahe            Mild         3     0.130 
 5 High Blood Pressure (hypertension) Muscle Ache         Mild         1     0.0435
 6 High Blood Pressure (hypertension) Muscle Ache         Moderate     3     0.130 
 7 High Blood Pressure (hypertension) Shortness of Breath Mild         1     0.0435
 8 High Blood Pressure (hypertension) Sputum              Mild         3     0.130 
 9 High Blood Pressure (hypertension) Temperature         37.5-38      1     0.0435
10 Obesity                            Chills              Mild         1     0.125 
11 Obesity                            Cough               Mild         1     0.125 
12 Obesity                            Fatigue             Mild         3     0.375 
13 Obesity                            Headcahe            Mild         1     0.125 
14 Obesity                            Muscle Ache         Moderate     1     0.125 
15 Obesity                            Sputum              Mild         1     0.125

如何在 R 中将 10 列收集到一个列中，将其他 10 列收集到另一个列中，计数和频率仅使用 tidyverse

问题描述

1 个解决方案

解决方案1
2 已采纳 2020-05-21 18:07:07

如何在 R 中将 10 列收集到一个列中，将其他 10 列收集到另一个列中，计数和频率仅使用 tidyverse

问题描述

1 个解决方案

解决方案1 2 已采纳 2020-05-21 18:07:07

解决方案1
2 已采纳 2020-05-21 18:07:07