

How to gather 10 columns into one column and 10 other columns into another column, with counts and frequencies, using only the tidyverse in R

I am having trouble doing a double gather: several columns are related to comorbidities, and other columns are related to symptoms. The purpose is to get a count and frequency per group of comorbidities and symptoms.

This is the type of data I have.

 test <- structure(
  list(
    ID = c("1",
           "2", "3",
           "4", "5",
           "6"),
    Chills = c("No", "Mild", "No", "Mild", "No", "No"),
    Cough = c("No", "Severe", "No", "Mild", "Mild", "No"),
    Diarrhoea = c("No", "Mild", "No", "No", "No", "No"),
    Fatigue = c("No", "Moderate", "Mild", "Mild", "Mild", "Mild"),
    Headcahe = c("No", "No", "No", "Mild", "No", "No"),
    `Loss of smell and taste` = c("No", "No", "No", "No", "No", "No"),
    `Muscle Ache` = c("No", "Moderate", "No", "Moderate", "Mild", "Mild"),
    `Nasal Congestion` = c("No", "No", "No", "No", "Mild", "No"),
    `Nausea and Vomiting` = c("No", "No",
                              "No", "No", "No", "No"),
    `Shortness of Breath` = c("No",
                              "Mild", "No", "No", "No", "Mild"),
    `Sore Throat` = c("No",
                      "No", "No", "No", "Mild", "No"),
    Sputum = c("No", "Mild",
               "No", "Mild", "Mild", "No"),
    Temperature = c("No", "No",
                    "No", "No", "No", "37.5-38"),
    Comorbidity_one = c(
      "Asthma (managed with an inhaler)",
      "None",
      "Obesity",
      "High Blood Pressure (hypertension)",
      "None",
      "None"
    ),
    Comorbidity_two = c("Diabetes Type 2", NA,
                        NA, "Obesity", NA, NA),
    Comorbidity_three = c(
      "Asthma (managed with an inhaler)",
      "None",
      "Obesity",
      "High Blood Pressure (hypertension)",
      "None",
      NA_character_
    ),
    Comorbidity_four = c(
      "Asthma (managed with an inhaler)",
      "None",
      "High Blood Pressure (hypertension)",
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_five = c(
      "Asthma (managed with an inhaler)",
      "None",
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_six = c(
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_seven = c(
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_eight = c(
      "High Blood Pressure (hypertension)",
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_,
      NA_character_
    ),
    Comorbidity_nine = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "High Blood Pressure (hypertension)",
      NA_character_,
      "High Blood Pressure (hypertension)"
    )
  ),
  row.names = c(NA,-6L),
  class = c("tbl_df",
            "tbl", "data.frame")
)

The output object (just a sample) should look like this:

structure(
  list(
    Comorbidities = c("Asthma", "Asthma", "Asthma",
                      "Diabetes", "Diabetes", "Diabetes",
                      "High blood Pressure", "High blood Pressure", "High blood Pressure"),
    Symptoms = c("Cough", "Cough", "Loss of smell and taste",
                 "Cough", "Chills mild", "Loss of smell and taste",
                 "Cough", "Chills", "Loss of smell and taste"),
    Group = c("Mild", "Moderate", "Severe",
              "Mild", "Moderate", "Severe",
              "Mild", "Moderate", "Severe"),
    Count = c(112, 10, 10, 123, 132, 153, 897, 98, 10),
    Percentage = c(0.23, 0.3, 0.1, 0.6, 0.5, 0.3, 0.8, 0.9, 0.5)
  ),
  row.names = c(NA, -9L),
  class = c("tbl_df", "tbl", "data.frame")
)

I want to achieve this using only the tidyverse in R.

Maybe this is what you're after. I pivot longer separately, first for the symptoms and then for the comorbidities, omitting records with no symptom ("No") or no comorbidity ("None"). Percentages are each symptom/severity count as a share of all symptom counts within that comorbidity. If that's not what you want, you can easily change it.

library(tidyr); library(dplyr)

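# Gather the symptom columns (2:14 = Chills ... Temperature) and keep reported ones
pivot_longer(test, cols=2:14, names_to="symptom", values_to="severity") %>%
  filter(severity!="No") %>%
  # Gather the comorbidity columns; != "None" also drops the NA entries
  pivot_longer(cols=starts_with("Comorbidity"), values_to="morbidity") %>%
  filter(morbidity != "None") %>%
  # Count each comorbidity/symptom/severity combination
  group_by(morbidity, symptom, severity) %>%
  summarise(Count=n(), .groups="drop") %>%
  # Share of each combination within its comorbidity
  group_by(morbidity) %>%
  mutate(Percentage=Count/sum(Count))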
___
# A tibble: 15 x 5
# Groups:   morbidity [2]
   morbidity                          symptom             severity Count Percentage
   <chr>                              <chr>               <chr>    <int>      <dbl>
 1 High Blood Pressure (hypertension) Chills              Mild         3     0.130 
 2 High Blood Pressure (hypertension) Cough               Mild         3     0.130 
 3 High Blood Pressure (hypertension) Fatigue             Mild         5     0.217 
 4 High Blood Pressure (hypertension) Headcahe            Mild         3     0.130 
 5 High Blood Pressure (hypertension) Muscle Ache         Mild         1     0.0435
 6 High Blood Pressure (hypertension) Muscle Ache         Moderate     3     0.130 
 7 High Blood Pressure (hypertension) Shortness of Breath Mild         1     0.0435
 8 High Blood Pressure (hypertension) Sputum              Mild         3     0.130 
 9 High Blood Pressure (hypertension) Temperature         37.5-38      1     0.0435
10 Obesity                            Chills              Mild         1     0.125 
11 Obesity                            Cough               Mild         1     0.125 
12 Obesity                            Fatigue             Mild         3     0.375 
13 Obesity                            Headcahe            Mild         1     0.125 
14 Obesity                            Muscle Ache         Moderate     1     0.125 
15 Obesity                            Sputum              Mild         1     0.125 
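One thing to watch in the output above: patient 4 lists High Blood Pressure in three of the `Comorbidity_*` columns, so each of that patient's symptoms is counted three times under High Blood Pressure (hence `Chills Mild 3`). If each patient should count only once per comorbidity, a minimal sketch is to add a `distinct()` step before counting. It assumes dplyr, tidyr and tibble are installed, recreates the question's `test` data compactly (`asthma`, `hbp` and `result` are just illustrative names), selects the symptom columns by name (`Chills:Temperature`) instead of by position, and renames the output columns to the question's `Comorbidities`/`Symptoms`/`Group`/`Count`/`Percentage`.

```r
library(tibble)
library(tidyr)
library(dplyr)

# The question's `test` data, written more compactly (same values)
asthma <- "Asthma (managed with an inhaler)"
hbp <- "High Blood Pressure (hypertension)"
test <- tibble(
  ID = as.character(1:6),
  Chills = c("No", "Mild", "No", "Mild", "No", "No"),
  Cough = c("No", "Severe", "No", "Mild", "Mild", "No"),
  Diarrhoea = c("No", "Mild", "No", "No", "No", "No"),
  Fatigue = c("No", "Moderate", "Mild", "Mild", "Mild", "Mild"),
  Headcahe = c("No", "No", "No", "Mild", "No", "No"),
  `Loss of smell and taste` = rep("No", 6),
  `Muscle Ache` = c("No", "Moderate", "No", "Moderate", "Mild", "Mild"),
  `Nasal Congestion` = c("No", "No", "No", "No", "Mild", "No"),
  `Nausea and Vomiting` = rep("No", 6),
  `Shortness of Breath` = c("No", "Mild", "No", "No", "No", "Mild"),
  `Sore Throat` = c("No", "No", "No", "No", "Mild", "No"),
  Sputum = c("No", "Mild", "No", "Mild", "Mild", "No"),
  Temperature = c("No", "No", "No", "No", "No", "37.5-38"),
  Comorbidity_one = c(asthma, "None", "Obesity", hbp, "None", "None"),
  Comorbidity_two = c("Diabetes Type 2", NA, NA, "Obesity", NA, NA),
  Comorbidity_three = c(asthma, "None", "Obesity", hbp, "None", NA),
  Comorbidity_four = c(asthma, "None", hbp, NA, NA, NA),
  Comorbidity_five = c(asthma, "None", NA, NA, NA, NA),
  Comorbidity_six = NA_character_,
  Comorbidity_seven = NA_character_,
  Comorbidity_eight = c(hbp, NA, NA, NA, NA, NA),
  Comorbidity_nine = c(NA, NA, NA, hbp, NA, hbp)
)

result <- test %>%
  # Gather the 13 symptom columns; keep only symptoms actually reported
  pivot_longer(Chills:Temperature, names_to = "Symptoms", values_to = "Group") %>%
  filter(Group != "No") %>%
  # Gather the 9 comorbidity columns; != "None" also drops the NA rows
  pivot_longer(starts_with("Comorbidity"), values_to = "Comorbidities") %>%
  filter(Comorbidities != "None") %>%
  # One row per patient, comorbidity and symptom, even if the same
  # comorbidity was entered in several Comorbidity_* columns
  distinct(ID, Comorbidities, Symptoms, Group) %>%
  count(Comorbidities, Symptoms, Group, name = "Count") %>%
  # Share of each symptom/severity pair within its comorbidity
  group_by(Comorbidities) %>%
  mutate(Percentage = Count / sum(Count)) %>%
  ungroup()

result
```

With the deduplication, High Blood Pressure falls to 11 symptom records (Fatigue/Mild now counts 3, one per patient) and Obesity to 7. If you would rather have each symptom's severity breakdown within a comorbidity, group by `Comorbidities, Symptoms` instead of `Comorbidities` before the `mutate()`.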

