Trying to categorize and condense data in R with a key word

library(ggplot2)
library(qqplotr)
library(stats)
library(dplyr)

# County-level data; Geography holds "County, State" strings
df <- read.csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm", header = TRUE, stringsAsFactors = FALSE)

# Subset of columns, still one row per county
coverage_by_Geography <- data.frame(
  avgcancerdiag = df$avgAnnCount,
  county        = df$Geography,
  PubCoverage   = df$PctPublicCoverage,
  privcoverage  = df$PctPrivateCoverage,
  deathrt       = df$avgDeathsPerYear
)
ggplot(data = coverage_by_Geography, aes(x = privcoverage, y = deathrt)) + geom_col()
ggplot(data = coverage_by_Geography, aes(x = PubCoverage, y = deathrt)) + geom_col()

I am trying to take a bunch of counties within a column, condense them into states, and average their data out to state-level numbers instead of county-level ones. I am stumped on how to do it.

A general tidyverse solution follows:

library(tidyverse)

df <- read_csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm")

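df %>%
  # Split "County, State" into two columns
  separate(Geography, into = c("county", "state"), sep = ", ") %>%
  select(state, county, everything()) %>%
  group_by(state) %>%
  # One row per state: average every column except county, skipping missing values
  summarize(across(-c(county), ~ mean(.x, na.rm = TRUE)))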

The code splits the Geography column ("County, State") into separate county and state columns. Grouping by state then lets summarize() collapse each state's counties into a single row. Here I asked for the mean of every column except county, but that does not make sense for every data type: mean() returns NA (with a warning) for character columns. Hopefully this gets you closer to what you are looking for.
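The data-type caveat can be made concrete. Below is a minimal sketch on a tiny hypothetical data frame (the counties, the incomeBand column and the popEst population column are all made up for illustration; only PctPublicCoverage comes from the question). It averages only numeric columns via where(is.numeric), skips missing values, and adds a population-weighted mean with base R's weighted.mean(), since a plain average counts a small county the same as a large one.

```r
library(dplyr)
library(tidyr)

# Hypothetical county rows in the same "County, State" shape as Geography.
# incomeBand stands in for a character column; popEst is an assumed population column.
counties <- data.frame(
  Geography = c("Adams County, Ohio", "Brown County, Ohio",
                "Clark County, Ohio", "King County, Washington"),
  incomeBand = c("low", "mid", "high", "high"),
  PctPublicCoverage = c(40, 60, NA, 30),
  popEst = c(1000, 3000, 500, 2000),
  stringsAsFactors = FALSE
)

state_summary <- counties %>%
  separate(Geography, into = c("county", "state"), sep = ", ") %>%
  group_by(state) %>%
  summarize(
    # Unweighted mean of numeric columns only; character columns are skipped.
    # .names keeps the originals untouched for the weighted mean below.
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"),
    # Population-weighted mean; NA values are dropped together with their weights
    PctPublicCoverage_wtd = weighted.mean(PctPublicCoverage, popEst, na.rm = TRUE),
    # How many counties fed into each state row
    n_counties = n()
  )

print(state_summary)
```

In this toy data Ohio's coverage is 50 unweighted but 55 weighted, because Brown County's larger (made-up) population counts for more; Clark County's missing value is ignored in both. Washington has a single county, so both figures are 30.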
