简体   繁体   中英

Unable to construct choropleth map in r

I have some demographic data I would like to use to make a choropleth map of US counties. My workflow doesn't run into any errors and I'm able to create the final map, however, the data that its mapping is incorrect. My workflow makes use of two data sources -a shape file and a data.frame. The shapefile is a counties shapefile that can be found at this link https://www.dropbox.com/s/4ujxidyx42793j7/cb_2015_us_county_500k.zip?dl=1 The data.frame file can be found at this link: https://www.dropbox.com/s/qys6s6ikrs1g2xb/data.dem.csv?dl=1

Here is my code:

#Load dependencies
library(sp)
library(spatialEco)
library(rgdal)
library(dplyr)
library(maptools)
library(taRifx.geo)
library(ggplot2)
library(USAboundaries)
library(splitstackshape)
library(maps)
library(cowplot)

#Read in shape and csv files
county.track<-readOGR("/path", "filename")
county.track@data$id = rownames(county.track@data)
data<-read.csv("/path/filename.csv")

#Convert data.frame (data) to points polygon file
data$y<-data$lat
data$x<-data$long
coordinates(data) <- ~ x + y
proj4string(data) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
proj4string(county.track) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")

#Overlay points onto polygons
county.track.data<-point.in.poly(data, county.track)

#Summarize point data by county
count<-select(as.data.frame(county.track.data), id, count)
count<-count %>%
  group_by(id) %>%
  summarize(count=sum(count))

#Merge with shape file data
county.track@data<-merge(county.track@data, count, by="id", all.x=T)

#Replace NA values with zeroes 
county.track@data$count[is.na(county.track@data$count)]<-0
county.track.points = fortify(county.track, region="id")
map.plot<-merge(county.track.points, county.track@data, by="id")

#Get rid of Hawaii and Alaska
map.plot<-map.plot %>%
  filter(lat<50 & lat>25) %>%
  filter(long>-130)

#Create choropleth map using ggplot2
 ggplot(map.plot) +
  geom_polygon(aes(long, lat, group=group, fill=log(count))) +
  coord_map()

The output looks like the following: 在此输入图像描述

But this is just wrong, which is apparent for a number of reasons. One, most obviously much of the data is not mapped. The grey areas on the map signify NA. But I removed the NAs in one of the steps above, also when examining the data used to map (map.plot), there are no NAs in the fill variable (count). Second, the distribution of values for what is mapped is off. Los Angeles county should have the highest count value at 793 (log value of 6.675823), yet on the map numerous lighter colored counties indicate the value of other spatial units are higher and some of the top ranking counties such as San Diego, are not filled in at all (bottom left of map).

When I examine the data I used to map (map.plot), everything seems A OK. Los Angeles county is still the highest valued county for the "count" variable, yet the map suggests otherwise (see this image here). 在此输入图像描述 I'm hoping some one can do some forensics here and identify the problem, I've done my best to go through all my steps but I can't seem to identify the issue. Thanks in advance.

UPDATE: I tried using a different shapefile from the same source. The shapefile in the link above is the same as the the one labeled "cb_2015_us_county_500k.zip" at the following ( https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html ). When I choose a different shapefile (such as cb_2015_us_county_5m.zip) I get a different map but same problems: See the following map an example:

在此输入图像描述

I'm not sure what is going on! In this new map, LA county is no longer even colored in but Orange County is! Any help is much appreciated.

Not rly sure what's going on with your merging, but this worked for me:

library(albersusa) # devtools::install_github("hrbrmstr/albersusa)
library(readr)
library(dplyr)
library(rgeos)
library(maptools)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(viridis)

df <- read_csv("data.dem.csv")

counties_composite() %>% 
  subset(state %in% unique(df$state)) -> usa

pts <- df[,2:1]
coordinates(pts) <- ~long+lat
proj4string(pts) <- CRS(proj4string(usa))

bind_cols(df, select(over(pts, usa), -state)) %>% 
  count(fips, wt=count) -> df

You have 942 total counties:

glimpse(df)
## Observations: 942
## Variables: 2
## $ fips <chr> "01001", "01003", "01013", "01015", "01043", "01055", "01061", ...
## $ n    <int> 1, 2, 1, 3, 1, 3, 1, 1, 19, 6, 12, 7, 7, 1, 4, 4, 1, 5, 67, 19,...

There are over 3K counties in the US

However, there aren't many NA s:

filter(df, is.na(fips))
## # A tibble: 1 x 2
##    fips     n
#3   <chr> <int>
## 1  <NA>    10

usa_map <- fortify(usa, region="fips")

gg <- ggplot()
gg <- gg + geom_map(data=usa_map, map=usa_map,
                    aes(long, lat, map_id=id),
                    color="#b2b2b2", size=0.05, fill="white")
gg <- gg + geom_map(data=df, map=usa_map,
                    aes(fill=n, map_id=fips),
                    color="#b2b2b2", size=0.05)
gg <- gg + scale_fill_viridis(name="Count", trans="log10")
gg <- gg + coord_proj(us_aeqd_proj)
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.85, 0.2))
gg

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM