
Maps, ggplot2, fill by state is missing certain areas on the map

I am working with maps and ggplot2 to visualize the number of certain crimes in each state for different years. The data set that I am working with was produced by the FBI and can be downloaded from their site or from here (if you don't want to download the dataset I don't blame you, but it is way too big to copy and paste into this question, and including a fraction of the data set wouldn't help, as there wouldn't be enough information to recreate the graph).

The problem is easier seen than described.

[Image: robbery by state]

As you can see, California is missing a large chunk, as are a few other states. Here is the code that produced this plot:

# load libraries
library(maps)
library(ggplot2)

# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")

# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")

# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
  facets=~Year, fill=Robbery, group=group)

This is what the states data looks like:

    long      lat     group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

And this is what the fbi data looks like:

    Year Population Violent Property Murder Forcible.Rape Robbery
1 1960    3266740    6097    33823    406           281     898
2 1961    3302000    5564    32541    427           252     630
3 1962    3358000    5283    35829    316           218     754
4 1963    3347000    6115    38521    340           192     828
5 1964    3407000    7260    46290    316           397     992
6 1965    3462000    6916    48215    395           367     992
   Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr   state region
1               4512    11626         19344          2853   AL Alabama  alabama
2               4255    11205         18801          2535   AL Alabama  alabama
3               3995    11722         21306          2801   AL Alabama  alabama
4               4755    12614         22874          3033   AL Alabama  alabama
5               5555    15898         26713          3679   AL Alabama  alabama
6               5162    16398         28115          3702   AL Alabama  alabama

I then merged the two sets on region. The subset I am trying to plot is

      region Year Robbery      long      lat group
8283 alabama 2012    5020 -87.46201 30.38968     1
8284 alabama 2012    5020 -87.48493 30.37249     1
8285 alabama 2012    5020 -87.95475 30.24644     1
8286 alabama 2012    5020 -88.00632 30.24071     1
8287 alabama 2012    5020 -88.01778 30.25217     1
8288 alabama 2012    5020 -87.52503 30.37249     1
       ...            ...    ...      ...

Any ideas on how I can create this plot without those ugly missing spots?

I played with your code. One thing I can tell is that something happened when you used merge. I drew the states map using geom_path and confirmed that there were a couple of weird lines which do not exist in the original map data. I then investigated further by comparing merge and inner_join. Both do the same job here, with one difference: with merge, the values in the order column were no longer in the right sequence. This was not the case with inner_join. You will see a bit of the California data below. Your approach was right, but merge did not work in your favour. I am not sure why the function changed the order, though.

library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)


### Check if both files have identical state names: The answer is NO
### states$region does not have Alaska, Hawaii, and Washington D.C.
### fbi$state does not have District of Columbia.

setdiff(fbi$state, states$region)
#[1] "alaska"           "hawaii"           "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
        filter(Year == 2012) %>%
        select(state, Robbery)  
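
The name mismatch found above means the District of Columbia drops out of any join on state name. Here is a minimal sketch of one way to reconcile the two spellings before joining. The short vectors are made-up stand-ins for fbi$state and states$region, and recoding (rather than dropping the row) is an assumption, not part of the original answer:

```r
# Hypothetical stand-ins for fbi$state and states$region
fbi_names <- c("alabama", "washington d. c.", "alaska")
map_names <- c("alabama", "district of columbia")

# Recode the FBI spelling to the one used by the map data
fbi_names[fbi_names == "washington d. c."] <- "district of columbia"

# Every map region now has a matching FBI name
missing_in_fbi <- setdiff(map_names, fbi_names)

# Names still present only on the FBI side (no polygons to fill)
missing_in_map <- setdiff(fbi_names, map_names)
```

Alaska and Hawaii would remain unmatched, since the "state" map has no polygons for them.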

Next, I created two data frames, one with merge and one with inner_join.

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))

ana %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -119.8685 38.90956     4   676      <NA>
#2  california   56521 -119.5706 38.69757     4   677      <NA>
#3  california   56521 -119.3299 38.53141     4   678      <NA>
#4  california   56521 -120.0060 42.00927     4   667      <NA>
#5  california   56521 -120.0060 41.20139     4   668      <NA>

bob %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -120.0060 42.00927     4   667      <NA>
#2  california   56521 -120.0060 41.20139     4   668      <NA>
#3  california   56521 -120.0060 39.70024     4   669      <NA>
#4  california   56521 -119.9946 39.44241     4   670      <NA>
#5  california   56521 -120.0060 39.31636     4   671      <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
geom_polygon()
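
On why the order changed: the documentation for merge says rows are sorted on the common columns by default and are otherwise in an unspecified order, so the vertex sequence within each state is not guaranteed to survive. geom_polygon connects points in row order, so scrambled rows would plausibly give the broken shapes. Here is a minimal sketch of a fix that keeps merge and re-sorts afterwards. It uses toy data frames (states_toy and crime_toy are hypothetical stand-ins for states and fbi):

```r
# Toy map data: two square "states", vertices numbered by `order`
states_toy <- data.frame(
  region = rep(c("a", "b"), each = 4),
  long   = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = rep(1:2, each = 4),
  order  = 1:8
)

# Toy crime data: two years per state
crime_toy <- data.frame(
  region  = rep(c("a", "b"), each = 2),
  Year    = rep(c(2011, 2012), times = 2),
  Robbery = c(10, 12, 20, 25)
)

# Same argument order as in the question: thematic data first
merged   <- merge(crime_toy, states_toy, by = "region")
merged12 <- subset(merged, Year == 2012)

# Restore the vertex sequence before plotting, whatever merge did to it
fixed <- merged12[order(merged12$group, merged12$order), ]
```

With the real data the same last step would be applied to fbimap12, assuming the order column from map_data is still present after the merge.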


The problem is in the order of arguments to merge:

fbimap <- merge(fbi, states, by="region")

has the thematic data first and the geo data second. After switching the order with

fbimap <- merge(states, fbi, by="region")

the polygons should all close up.
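
An alternative that does not depend on how merge arranges its output is to leave the map data frame untouched and look up each state's value with match. This is a sketch of a different technique from the one in the answer above, on toy data (states_toy and crime12_toy are hypothetical, and the crime table is assumed to hold one row per state, i.e. already filtered to a single year):

```r
# Toy map data: three "states", three vertices each
states_toy <- data.frame(
  region = rep(c("a", "b", "c"), each = 3),
  order  = 1:9
)

# Toy crime data for one year; note "c" is absent
crime12_toy <- data.frame(
  region  = c("b", "a"),
  Robbery = c(25, 12)
)

# For each map row, find the position of its region in the crime table
# (NA when the region has no crime row)
idx <- match(states_toy$region, crime12_toy$region)

# Attach the value without touching the row order of the map data
states_toy$Robbery <- crime12_toy$Robbery[idx]
```

Because no rows are added, dropped, or moved, the polygon vertices stay in their original sequence. Regions with no crime row get NA instead of disappearing from the map.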
