
Unable to create India choropleth in R

I want to create a choropleth of India in R.

The first step is to import a shapefile from https://github.com/datameet/maps/tree/master/States

and read it into R:

library(ggplot2)  # provides fortify() and ggplot()
shape <- rgdal::readOGR(dsn="/Data/Admin2.shp")
states <- fortify(shape, region = "ST_NM")

Next, I have a dataset of states and their population, states_data:

structure(list(Name = c("JAMMU & KASHMIR", "HIMACHAL PRADESH", 
"UTTARAKHAND", "RAJASTHAN", "UTTAR PRADESH", "BIHAR", "SIKKIM", 
"ARUNACHAL PRADESH", "NAGALAND", "MANIPUR", "MIZORAM", "TRIPURA", 
"MEGHALAYA", "ASSAM", "WEST BENGAL", "JHARKHAND", "ODISHA", "CHHATTISGARH", 
"MADHYA PRADESH", "GUJARAT", "DAMAN & DIU", "DADRA & NAGAR HAVELI", 
"MAHARASHTRA", "ANDHRA PRADESH", "KARNATAKA", "GOA", "LAKSHADWEEP", 
"KERALA", "TAMIL NADU", "ANDAMAN & NICOBAR ISLANDS"), TOT_P = c(1493299, 
392126, 291903, 9238534, 1134273, 1336573, 206360, 951821, 1710973, 
1167422, 1036115, 1166813, 2555861, 3884371, 5296953, 8645042, 
9590756, 7822902, 15316784, 8917174, 15363, 178564, 10510213, 
5918073, 4248987, 149275, 61120, 484839, 794697, 28530)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

I merge both datasets on the state names:

final_data <- merge(states,states_data, by.y="Name", by.x="id")

Finally, I plot using ggplot:

ggplot()+
  geom_polygon(data=final_data,
               aes(x= long, y=lat, group=id, fill=TOT_P), color='black',size=0.25)+
  coord_map()

I get the following graph:

[image]

Can someone tell me where I am going wrong? Any help is appreciated!

Thanks!

The state name strings are not identical across your two datasets.

If you take a look at the unique values, you can see that the shapefile uses title case:

> unique(states$id)

[1] "Andaman & Nicobar Island" "Andhra Pradesh"           "Arunanchal Pradesh"       "Assam"                   
[5] "Bihar"                    "Chandigarh"               "Chhattisgarh"             "Dadara & Nagar Havelli"  
[9] "Daman & Diu"              "Goa"                      "Gujarat"                  "Haryana"                 
[13] "Himachal Pradesh"         "Jammu & Kashmir"          "Jharkhand"                "Karnataka"               
[17] "Kerala"                   "Lakshadweep"              "Madhya Pradesh"           "Maharashtra"             
[21] "Manipur"                  "Meghalaya"                "Mizoram"                  "Nagaland"                
[25] "NCT of Delhi"             "Odisha"                   "Puducherry"               "Punjab"                  
[29] "Rajasthan"                "Sikkim"                   "Tamil Nadu"               "Telangana"               
[33] "Tripura"                  "Uttar Pradesh"            "Uttarakhand"              "West Bengal"

while your population data frame uses all caps:

> unique(states_data$Name)
[1] "JAMMU & KASHMIR"           "HIMACHAL PRADESH"          "UTTARAKHAND"               "RAJASTHAN"                
[5] "UTTAR PRADESH"             "BIHAR"                     "SIKKIM"                    "ARUNACHAL PRADESH"        
[9] "NAGALAND"                  "MANIPUR"                   "MIZORAM"                   "TRIPURA"                  
[13] "MEGHALAYA"                 "ASSAM"                     "WEST BENGAL"               "JHARKHAND"                
[17] "ODISHA"                    "CHHATTISGARH"              "MADHYA PRADESH"            "GUJARAT"                  
[21] "DAMAN & DIU"               "DADRA & NAGAR HAVELI"      "MAHARASHTRA"               "ANDHRA PRADESH"           
[25] "KARNATAKA"                 "GOA"                       "LAKSHADWEEP"               "KERALA"                   
[29] "TAMIL NADU"                "ANDAMAN & NICOBAR ISLANDS"

That's why your merged dataset final_data is empty.

One possible fix is to turn the names in both datasets into lower case before merging:
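To see why the merge comes back empty, here is a minimal sketch with toy stand-ins for the two tables. The data frames `shape_ids` and `pop` are made up for illustration; only the two state names and their TOT_P values are taken from the data above. It relies on merge() comparing keys as exact, case-sensitive strings.

```r
# Toy stand-ins for the two tables (hypothetical names, values from the question)
shape_ids <- data.frame(id = c("Bihar", "Goa"))
pop <- data.frame(Name = c("BIHAR", "GOA"), TOT_P = c(1336573, 149275))

# Keys differ only in case, so an exact-match merge finds no common rows
n_before <- nrow(merge(shape_ids, pop, by.x = "id", by.y = "Name"))

# Lower-casing both key columns makes the keys comparable
shape_ids$id <- tolower(shape_ids$id)
pop$Name <- tolower(pop$Name)
n_after <- nrow(merge(shape_ids, pop, by.x = "id", by.y = "Name"))

print(c(before = n_before, after = n_after))
```

Checking nrow() right after a merge is a cheap way to catch this kind of key mismatch before plotting.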

states$id <- stringr::str_to_lower(states$id)
states_data$Name <- stringr::str_to_lower(states_data$Name)

However, there are still a few rows that will not be matched, either because of typos/different spellings or simply missing data. You could take a look at those via

setdiff(unique(states$id), unique(states_data$Name))

and where possible adapt the spelling.
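One way to adapt the spelling is a small lookup table applied after lower-casing. The sketch below is only an illustration: `fixes` and `harmonise_names()` are hypothetical names, and the three mappings are the spelling differences visible in the two unique() outputs above.

```r
# Lower-cased shapefile spellings (names) mapped to the population table's spellings (values)
fixes <- c(
  "andaman & nicobar island" = "andaman & nicobar islands",
  "arunanchal pradesh"       = "arunachal pradesh",
  "dadara & nagar havelli"   = "dadra & nagar haveli"
)

# Hypothetical helper: replace known variants, leave every other name untouched
harmonise_names <- function(x) {
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

ids <- c("arunanchal pradesh", "assam", "dadara & nagar havelli")
fixed_ids <- harmonise_names(ids)
print(fixed_ids)
```

On the real data this would be applied as `states$id <- harmonise_names(states$id)` before merging. States that appear only in the shapefile (for example Haryana or Punjab) have no population row in states_data, so no amount of respelling will match them.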

Lastly, in my quick test the fortified polygons did not plot nicely; this may be entirely specific to my combination of rgeos/rgdal/ggplot2. Still, in case you intend to work with spatial data more extensively, I would like to point you to the sf package. It makes handling spatial data extremely convenient (see the package's comprehensive documentation) and enables you to simply use geom_sf() for plotting with ggplot2.

library(tidyverse)
library(sf)
# read shape and convert state names to lower case 
states <- st_read("./Data/Admin2.shp") %>%
                 mutate(Name = str_to_lower(ST_NM))
# merge spatial data with population data, also convert state names to lower case in the latter
states_population <- states %>%
  left_join(states_data %>% mutate(Name = str_to_lower(Name)), "Name")
# grey states are the result of unmatched states outlined above
ggplot(states_population, aes(fill = TOT_P)) +
  geom_sf() +
  scale_fill_viridis_c() +
  ggthemes::theme_map()

[image: choropleth of population by Indian state]
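If you would rather stay with the fortify() route, one possible reason for polygons not plotting nicely (an assumption, not something verified against this shapefile) is that merge() does not preserve row order, while geom_polygon() connects vertices in the order the rows appear. The sketch below uses a made-up stand-in for fortify() output, with invented TOT_P values, to show restoring the vertex sequence via the `order` column.

```r
# Toy stand-in for fortify() output: two polygons, vertices numbered by `order`
verts <- data.frame(
  id    = c("b", "b", "b", "a", "a", "a"),
  group = c("b.1", "b.1", "b.1", "a.1", "a.1", "a.1"),
  order = 1:6
)
# Made-up population values, for illustration only
pop <- data.frame(id = c("a", "b"), TOT_P = c(10, 20))

# merge() sorts on the key by default, so rows may no longer be in vertex order
merged <- merge(verts, pop, by = "id")

# Restore the original vertex sequence before handing the data to geom_polygon()
merged <- merged[order(merged$order), ]
restored_order <- merged$order
print(restored_order)
```

For fortified data it is also usually safer to map `group = group` rather than `group = id` in aes(), since a state made up of several pieces (islands, for instance) gets one group per piece.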

