R - ggplotly throws error after joining columns

I am currently working on a project that visualizes traffic stop data from the US. For this purpose I visualize the number of traffic stops in Texas and California for certain years. I created a choropleth map with labels, and that worked fine. I got the geo_data sf object from the maps package in R. Because there are so many labels, I want to create hover labels using the plotly package and the ggplotly() function. But when I try to plot my choropleth map with ggplotly(), I get an error that says: the number of columns of matrices must match. Here is the code I used:

library(dplyr)
library(sf)
library(ggplot2)
library(plotly)

p1 <- df %>%
  group_by(county_fips) %>%
  count() %>%
  full_join(geo_data, by = c("county_fips" = "fips")) %>%
  st_as_sf() %>%
  ggplot(aes(fill = n)) +
  geom_sf() +
  geom_sf_text(aes(label = ID), fun.geometry = st_centroid) +
  scale_fill_continuous(low = "antiquewhite2", high = "palevioletred4", guide = "colorbar") +
  theme_void()
ggplotly(p1)

Here is a sample of df:

   id state stop_date  county_name county_fips
<int> <fct> <date>     <fct>             <int>
    1 CA    2013-01-01 San Diego          6073
    2 CA    2013-01-01 San Diego          6073
    3 CA    2013-01-01 San Diego          6073
    4 CA    2013-01-01 San Diego          6073
    5 CA    2013-01-01 NA                   NA
    6 CA    2013-01-01 Orange             6059
    7 CA    2013-01-01 Orange             6059
    8 CA    2013-01-01 Orange             6059
    9 CA    2013-01-01 Orange             6059

The geo_data sf object was created with the following code, using the maps package:

library(maps)
library(sf)
library(dplyr)
library(stringr)

sf_map <- st_as_sf(map("county", plot = FALSE, fill = TRUE))
sf_map <- sf_map %>% filter(str_detect(ID, "california") | str_detect(ID, "texas"))
sf_map <- sf_map %>% filter(ID != "missouri,texas" & ID != "oklahoma,texas")
sf_map$ID <- gsub("texas,galveston", "texas,galveston:main", sf_map$ID)
data("county.fips")
geo_data <- left_join(sf_map, county.fips, by = c("ID" = "polyname"))

My assumption is that this has something to do with the fact that I have geo_data for all counties, but not stops in every county. That creates missing values in county_fips, the column I join by. I tried excluding the rows in my data where county_fips is missing before counting, but the error stays the same.
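One way to probe that assumption is to reverse the join so that the sf object is on the left, which means every row keeps a geometry and stops with no matching county are dropped. The sketch below uses made-up toy data (geo_toy, counts_toy, and the square() helper are hypothetical stand-ins, not the real data), and it is not confirmed to remove the ggplotly() error:

```r
library(dplyr)
library(sf)

# Hypothetical helper: a unit square starting at x0, as a polygon
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy stand-in for geo_data: two "counties" with made-up fips codes
geo_toy <- st_sf(
  fips = c(1L, 2L),
  ID = c("toy,a", "toy,b"),
  geometry = st_sfc(square(0), square(2), crs = 4326)
)

# Toy stand-in for the counted stops: one matching fips, one NA
counts_toy <- tibble(county_fips = c(1L, NA), n = c(10L, 5L))

# sf on the left: the result stays an sf object and every row has a geometry;
# counties without stops get n = NA instead of an empty geometry
joined <- geo_toy %>%
  left_join(counts_toy, by = c("fips" = "county_fips"))

# Check for empty geometries before handing the data to ggplot()
print(table(st_is_empty(joined)))
```

With the real data this would correspond to starting the pipeline from geo_data and joining the counted df onto it, with by = c("fips" = "county_fips").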

Here is an example of my data after the join:

county_fips      n ID                                                                    geometry
         <int>  <int> <chr>                                                       <MULTIPOLYGON [°]>
 1        6001 724809 california,alam~ (((-121.4785 37.4829, -121.5129 37.4829, -121.8853 37.4829, ~
 2        6003  37749 california,alpi~ (((-120.0748 38.70903, -120.0518 38.72049, -119.9544 38.7777~
 3        6005  32375 california,amad~ (((-120.0748 38.70903, -120.069 38.51995, -120.1263 38.5085,~
 4        6007  89359 california,butte (((-121.6217 39.31063, -121.9082 39.29345, -121.9082 39.3335~

I hope someone can tell me where to look in my code and data to find and solve the issue. Thank you very much in advance!

I do not claim that my fix will work for you, but I hope it gives you some ideas. In my case, I had transformed the shapes with ms_simplify(), and I found that adding the argument explode=TRUE resolved the error.

# Read datasets
facilities.lines.df.raw = read.csv(facilities.lines.path)
facilities.df.raw = read.csv(facilities.path)
facilities.shp = read_sf(facilities.shp.path)
districts.shp = read_sf(districts.shp.path)

# Cleaning
facilities.df = ...  # left this out
facilities.lines.df = ...  # left this out

# Scaling / projection system
districts.shp.trans <- st_transform(
  districts.shp, 4326)
# Reduce num of polys
districts.shp.trans.1 <- ms_simplify(
  districts.shp.trans, 
  keep=0.01,
  explode=TRUE)  # <----------- Adding "explode=TRUE" fixed my issue

# Linestrings
facility.linestrings = ...  # left this out
facility.multilinestring = st_multilinestring(
  do.call("rbind", facility.linestrings))
facility.multilinestring.st_sfc = st_sfc(
  facility.multilinestring, crs=PLANAR_XFORM_SCALAR_x2)

# Plot
gg = ggplot(districts.shp.trans.1) +
  geom_sf() +
  geom_sf(
    data=facility.multilinestring.st_sfc) + 
  geom_point(
    data=facilities.df,
    aes(x=longitude, y=latitude))
ggplotly(gg)
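
Since explode=TRUE converts multipart polygons into singlepart polygons, a comparable step without ms_simplify() may be to cast the geometries with st_cast() from sf. This is only a sketch on a made-up multipolygon (multi and sq() are hypothetical), and whether it resolves the error for the data in the question is an assumption:

```r
library(sf)

# Hypothetical helper: ring list for a unit square starting at x0
sq <- function(x0) {
  list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)))
}

# One feature made of two separate squares (a MULTIPOLYGON)
multi <- st_sf(
  ID = "toy,county",
  geometry = st_sfc(st_multipolygon(list(sq(0), sq(2))), crs = 4326)
)

# Split the multipart feature into one row per polygon;
# st_cast() warns that attributes are repeated, which is expected here
single <- suppressWarnings(st_cast(multi, "POLYGON"))

print(nrow(multi))
print(nrow(single))
print(unique(as.character(st_geometry_type(single))))
```

For the question's data the equivalent call would be st_cast(geo_data, "POLYGON") before the join, keeping in mind that each county's ID and fips are then repeated across its parts.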
