Choosing eps and minPts for DBSCAN with spatial data (lon, lat) in R?

I know that previous posts have addressed this topic, but I could not find any that deal specifically with spatial point data. I have a dataset of all stop-and-frisk stops that took place in NYC in 2013, and I am trying to identify "hot spots" where stops occurred. The data is in this form:

stops <- data.frame(lon=c(-74.00478, -74.01046, -74.00521),
                    lat=c(40.71641, 40.71153, 40.72063),
                    precinct = c(1,1,1))

There are other features (89 in total), such as the time of the stop, the race of the suspect, the reason for the stop, etc. There are 173,671 observations in total.

My first question: to use the kNN method to find eps, do I have to transform lat and lon, or can I use them as is?

My second question is how to choose minPts. I have watched tutorials that use crime data in Python and in R with Tableau integration, and they seem to choose it based on some incident count. I used this code to get a minPts related to the average number of stops per precinct and hour of day, but I am unsure whether this is reliable.

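# Count stops per precinct and hour of day, average within each precinct,
# then average those precinct means.
library(dplyr)
stops2013clean %>%
  group_by(precinct, hour = lubridate::hour(time)) %>%
  summarise(n_stops = n(), .groups = "drop_last") %>%
  summarise(mean_stops = mean(n_stops)) %>%
  summarise(mean(mean_stops))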

Thanks for any help and guidance.

There is no algorithm that chooses them for you. It is a matter of what you want to find.

With latitude and longitude, you should use the Haversine distance rather than treating the raw degrees as planar coordinates. That gives you distances in meters, yards, or feet, as you like (just make sure you know which unit you get).
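As a rough illustration of the Haversine distance, here is a minimal base R sketch applied to the sample `stops` rows from the question. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions made for this example, not something from the original posts; packages such as geosphere provide their own implementations.

```r
# Haversine great-circle distance in meters.
# Assumption: a spherical Earth with mean radius r = 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  rad  <- pi / 180                      # degrees to radians
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))        # pmin guards against rounding above 1
}

# Sample rows from the question
stops <- data.frame(lon = c(-74.00478, -74.01046, -74.00521),
                    lat = c(40.71641, 40.71153, 40.72063),
                    precinct = c(1, 1, 1))

# Distance between the first two stops, in meters
d12 <- haversine_m(stops$lon[1], stops$lat[1], stops$lon[2], stops$lat[2])
print(d12)
```

Because the result is in meters, an eps chosen from these distances can be read directly as a radius on the ground.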

Then you have to decide what a "hotspot" is. How many crimes within which radius? If the answer is 10 crimes within 100 meters, then you have your parameters: eps = 100 meters and minPts = 10.
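To make the "10 crimes within 100 meters" rule concrete, the sketch below checks which points would satisfy it, that is, which points DBSCAN would treat as core points with eps = 100 m and minPts = 10. The toy data, the names `haversine_m`, `eps_m`, and `min_pts`, and the Earth radius are all assumptions for illustration. It also assumes the common convention that a point counts toward its own neighborhood. A full pairwise distance matrix like this only works for small samples; with roughly 170,000 stops you would need an indexed implementation instead.

```r
# Haversine distance in meters (assumed mean Earth radius: 6,371,000 m)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy data: 12 stops packed around one corner, 3 scattered stops
set.seed(1)
toy <- data.frame(
  lon = c(-74.0048 + runif(12, -2e-4, 2e-4), -74.0105, -74.0200, -73.9900),
  lat = c(40.7164 + runif(12, -2e-4, 2e-4), 40.7115, 40.7300, 40.7000)
)

eps_m   <- 100  # radius in meters
min_pts <- 10   # stops required within that radius (the point itself counts)

# Pairwise distance matrix in meters (small samples only)
idx <- seq_len(nrow(toy))
d <- outer(idx, idx, function(i, j) {
  haversine_m(toy$lon[i], toy$lat[i], toy$lon[j], toy$lat[j])
})

# A point meets the hotspot rule if at least min_pts stops lie within eps_m
toy$n_within <- rowSums(d <= eps_m)
toy$is_core  <- toy$n_within >= min_pts
print(toy)
```

Changing the definition of a hotspot (say, 20 stops within 250 meters) only means changing `min_pts` and `eps_m`, which is the point of choosing the parameters from the domain question rather than from the data alone.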
