There is a pandas.DataFrame df
that looks like this:
City Country Latitude Longitude Population ...
Berlin Germany 52.516602 13.304105 118704
Berlin Germany 52.430884 13.192662 292000
...
Berlin USA 39.7742446 -75.0013423 7588
Berlin USA 43.9727912 -88.9858084 5524
I would like to group the data by the columns City and Country and sum up their population:
grouped_data = df.groupby(['City', 'Country'])['Population'].agg('sum').reset_index()
But to handle ambiguity (the two entries for USA must not be merged), my idea was to calculate and check the distance between the lat/long pairs for every potential groupby() result.
Assuming I have a distance function that returns the distance between two geographic points in kilometres, I'd like to group all entries by City and Country and sum up their population only if the result of distance() is, e.g., less than 50 kilometres.
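For reference, a minimal sketch of what such a distance() function could look like, assuming the haversine formula on a spherical Earth with a mean radius of 6371 km. The name, signature and radius constant are placeholders for illustration, not part of pandas or any other library:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    # haversine formula
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    # clamp to 1.0 to guard against rounding errors for near-antipodal points
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


# the two German rows are well under 50 km apart, the two US rows are not
close = distance(52.516602, 13.304105, 52.430884, 13.192662) < 50
far = distance(39.7742446, -75.0013423, 43.9727912, -88.9858084) > 50
```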
The output for the example above could look like:
City Country Latitude Longitude Population
Berlin Germany [52.516602, 52.430884] [13.304105, 13.192662] 410704
...
Berlin USA 39.7742446 -75.0013423 7588
Berlin USA 43.9727912 -88.9858084 5524
Any idea how to solve this in pandas? I am happy for your suggestions.
What you are asking for is really a network problem in which two nodes are connected if their distance is < 50 km. To solve it, you can create a distance matrix and build the graph with networkx. Something along these lines:
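import networkx as nx
import numpy as np
from sklearn.metrics.pairwise import haversine_distances as haversine

# pairwise haversine distances in km (input must be [lat, lon] in radians)
coords = np.deg2rad(df[['Latitude', 'Longitude']].to_numpy())
dist_mat = haversine(coords) * 6371  # earth's radius in km
# two rows are connected if they are less than 50 km apart
adjacency = dist_mat < 50
# from_numpy_matrix was removed in networkx 3.0; from_numpy_array replaces it
G = nx.from_numpy_array(adjacency)
# each component is a set of row positions in df
components = list(nx.connected_components(G))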
You can then group by those components, e.g. by labelling each row with the index of the component it belongs to.
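A self-contained sketch of that last step on the four example rows might look like the following. Several things here are assumptions rather than part of the answer above: the pairwise haversine is computed with plain NumPy (in a hypothetical helper `haversine_matrix`) instead of scikit-learn, edges are additionally restricted to rows with the same City and Country so that a nearby place with a different name cannot chain two groups together, and the `component` column name is made up for illustration.

```python
import networkx as nx
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "City": ["Berlin", "Berlin", "Berlin", "Berlin"],
    "Country": ["Germany", "Germany", "USA", "USA"],
    "Latitude": [52.516602, 52.430884, 39.7742446, 43.9727912],
    "Longitude": [13.304105, 13.192662, -75.0013423, -88.9858084],
    "Population": [118704, 292000, 7588, 5524],
})


def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances in km (hypothetical helper)."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


dist_km = haversine_matrix(df["Latitude"], df["Longitude"])

# assumption: only connect rows that also share City and Country
city = df["City"].to_numpy()
country = df["Country"].to_numpy()
same_place = (city[:, None] == city[None, :]) & (country[:, None] == country[None, :])
adjacency = (dist_km < 50) & same_place

# nodes are the row positions 0..n-1
G = nx.from_numpy_array(adjacency.astype(int))

# label every row with the index of its connected component
labels = np.empty(len(df), dtype=int)
for component_id, nodes in enumerate(nx.connected_components(G)):
    labels[list(nodes)] = component_id
df["component"] = labels

result = (
    df.groupby(["City", "Country", "component"], sort=False)
      .agg(Latitude=("Latitude", list),
           Longitude=("Longitude", list),
           Population=("Population", "sum"))
      .reset_index()
      .drop(columns="component")
)
```

Here the two German rows end up in one group with their populations summed, while the two US rows stay separate. Note that connected components are transitive: a chain of points each less than 50 km from the next is merged even if its end points are further apart.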
On the other hand, it might be easier to bin the Lat/Long values and group by those bins.
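A rough sketch of the binning idea, assuming a fixed grid of 1 degree cells (about 111 km in latitude, so coarser than the 50 km threshold). The bin size and the `lat_bin` / `lon_bin` column names are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "City": ["Berlin", "Berlin", "Berlin", "Berlin"],
    "Country": ["Germany", "Germany", "USA", "USA"],
    "Latitude": [52.516602, 52.430884, 39.7742446, 43.9727912],
    "Longitude": [13.304105, 13.192662, -75.0013423, -88.9858084],
    "Population": [118704, 292000, 7588, 5524],
})

BIN_DEG = 1.0  # assumed grid cell size in degrees

# integer index of the grid cell each row falls into
df["lat_bin"] = np.floor(df["Latitude"] / BIN_DEG).astype(int)
df["lon_bin"] = np.floor(df["Longitude"] / BIN_DEG).astype(int)

binned = (
    df.groupby(["City", "Country", "lat_bin", "lon_bin"], sort=False)["Population"]
      .sum()
      .reset_index()
)
```

This is simpler and avoids the full distance matrix, but it is only an approximation: two points a few kilometres apart can land in different cells if they straddle a cell boundary, and the width of a longitude cell in kilometres shrinks towards the poles.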