How to merge data from a dictionary into a pandas DataFrame under certain conditions
I'm struggling with this one. I have this dictionary:
{'Rosario': [-60.63932, -32.946819], 'Concordia': [-74.448212, 40.31094], 'Avellaneda': [-58.367439, -34.660179], 'Corrientes': [-58.834099, -27.4806], 'Caballito': [-58.44104, -34.622639], 'Buenos Aires': [-78.497498, -9.12417], 'Paraná': [-60.5238, -31.73197], 'Santa Fé': [-78.14917, 8.65194], 'San Carlos de Bariloche': [-71.30822, -41.145569], 'Mendoza': [-68.827171, -32.890839]}
It contains cities and their coordinates. I would like to merge the city names as a column into a dataframe which also contains coordinates. Is there a way to do it based on the (latitude and longitude) condition?
As the sample below shows, the dataframe has similar values for lat and lon. I should also mention that the dataframe only contains coordinates that appear in the dictionary. I would really appreciate any help on this one.
Here's a sample of the dataframe. I was on my phone, so I originally took a screenshot instead. The dataframe has a lot of columns, which is why I only show a few:
sunset temp feels_like pressure lat lon
0 1659463668 255.3 248.3 1012 -60.6393 -32.9468
1 1659377129 263.67 256.67 984 -60.6393 -32.9468
2 1659290591 258.31 253.58 983 -60.6393 -32.9468
3 1659204054 266.81 262.63 970 -60.6393 -32.9468
4 1659117518 255.42 255.42 979 -60.6393 -32.9468
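For anyone who wants to reproduce this, the five rows above can be rebuilt as a DataFrame. This is only a sketch of the sample shown; the real dataframe has more columns and rows, and the name `df` is an assumption taken from how the answer below refers to it.

```python
import pandas as pd

# Rebuild only the five sample rows shown above (the real data has more columns).
df = pd.DataFrame({
    'sunset': [1659463668, 1659377129, 1659290591, 1659204054, 1659117518],
    'temp': [255.3, 263.67, 258.31, 266.81, 255.42],
    'feels_like': [248.3, 256.67, 253.58, 262.63, 255.42],
    'pressure': [1012, 984, 983, 970, 979],
    'lat': [-60.6393] * 5,   # column names kept exactly as in the sample
    'lon': [-32.9468] * 5,
})
print(df)
```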
The first step is to turn your dictionary into a dataframe:
import pandas as pd

cities_dict = {'Rosario': [-60.63932, -32.946819], 'Concordia': [-74.448212, 40.31094], 'Avellaneda': [-58.367439, -34.660179], 'Corrientes': [-58.834099, -27.4806], 'Caballito': [-58.44104, -34.622639], 'Buenos Aires': [-78.497498, -9.12417], 'Paraná': [-60.5238, -31.73197], 'Santa Fé': [-78.14917, 8.65194], 'San Carlos de Bariloche': [-71.30822, -41.145569], 'Mendoza': [-68.827171, -32.890839]}
# City names become the index; the two list entries become the lat and lon columns.
cities = pd.DataFrame.from_dict(cities_dict, orient='index', columns=['lat', 'lon'])
print(cities)
# Output:
lat lon
Rosario -60.639320 -32.946819
Concordia -74.448212 40.310940
Avellaneda -58.367439 -34.660179
Corrientes -58.834099 -27.480600
Caballito -58.441040 -34.622639
Buenos Aires -78.497498 -9.124170
Paraná -60.523800 -31.731970
Santa Fé -78.149170 8.651940
San Carlos de Bariloche -71.308220 -41.145569
Mendoza -68.827171 -32.890839
From there, I think working with these geometrically will be easiest:
# Install first (run in a shell, not in Python): pip install geopandas pygeos
import geopandas as gp
cities = gp.GeoSeries.from_xy(cities.lat, cities.lon)
cities = cities.reset_index().rename(columns={'index':'city'})
df['geometry'] = gp.GeoSeries.from_xy(df.lat, df.lon)
df = gp.GeoDataFrame(df)
out = gp.sjoin_nearest(df, cities)
print(out)
# Output:
sunset temp feels_like pressure lat lon geometry index_right city
0 1659463668 255.30 248.30 1012 -60.6393 -32.9468 POINT (-60.63930 -32.94680) 0 Rosario
1 1659377129 263.67 256.67 984 -60.6393 -32.9468 POINT (-60.63930 -32.94680) 0 Rosario
2 1659290591 258.31 253.58 983 -60.6393 -32.9468 POINT (-60.63930 -32.94680) 0 Rosario
3 1659204054 266.81 262.63 970 -60.6393 -32.9468 POINT (-60.63930 -32.94680) 0 Rosario
4 1659117518 255.42 255.42 979 -60.6393 -32.9468 POINT (-60.63930 -32.94680) 0 Rosario
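If installing geopandas is not an option, the same nearest-match idea can be sketched with NumPy alone. This is an alternative technique, not the method used above: it computes plain squared Euclidean distances between every row and every city and picks the closest one. The shortened dictionary and the second sample row (rounded Mendoza coordinates) are made up here purely for illustration.

```python
import numpy as np
import pandas as pd

# Shortened dictionary for illustration; use the full one in practice.
cities_dict = {
    'Rosario': [-60.63932, -32.946819],
    'Concordia': [-74.448212, 40.31094],
    'Mendoza': [-68.827171, -32.890839],
}
cities = pd.DataFrame.from_dict(cities_dict, orient='index', columns=['lat', 'lon'])

# Hypothetical sample: the first row is from the question, the second is invented.
df = pd.DataFrame({
    'temp': [255.3, 263.67],
    'lat': [-60.6393, -68.8272],
    'lon': [-32.9468, -32.8908],
})

# Broadcast to pairwise differences, shape (len(df), len(cities), 2).
points = df[['lat', 'lon']].to_numpy()
centers = cities[['lat', 'lon']].to_numpy()
diff = points[:, None, :] - centers[None, :, :]

# Squared distance from each row to each city, then the position of the closest city.
dist2 = (diff ** 2).sum(axis=2)
nearest = dist2.argmin(axis=1)

df['city'] = cities.index.to_numpy()[nearest]
print(df)
```

This builds a full rows-by-cities distance matrix, so it suits a short city list like this one; for many cities a spatial index such as the one behind `sjoin_nearest` is likely the better fit.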