
How to replace incomplete rows from complete rows per group in pandas

I'm trying to clean a dataset, and I suspect I'm dealing with incomplete rows whose correct information exists elsewhere in the data frame. For example, something like:

ride_id   start_station_name  start_lat  start_lng
12398213  Clark & Vermont     85.56      40.34
12398129  NaN                 85.56      40.34

This would be just one of many such cases (across multiple stations). How would you go about searching through the data frame and replacing the "NaN" with "Clark & Vermont" using start_lat and start_lng?

Group by the latitude and longitude, then forward-fill and backward-fill the station name within each group.

Using a slightly bigger dataframe for demonstration:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ride_id': range(100, 105),
    'station_name': ['Clark & Vermont', np.nan, np.nan, 'Foo & Bar', np.nan],
    'start_lat': [85.56]*2 + [15.59]*3,
    'start_lng': [40.34]*2 + [20.30]*3,
})

#    ride_id     station_name  start_lat  start_lng
# 0      100  Clark & Vermont      85.56      40.34
# 1      101              NaN      85.56      40.34
# 2      102              NaN      15.59      20.30
# 3      103        Foo & Bar      15.59      20.30
# 4      104              NaN      15.59      20.30

Output:

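# Fill inside each (lat, lng) group. Chaining .bfill() after the grouped .ffill()
# would back-fill over the whole column and could pull a name across group boundaries.
df['station_name'] = df.groupby(['start_lat', 'start_lng'])['station_name'].transform(lambda s: s.ffill().bfill())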

#    ride_id     station_name  start_lat  start_lng
# 0      100  Clark & Vermont      85.56      40.34
# 1      101  Clark & Vermont      85.56      40.34
# 2      102        Foo & Bar      15.59      20.30
# 3      103        Foo & Bar      15.59      20.30
# 4      104        Foo & Bar      15.59      20.30
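
If some coordinate pairs have no known station name at all, it may be worth confirming that those rows stay empty rather than borrowing a neighbouring station's name. The following is a minimal sketch of one alternative, assuming each (start_lat, start_lng) pair maps to a single station: take the first non-null name per group with transform('first') and use it only to fill the gaps. The `rides` frame and its values are hypothetical demo data, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: the middle coordinate pair has no known station name.
rides = pd.DataFrame({
    "ride_id": range(200, 206),
    "station_name": ["Clark & Vermont", np.nan, np.nan, np.nan, "Foo & Bar", np.nan],
    "start_lat": [85.56, 85.56, 50.00, 50.00, 15.59, 15.59],
    "start_lng": [40.34, 40.34, 10.00, 10.00, 20.30, 20.30],
})

# First non-null name within each (lat, lng) group, aligned to the original rows.
first_known = rides.groupby(["start_lat", "start_lng"])["station_name"].transform("first")

# Fill only the missing names; names that are already present are left untouched.
rides["station_name"] = rides["station_name"].fillna(first_known)
print(rides)
```

Note that grouping on floating-point coordinates relies on exact equality, so coordinates that differ only in trailing decimals would land in different groups; rounding them first is one possible workaround if that applies to your data.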
