How to replace incomplete rows from complete rows per group in pandas
I'm trying to clean a dataset, and I suspect I'm dealing with incomplete rows whose correct information appears elsewhere in the data frame. For example, something like:
ride_id | start_station_name | start_lat | start_lng
---|---|---|---
12398213 | Clark & Vermont | 85.56 | 40.34
12398129 | NaN | 85.56 | 40.34
This would just be one of many such cases (for multiple stations). Curious how you guys might go about searching through the data frame and replacing the "NaN" with "Clark & Vermont" using `start_lat` and `start_lng`.
Group by the latitude and longitude, then forward-fill and backward-fill the station name.
Using a slightly bigger dataframe for demonstration:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ride_id': range(100, 105), 'station_name': ['Clark & Vermont', np.nan, np.nan, 'Foo & Bar', np.nan], 'start_lat': [85.56]*2 + [15.59]*3, 'start_lng': [40.34]*2 + [20.30]*3})
# ride_id station_name start_lat start_lng
# 0 100 Clark & Vermont 85.56 40.34
# 1 101 NaN 85.56 40.34
# 2 102 NaN 15.59 20.30
# 3 103 Foo & Bar 15.59 20.30
# 4 104 NaN 15.59 20.30
Output:
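# Do both fills inside each group; chaining .bfill() after the grouped .ffill()
# would back-fill over the whole column and could cross group boundaries.
df['station_name'] = df.groupby(['start_lat', 'start_lng'])['station_name'].transform(lambda s: s.ffill().bfill())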
# ride_id station_name start_lat start_lng
# 0 100 Clark & Vermont 85.56 40.34
# 1 101 Clark & Vermont 85.56 40.34
# 2 102 Foo & Bar 15.59 20.30
# 3 103 Foo & Bar 15.59 20.30
# 4 104 Foo & Bar 15.59 20.30
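If the rows for different stations are interleaved rather than contiguous, it matters that the fill stays within each coordinate pair. As an alternative sketch (not part of the answer above), you could look up the first known name per `(start_lat, start_lng)` group and use it only where the name is missing. The sample frame below is hypothetical, and this assumes each coordinate pair really corresponds to a single station:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows for the two stations are interleaved,
# and the first row of one group has no station name.
df = pd.DataFrame({
    'ride_id': range(200, 205),
    'station_name': [np.nan, 'Foo & Bar', 'Clark & Vermont', np.nan, np.nan],
    'start_lat': [85.56, 15.59, 85.56, 15.59, 85.56],
    'start_lng': [40.34, 20.30, 40.34, 20.30, 40.34],
})

# First non-null station name in each (lat, lng) group,
# broadcast back so it lines up with df's index.
known_name = df.groupby(['start_lat', 'start_lng'])['station_name'].transform('first')

# Fill only the gaps; names that are already present are left untouched.
df['station_name'] = df['station_name'].fillna(known_name)

print(df)
```

Here ride 200 gets "Clark & Vermont" from its own group, whereas a column-wide back-fill would have picked up "Foo & Bar" from the next row. A group with no known name at all simply stays `NaN`.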