Complex case of filling NaNs in Pandas
Starting dataframe:

  bloomberg morningstar  yahoo
0     AAPL1       AAPL2    NaN
1     AAPL1         NaN  AAPL3
2       NaN       GOOG4  GOOG5
3     GOOG6       GOOG4    NaN
4      IBM7         NaN   IBM8
5       NaN        IBM9   IBM8
6       NaN         NaN     FB
Desired result:

  bloomberg morningstar  yahoo
0     AAPL1       AAPL2  AAPL3
1     GOOG6       GOOG4  GOOG5
2      IBM7        IBM9   IBM8
3       NaN         NaN     FB
I've munged my data enough to ensure that there will never be any "conflicting" information in a given column of the starting dataframe, e.g. the following is not possible:

  A column Another column
0    AAPL1      One thing
1    AAPL1  Another thing
The only thing that can happen is that any given column has either 1) no information or 2) the right information, e.g.

  A column         Another column
0    AAPL1                    NaN
1    AAPL1  The right information
All I want to do is fill the NaNs with the "right" information where available and then drop duplicates (which should be easy).
But some NaNs should remain, as I don't have enough data to infer their value, e.g. the FB row in the example.
Here is some code to load the starting dataframe if you'd like to play around:
import pandas as pd
data = [
    {'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
    {'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
    {'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
    {'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
    {'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
    {'morningstar': 'IBM9', 'yahoo': 'IBM8'},
    {'yahoo': 'FB'},
]
df = pd.DataFrame(data)
Chaining ffill and bfill will do what you want:
df.ffill(axis=1).bfill(axis=1).drop_duplicates()
  bloomberg morningstar yahoo
0      AAPL        AAPL  AAPL
2      GOOG        GOOG  GOOG
4       IBM         IBM   IBM
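
With the sample frame from the question, each provider uses its own identifier, so a row-wise fill copies identifiers across columns (row 0 would become AAPL1, AAPL2, AAPL2) rather than recovering the missing ones. One possible alternative, offered only as a sketch: treat each column in turn as a key, look up the first non-null value of every other column among the rows sharing that key, fill from that lookup, and repeat until a pass fills nothing new. The helper name fill_from_shared_ids is hypothetical, and the approach assumes, as the question states, that a given identifier never maps to conflicting values.

```python
import pandas as pd

data = [
    {'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
    {'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
    {'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
    {'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
    {'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
    {'morningstar': 'IBM9', 'yahoo': 'IBM8'},
    {'yahoo': 'FB'},
]
df = pd.DataFrame(data)


def fill_from_shared_ids(frame):
    """Hypothetical helper: fill NaNs from rows that share an identifier."""
    out = frame.copy()
    while True:
        missing_before = int(out.isna().sum().sum())
        for key in out.columns:
            others = [c for c in out.columns if c != key]
            # First non-null value of each other column per key value;
            # rows whose key is NaN are left out of the groups.
            lookup = out.groupby(key)[others].first()
            for col in others:
                out[col] = out[col].fillna(out[key].map(lookup[col]))
        # Stop once a full pass over all columns fills nothing new.
        if int(out.isna().sum().sum()) == missing_before:
            break
    return out.drop_duplicates().reset_index(drop=True)


result = fill_from_shared_ids(df)
print(result)
```

On the sample data this yields the four rows given as the desired result in the question, with the FB row keeping its NaNs because no other row shares an identifier with it.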