Complex case of filling NaNs in Pandas

Is there a way to go from this...

   bloomberg  morningstar  yahoo
0  AAPL1      AAPL2        NaN
1  AAPL1      NaN          AAPL3
2  NaN        GOOG4        GOOG5
3  GOOG6      GOOG4        NaN
4  IBM7       NaN          IBM8
5  NaN        IBM9         IBM8
6  NaN        NaN          FB

... to this ...

   bloomberg  morningstar  yahoo
0  AAPL1      AAPL2        AAPL3
1  GOOG6      GOOG4        GOOG5
2  IBM7       IBM9         IBM8
3  NaN        NaN          FB

... in Pandas?

I've munged my data enough to ensure that there will never be any "conflicting" information in a given column of the starting dataframe, e.g. the following is not possible...

   A column  Another column
0  AAPL1     One thing
1  AAPL1     Another thing

The only thing that can happen is that any given column has either 1) no information or 2) the right information, e.g.

   A column  Another column
0  AAPL1     NaN
1  AAPL1     The right information

All I want to do is fill the NaNs with the "right" information where available and then drop duplicates (which should be easy).

But some NaNs should remain, as I don't have enough data to infer their value, e.g. the FB row in the example.

Anybody have a good answer? Thanks for the help!

Here is some code to load the starting dataframe if you'd like to play around:

import pandas as pd
data = [
        {'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
        {'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
        {'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
        {'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
        {'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
        {'morningstar': 'IBM9', 'yahoo': 'IBM8'},
        {'yahoo': 'FB'}]
df = pd.DataFrame(data)

Chaining ffill and bfill along axis=1 fills each row's NaNs from the neighbouring columns, after which duplicate rows can be dropped:

# fillna(method=...) is deprecated in recent pandas; ffill/bfill are the equivalent calls
df.ffill(axis=1).bfill(axis=1).drop_duplicates()

  bloomberg morningstar yahoo
0      AAPL        AAPL  AAPL
2      GOOG        GOOG  GOOG
4       IBM         IBM   IBM
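
One caveat with the row-wise fill: on the sample frame from the question it copies identifiers across providers (row 0's yahoo cell would become AAPL2 rather than AAPL3, and the FB row would become FB in all three columns). If the goal is the exact output in the question, something along these lines might be closer. It is only a sketch, not the answer's method: it assumes two rows describe the same security whenever they share a non-null value in the same column, links such rows with a small union-find, and then keeps the first non-null value per column within each linked group. The helper name merge_linked_rows is made up for illustration.

```python
import pandas as pd

data = [
    {'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
    {'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
    {'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
    {'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
    {'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
    {'morningstar': 'IBM9', 'yahoo': 'IBM8'},
    {'yahoo': 'FB'}]
df = pd.DataFrame(data)


def merge_linked_rows(frame):
    """Hypothetical helper: merge rows that share a value in any column."""
    # parent[i] is the union-find parent of row position i.
    parent = list(range(len(frame)))

    def find(i):
        # Follow parents up to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for col in frame.columns:
        first_seen = {}  # value -> first row position holding it in this column
        for pos, value in enumerate(frame[col]):
            if pd.isna(value):
                continue
            if value in first_seen:
                ra, rb = find(first_seen[value]), find(pos)
                # Keep the smaller position as root so groups stay in row order.
                parent[max(ra, rb)] = min(ra, rb)
            else:
                first_seen[value] = pos

    labels = pd.Series([find(i) for i in range(len(frame))], index=frame.index)
    # GroupBy.first() takes the first non-null entry per column in each group;
    # columns with no data in a group stay missing.
    return frame.groupby(labels).first().reset_index(drop=True)


result = merge_linked_rows(df)
print(result)
```

On the question's data this should give one row each for AAPL, GOOG and IBM with all three identifiers filled, plus the FB row with bloomberg and morningstar still missing. It relies on the "no conflicting information" guarantee from the question; if two linked rows disagreed in a column, the earlier row's value would silently win.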
