[英]Fill na values in one pandas dataframe's column using another, but using column indices, not names
I have a peculiar situation in which I know that the 0th column may contain nan
, and in all these cases, the 1st column contains the missing value.我有一个特殊的情况,我知道第 0 列可能包含
nan
,并且在所有这些情况下,第 1 列包含缺失值。 I do not know the names of these columns ahead of time, so I want to select them by index instead.我事先不知道这些列的名称,所以我想通过索引来 select 它们。
I am able to select the column like so:我能够像这样 select 列:
df.iloc[:, [0]]
And normally I am able to fill na
values from another column like this, if I knew the names:如果我知道名称,通常我可以像这样从另一列中填充
na
值:
df["col0"].fillna(df["col1"])
So I thought that I should be able to replicate this by doing this:所以我认为我应该能够通过这样做来复制它:
df.iloc[:, [0]].fillna(df[:, [1]])
But I get:但我得到:
Traceback (most recent call last):
File "/home/---------/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "/home/---------/_code/microgrid-support/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/---------/_code/microgrid-support/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 82, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(None, None, None), [1])' is an invalid key
How can I fill the na
values using the column index instead of its name?如何使用列索引而不是其名称来填充
na
值?
You could use bfill
with iloc
您可以将
bfill
与iloc
一起使用
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[np.nan,2,3],'b':[100,200,300],'c':['x','y','z']})
df.iloc[:,:2] = df.iloc[:,:2].bfill(axis=1)
print(df)
Output Output
a b c
0 100.0 100.0 x
1 2.0 200.0 y
2 3.0 300.0 z
Use the iloc accessor.使用 iloc 访问器。 Slice using index range to avaoid running into issues.
使用索引范围切片以避免遇到问题。
Sample样本
df1 = pd.DataFrame({'sub_name': [np.nan,'AAB','AAC','BAA','CAA','CAC','CAD','CAE','EAA', 'FAA'],
'val_1': [2,4,8,7,4,6,2,3,8,3],
'A':[208,208,208,210,213,213,213,213,222,223]})
df1.iloc[0:1,0].fillna(df1.iloc[0,1])
sub_name val_1 A
0 2 2 208
1 AAB 4 208
2 AAC 8 208
3 BAA 7 210
4 CAA 4 213
5 CAC 6 213
6 CAD 2 213
7 CAE 3 213
8 EAA 8 222
9 FAA 3 223
I found a way of doing it myself differently than the other answers.我找到了一种与其他答案不同的方法。 I just accessed the name of the column via the
.columns
attribute of the dataframe and used that:我刚刚通过 dataframe 的
.columns
属性访问了列的名称并使用了它:
For example:例如:
df[df.columns[0]].fillna(df[df.columns[1]])
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.