
Fill missing columns in a Pandas DataFrame?

There are lots of questions about filling missing values.

I want to fill whole missing columns.

Suppose I have:

import pandas as pd

df = pd.DataFrame([1, 2], columns=['A'])

   A
0  1
1  2

What's the idiomatic way to do something like this?

df.fillmissing(['A','B','C'])

My current code:

colnames = ['A', 'B', 'C']  # the columns I want to end up with

for name in colnames:
    if name not in df:  # skip columns that already exist
        df[name] = None

This produces:

   A   B   C
0  1 NaN NaN
1  2 NaN NaN

Explanation of output:

In this case A is a no-op, but B and C get added. That is:

  • I don't know ahead of time which columns are missing
  • I know which columns I want
  • I want the most concise code (high performance is not a requirement)

Any suggestions?

Perhaps you need reindex:

df.reindex(['A', 'B', 'C'], axis=1)

   A   B   C
0  1 NaN NaN
1  2 NaN NaN

This fills the missing columns with NaN and leaves the existing columns as they are.
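Note that reindex returns a new DataFrame rather than modifying df, so the result has to be assigned to a name. A minimal sketch, assuming the same df as in the question; the variable names filled and zeros are only illustrative, and the fill_value=0 variant is shown just to demonstrate the optional parameter:

```python
import pandas as pd

df = pd.DataFrame([1, 2], columns=['A'])

# reindex returns a new object; df itself is left unchanged.
filled = df.reindex(['A', 'B', 'C'], axis=1)

# Optional: fill_value picks something other than NaN for the new columns.
zeros = df.reindex(columns=['A', 'B', 'C'], fill_value=0)

print(filled)
print(zeros)
```

Writing df = df.reindex(...) gives the same effect as the loop in the question.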

You could also try transposing it, reindexing the rows, and transposing back:

print(df.T.reindex(['A', 'B', 'C']).T)
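One thing to watch with the transpose route: the NaN rows are added while the data is transposed, so the existing values can be upcast on the way. A small sketch comparing the dtypes of the two approaches, assuming the same single-column integer df as in the question (direct and via_transpose are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([1, 2], columns=['A'])

direct = df.reindex(['A', 'B', 'C'], axis=1)
via_transpose = df.T.reindex(['A', 'B', 'C']).T

print(direct.dtypes)         # column A keeps its integer dtype
print(via_transpose.dtypes)  # column A has been upcast to float64
```

If the original dtypes matter, reindexing along axis=1 directly is probably the safer choice.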

Let's say you have the following DataFrame df and a list of wanted columns cols:

import pandas as pd

df = pd.DataFrame([1, 2], columns=['A'])
cols = ['A', 'B', 'C']

From there, you can subtract the DataFrame's existing columns from cols to find the columns that are missing, then create all of them at once and set them to None. Note: you cannot subtract one list from another directly; convert both to a set first. Then unpack the resulting set into a list with [*...]:

Method 1: Set Subtraction

df[[*set(cols) - set(df.columns)]] = None
df
Out[1]: 
   A     B     C
0  1  None  None
1  2  None  None

The list comprehension way would be:列表理解方式是:

Method 2: List Comprehension (similar to your for loop)

df[[col for col in cols if col not in df.columns]] = None
df
Out[1]: 
   A     B     C
0  1  None  None
1  2  None  None
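One difference between the two methods: a set has no defined order, so Method 1 may append the new columns in any order, while Method 2 follows the order of cols. If the final column order matters, selecting df[cols] afterwards puts everything in the requested order. A minimal sketch, assuming cols lists every column you want to keep; the ordering of cols and the name missing are chosen here only for illustration:

```python
import pandas as pd

df = pd.DataFrame([1, 2], columns=['A'])
cols = ['C', 'A', 'B']  # desired final order (illustrative)

# Add whichever columns are missing, as in Method 2.
missing = [col for col in cols if col not in df.columns]
df[missing] = None

# Reorder the columns to match cols.
df = df[cols]
print(df)
```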
