How do I add a column to a DataFrame?

Question

I have the following code:

from pandas import DataFrame

db_fields = ("id", "email", "status", "source")
df = DataFrame(results)
for col in db_fields:
    if col not in df.columns:
        # COLUMN IS MISSING - COMMAND TO ADD COLUMN
        pass

If, for example, the status column is missing, it needs to be added to the data frame with no value, so that when I export the df to CSV I always have the same schema of fields.

I know that to remove a column I should do:

df = df.drop(columns=col)

But I don't know the best way to add a column with empty values.

Answer 1

This adds a status column filled with null values:

import numpy as np
df['status'] = np.nan

Alternatively:

df['status'] = None

So:

db_fields = ("id", "email", "status", "source")
for col in db_fields:
    if col not in df.columns:
        df[col] = None
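The two assignments above are not quite equivalent, which a small sketch can show. The sample data here is made up for illustration; `np.nan` gives a float column, while `None` typically gives an object column. Both are treated as missing and both should come out as empty fields in the CSV.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the real query results.
df = pd.DataFrame({"id": [1, 2], "email": ["a@example.com", "b@example.com"]})

df["status"] = np.nan  # float64 column filled with NaN
df["source"] = None    # column filled with None (typically object dtype)

print(df.dtypes)

# Missing values are written as empty fields by default (na_rep="").
csv_text = df.to_csv(index=False)
print(csv_text)
```

If later code fills the new column with strings, the `None` variant avoids starting from a float column; for numeric data `np.nan` is the more natural choice.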

Answer 2

You can compute the array of columns that do not exist yet and create them with assign and a dictionary:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['a1','a2', 'b1'],
                   'a': ['a1','a2', 'b1'],
                   'source': ['a1','a2', 'b1']})
print(df)
   id   a source
0  a1  a1     a1
1  a2  a2     a2
2  b1  b1     b1

db_fields = ("id", "email", "status", "source")

# get the columns of db_fields that are missing from df
diff = np.setdiff1d(np.array(db_fields), df.columns)
print(diff)
['email' 'status']

# get the original columns that are not in db_fields
diff1 = np.setdiff1d(df.columns, np.array(db_fields)).tolist()
print(diff1)
['a']

# add the missing columns and reorder: extra columns first, then db_fields
d = dict.fromkeys(diff, np.nan)
df = df.assign(**d)[diff1 + list(db_fields)]
print(df)
    a  id  email  status source
0  a1  a1    NaN     NaN     a1
1  a2  a2    NaN     NaN     a2
2  b1  b1    NaN     NaN     b1

# alternatively, put the db_fields columns first
df = df.assign(**d)[list(db_fields) + diff1]
print(df)
   id  email  status source   a
0  a1    NaN     NaN     a1  a1
1  a2    NaN     NaN     a2  a2
2  b1    NaN     NaN     b1  b1
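As a possible alternative to `assign` plus column selection, `DataFrame.reindex` can add the missing columns and fix the order in one call. This is only a sketch on the same sample frame; the `extra` list is a name introduced here for illustration.

```python
import pandas as pd

db_fields = ("id", "email", "status", "source")
df = pd.DataFrame({"id": ["a1", "a2", "b1"],
                   "a": ["a1", "a2", "b1"],
                   "source": ["a1", "a2", "b1"]})

# Columns that are not part of the schema, kept in their original order.
extra = [c for c in df.columns if c not in db_fields]

# reindex creates any missing column filled with NaN and orders the
# columns exactly as listed.
out = df.reindex(columns=list(db_fields) + extra)
print(out)
```

Passing only `list(db_fields)` to `reindex` would also drop the extra column `a`, which may be what you want when the CSV must contain the schema fields and nothing else.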

Answer 3

Here you have it, plain and simple; adding the column takes just one line:

import numpy as np
from pandas import DataFrame
db_fields = ("id", "email", "status", "source")
df = DataFrame(results)
for col in db_fields:
    if col not in df.columns:
        # Add the column
        df[col] = np.nan

BTW: You can also drop a column in place using df.drop(columns=col, inplace=True).
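To tie the loop back to the CSV export from the question, here is a minimal sketch that wraps it in a function. `ensure_columns` is a hypothetical helper name, and `results` is assumed to be a list of dicts, since the question does not say what it actually contains.

```python
import numpy as np
import pandas as pd


def ensure_columns(df, fields):
    """Return a copy of df in which every name in fields exists as a column."""
    out = df.copy()
    for col in fields:
        if col not in out.columns:
            out[col] = np.nan  # missing column: add it, filled with NaN
    return out


# Hypothetical stand-in for `results`; the real data comes from elsewhere.
results = [{"id": 1, "email": "a@example.com"},
           {"id": 2, "email": "b@example.com"}]
db_fields = ("id", "email", "status", "source")

fixed = ensure_columns(pd.DataFrame(results), db_fields)

# The columns argument fixes which fields are written and in what order.
csv_text = fixed.to_csv(index=False, columns=list(db_fields))
print(csv_text)
```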

3 answers

Answer 1
1 2018-11-26 12:54:13

Answer 2
1 Accepted 2018-11-26 12:57:03

Answer 3
1 2018-11-26 14:00:08
