pandas: Combining two columns in a DataFrame

Question

I have a pandas DataFrame that has multiple columns in it:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
foo                   11516  non-null values
bar                   228381  non-null values
Time_UTC              239897  non-null values
dtstamp               239897  non-null values
dtypes: float64(4), object(1)

where foo and bar are columns which contain the same data yet are named differently. Is there a way to move the rows which make up foo into bar, ideally whilst maintaining the name of bar?

In the end the DataFrame should appear as:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
bar                   239897  non-null values
Time_UTC              239897  non-null values
dtstamp               239897  non-null values
dtypes: float64(4), object(1)

That is, the NaN values in bar are replaced by the values from foo.

Answer 1

You can use fillna directly and assign the result to the column 'bar':

df['bar'] = df['bar'].fillna(df['foo'])
del df['foo']

General example:

import pandas as pd

# create a table with two missing values in column 'a'
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({'b': [5, 6]}, index=[3, 4])
dftot = pd.concat((df1, df2))
print(dftot)

# create the DataFrame used to fill the missing values;
# fillna aligns on index and column labels, so reuse dftot's index
filldf = pd.DataFrame({'a': [7, 7, 7, 7]}, index=dftot.index)

# fill the missing values
print(dftot.fillna(filldf))
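
Applied to the situation in the question, the same idea might look like the sketch below. The toy data and the extra column name "other" are made up for illustration; only foo and bar come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: each row has a value in either foo or bar, never both
df = pd.DataFrame({
    "foo": [1.0, np.nan, np.nan, 4.0],
    "bar": [np.nan, 2.0, 3.0, np.nan],
    "other": ["a", "b", "c", "d"],
})

# Fill the gaps in bar from foo (values are matched on the index), then drop foo
df["bar"] = df["bar"].fillna(df["foo"])
df = df.drop(columns="foo")

print(df)
```

Assigning the result back to df["bar"] avoids relying on an in-place change to a column selected from the frame.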

Answer 2

Try this:

pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)

If you want that data to become the new column bar, just assign the result to df['bar'].
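
A minimal sketch of this approach on made-up data follows. It assumes the index is unique and that no row has a value in both foo and bar, as in the question. It uses reindex(df.index) as an explicit way to restore the original row order, which is the step reindex_like(df) performs in the one-liner above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a unique datetime index, loosely modelled on the question
df = pd.DataFrame(
    {"foo": [1.0, np.nan, np.nan], "bar": [np.nan, 2.0, 3.0]},
    index=pd.to_datetime([
        "2012-05-11 15:20:00",
        "2012-05-11 15:21:00",
        "2012-05-11 15:22:00",
    ]),
)

# Stack the non-null values of both columns into one Series (foo values come first)
combined = pd.concat([df["foo"].dropna(), df["bar"].dropna()])

# Put the values back into the original row order
combined = combined.reindex(df.index)

df["bar"] = combined
print(df)
```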

Answer 3

Another option is to use the .apply() method on the frame. You can reassign a column while deferring to existing data:

import pandas as pd

# get your data into a dataframe

# replace content in "bar" with "foo" if "bar" is null
df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)

# note: NaN never compares equal to anything (including itself), so test with pd.isnull();
# if your missing values are something else, such as an empty string, compare against that instead
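
A small sketch, on made-up data, of why the null check in a row-wise apply needs pd.isnull() and not an equality comparison. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [1.0, 5.0, 3.0], "bar": [np.nan, 2.0, np.nan]})

# An equality test never matches NaN, because NaN is not equal to itself
matches_by_equality = (df["bar"] == np.nan).sum()
matches_by_isnull = pd.isnull(df["bar"]).sum()
print(matches_by_equality, matches_by_isnull)

# Row-wise: take foo where bar is null, otherwise keep bar
df["bar"] = df.apply(
    lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"],
    axis=1,
)
print(df)
```

Note that apply with axis=1 calls the lambda once per row, so on a frame with hundreds of thousands of rows it will usually be slower than the column-wise approaches in the other answers.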

Answer 4

More modern pandas versions (since at least 0.12) have the combine_first() and update() methods for DataFrame and Series objects. For example, if your DataFrame were called df, you would do:

df['bar'] = df.bar.combine_first(df.foo)

which fills only the NaN values of the bar column with the matching values from the foo column. combine_first() returns a new Series and does not modify df in place, which is why the result is assigned back to df['bar']. To overwrite values in bar with the non-NaN values in foo, you would use the update() method, which modifies the Series it is called on.
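
The difference between the two methods can be seen on a small made-up frame. This is a sketch: the values are chosen so that one row has data in both columns, and note that combine_first() returns a new Series and leaves its inputs unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [1.0, 20.0, np.nan], "bar": [np.nan, 2.0, 3.0]})

# combine_first: keep bar where it has a value, fall back to foo where bar is NaN
filled = df["bar"].combine_first(df["foo"])

# update: overwrite with every non-NaN value from foo; works on the Series itself,
# so an explicit copy is used here to leave df untouched
overwritten = df["bar"].copy()
overwritten.update(df["foo"])

print(filled.tolist())
print(overwritten.tolist())
```

In the second row, where both columns hold a value, combine_first keeps bar's 2.0 while update replaces it with foo's 20.0.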

Answer 5

You can also use numpy to do this:

df['bar'] = np.where(pd.isnull(df['bar']),df['foo'],df['bar'])
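
A self-contained version of the same line, with the imports it needs and a made-up frame to run it on:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [1.0, np.nan, 4.0], "bar": [np.nan, 2.0, np.nan]})

# Where bar is null take foo, otherwise keep bar; the result is a plain NumPy array
df["bar"] = np.where(pd.isnull(df["bar"]), df["foo"], df["bar"])
print(df)
```

np.where picks values by position, not by index label, which is fine here because both columns come from the same frame.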

Answer 1
23 2014-05-21 15:38:41

Answer 2
22 accepted 2012-06-10 21:38:40

Answer 3
5 2016-04-28 16:51:04

Answer 4
5 2016-11-30 00:57:03

Answer 5
2 2016-12-01 03:51:41
