Backfill values in a Pandas Series when the value matches another column

Question

I have a DataFrame like this:

import numpy as np
import pandas as pd  # needed for pd.DataFrame below

raw_data = {'surface': [np.nan, np.nan, 'round', 'square'],
            'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami']}

df = pd.DataFrame(raw_data, columns=['surface', 'city'])

It looks like this:

        surface city
   0    NaN     San Francisco
   1    NaN     Miami
   2    round   San Francisco
   3    square  Miami

I need the earliest San Francisco row to be filled with 'round' and the earliest Miami row to be filled with 'square'. Using .fillna(method='bfill') doesn't take the other column's values into account; it just fills all earlier rows with 'round'.

The desired result is:

        surface city
   0    round   San Francisco
   1    square  Miami
   2    round   San Francisco
   3    square  Miami

Answer 1

You can use groupby.bfill: group the data frame by the city column, then call bfill:

df.groupby('city').bfill()

#  surface           city
#0  round   San Francisco
#1  square          Miami
#2  round   San Francisco
#3  square          Miami
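
Depending on the pandas version, the result of groupby(...).bfill() may or may not include the grouping column (I believe recent versions leave it out). As a minimal sketch that should not depend on that behaviour, you can back-fill only the surface column and assign it back. The name filled is just an illustrative choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square'],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami'],
})

# Work on a copy so the original frame stays untouched.
filled = df.copy()

# Back-fill 'surface' separately within each city; the result keeps
# the original row index, so it can be assigned straight back.
filled['surface'] = df.groupby('city')['surface'].bfill()

print(filled)
```

Here the city column is never touched, so it is still present whatever the grouped bfill returns.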

Answer 2

[Modified based on the admirable answer from PSidom]

Using groupby() is indeed the key point, but it can be confusing not to mention what bfill() does, because it may not do what you think it does.

Let's take a quick look at the documentation. bfill() does not fill every gap: it only fills a missing value with the next non-missing value in the rows below it, so missing values that come after the last valid value stay missing. It works great with groupby() in this case, but if your data are more complicated you also need groupby('*your group*').ffill() to forward fill.
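
To make the difference concrete, here is a small sketch on a standalone Series (the names s, back and fwd are made up for this illustration). The single valid value sits in the middle, so each method can only reach one side of it:

```python
import numpy as np
import pandas as pd

# One valid value with a gap on each side.
s = pd.Series([np.nan, 'round', np.nan])

# bfill: each gap takes the next valid value below it,
# so the gap after 'round' has nothing to take and stays missing.
back = s.bfill()

# ffill: each gap takes the last valid value above it,
# so the gap before 'round' stays missing.
fwd = s.ffill()

print(back.tolist())
print(fwd.tolist())
```

Neither method alone fills both gaps, which is why the two are combined further down.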

For further illustration, let's modify your data like this:

import numpy as np
import pandas as pd

raw_data = {'surface': [np.nan, np.nan, 'round', 'square', np.nan, np.nan, np.nan, np.nan],
            'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami', 'Miami', 'Miami', 'San Francisco', 'Miami']}
df = pd.DataFrame(raw_data, columns = ['surface', 'city'])
df

#   surface city
#0  NaN     San Francisco
#1  NaN     Miami
#2  round   San Francisco
#3  square  Miami
#4  NaN     Miami
#5  NaN     Miami
#6  NaN     San Francisco
#7  NaN     Miami

With only df.groupby('city').bfill(), you'll get:

df2 = df.groupby('city').bfill()
df2

#   surface city
#0  round   San Francisco
#1  square  Miami
#2  round   San Francisco
#3  square  Miami
#4  NaN     Miami
#5  NaN     Miami
#6  NaN     San Francisco
#7  NaN     Miami

See what is going on there? bfill() did the job in rows 0 and 1, but left rows 4 to 7 unchanged. You should use both bfill() and ffill() instead. Maybe something like this:

df3 = df2.groupby('city').ffill()
df3

#   surface city
#0  round   San Francisco
#1  square  Miami
#2  round   San Francisco
#3  square  Miami
#4  square  Miami
#5  square  Miami
#6  round   San Francisco
#7  square  Miami
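
The two passes can also be done in one step. This is only a sketch of one possible way, not the only one: it uses transform with a lambda so that both fills run inside each city group, and it works on the surface column alone so it does not rely on the intermediate frame still having a city column. The name filled is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square',
                np.nan, np.nan, np.nan, np.nan],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami',
             'Miami', 'Miami', 'San Francisco', 'Miami'],
})

filled = df.copy()

# For each city's 'surface' values: back-fill first, then forward-fill
# whatever is still missing. Both steps stay inside the group.
filled['surface'] = (
    df.groupby('city')['surface']
      .transform(lambda s: s.bfill().ffill())
)

print(filled)
```

A lambda is slower than the built-in grouped methods on large frames, so the two-pass version may be preferable when performance matters.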

Note that you shouldn't use something like df.groupby('city').bfill().ffill(). The trailing ffill() is applied to the whole result rather than per city, so it can fill in wrong values there.
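
A short sketch of the problem with chaining, using the modified data from this answer (chained and per_group are illustrative names). The grouped bfill() returns an ordinary Series, so the ffill() that follows no longer knows about cities:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square',
                np.nan, np.nan, np.nan, np.nan],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami',
             'Miami', 'Miami', 'San Francisco', 'Miami'],
})

# Chained: bfill runs per city, but ffill runs over all rows in order,
# so a value from one city can leak into another.
chained = df.groupby('city')['surface'].bfill().ffill()

# Per group: both fills run inside each city.
per_group = (
    df.groupby('city')['surface']
      .transform(lambda s: s.bfill().ffill())
)

# Row 6 is a San Francisco row; compare what each approach puts there.
print(df.loc[6, 'city'], chained[6], per_group[6])
```

In the chained version row 6 picks up 'square' from the Miami row above it, while the per-group version gives 'round'.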


Answer 1
1 accepted 2017-05-08 03:07:03

Answer 2
1 2017-05-08 05:33:09
