Backfill values in Pandas series when value matches another column

I have a DataFrame like this:

import numpy as np
import pandas as pd

raw_data = {'surface': [np.nan, np.nan, 'round', 'square'],
            'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami']}

df = pd.DataFrame(raw_data, columns=['surface', 'city'])

It looks like this:

        surface city
   0    NaN     San Francisco
   1    NaN     Miami
   2    round   San Francisco
   3    square  Miami

I need the earliest San Francisco row to be filled with 'round', and the earliest Miami row to be filled with 'square'. Using .fillna(method='bfill') doesn't take the other column's values into account; it just fills all earlier rows with 'round'.

The desired result is:

        surface city
   0    round   San Francisco
   1    square  Miami
   2    round   San Francisco
   3    square  Miami

You can use groupby.bfill: group the data frame by the city column and then apply bfill:

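# Select the column to fill explicitly: recent pandas versions drop the
# grouping column ('city') from the result of a grouped bfill().
df['surface'] = df.groupby('city')['surface'].bfill()
df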

#  surface           city
#0  round   San Francisco
#1  square          Miami
#2  round   San Francisco
#3  square          Miami
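To make the difference from a plain backfill concrete, here is a minimal sketch on the question's data. The variable names plain and grouped are only illustrative, and the expected values in the comments assume the four-row frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square'],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami'],
})

# Plain backfill ignores the city: both leading gaps take the next
# non-missing value below them, which is 'round'.
plain = df['surface'].bfill()

# Grouped backfill only looks at later rows belonging to the same city.
grouped = df.groupby('city')['surface'].bfill()

print(plain.tolist())    # ['round', 'round', 'round', 'square']
print(grouped.tolist())  # ['round', 'square', 'round', 'square']
```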

[Modified based on the admirable answer from PSidom]

Using groupby() is indeed the key point, but it can be confusing not to mention what bfill() does, because it may not do what you think it does.

Let's take a quick look at the docs here. bfill() fills each missing value with the next non-missing value from a later row; it never looks at earlier rows, so missing values that come after the last valid value stay missing. It works great with groupby() in this case, but you also need groupby('*your group*').ffill() for forward filling if your data are more complicated.
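The direction of each fill can be checked on a tiny Series. This is only a sketch; the three-element Series and the names back and forward are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 'round', np.nan])

back = s.bfill()     # pulls values upward from later rows
forward = s.ffill()  # pushes values downward from earlier rows

# bfill leaves the trailing gap untouched, ffill leaves the leading gap untouched.
print(back.isna().tolist())     # [False, False, True]
print(forward.isna().tolist())  # [True, False, False]
```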

For further illustration, let's modify your data like this:

import numpy as np
import pandas as pd

raw_data = {'surface': [np.nan, np.nan, 'round', 'square', np.nan, np.nan, np.nan, np.nan],
            'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami', 'Miami', 'Miami', 'San Francisco', 'Miami']}
df = pd.DataFrame(raw_data, columns = ['surface', 'city'])
df

#   surface city
#0  NaN     San Francisco
#1  NaN     Miami
#2  round   San Francisco
#3  square  Miami
#4  NaN     Miami
#5  NaN     Miami
#6  NaN     San Francisco
#7  NaN     Miami

With only df.groupby('city').bfill(), you'll get:

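# Fill only the 'surface' column so that 'city' is kept in df2
df2 = df.copy()
df2['surface'] = df.groupby('city')['surface'].bfill()
df2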

#   surface city
#0  round   San Francisco
#1  square  Miami
#2  round   San Francisco
#3  square  Miami
#4  NaN     Miami
#5  NaN     Miami
#6  NaN     San Francisco
#7  NaN     Miami

See what is going on there? bfill() did the job in rows 0 and 1, but left rows 4 to 7 unchanged. You should use both bfill() and ffill() instead. Maybe something like this:

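# Forward-fill within each city, again keeping the 'city' column
df3 = df2.copy()
df3['surface'] = df2.groupby('city')['surface'].ffill()
df3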

#   surface city
#0  round   San Francisco
#1  square  Miami
#2  round   San Francisco
#3  square  Miami
#4  square  Miami
#5  square  Miami
#6  round   San Francisco
#7  square  Miami
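If you prefer a single step, one possible variant is to apply both fills inside each group with transform. This is a sketch rather than the only way to do it; the helper fill_both_ways and the variable filled are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square',
                np.nan, np.nan, np.nan, np.nan],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami',
             'Miami', 'Miami', 'San Francisco', 'Miami'],
})

def fill_both_ways(values):
    # Runs on one city's values at a time: backfill first, then
    # forward-fill whatever is still missing.
    return values.bfill().ffill()

filled = df.copy()
filled['surface'] = df.groupby('city')['surface'].transform(fill_both_ways)

print(filled['surface'].tolist())
```

Because both fills run inside the per-city function, no value can cross from one city to another.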

Note that you shouldn't use something like df.groupby('city').bfill().ffill(). The trailing ffill() is applied to the whole result rather than per city, so a value from one city can leak into rows of another city.
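A small check on the modified data shows the problem. This is an illustrative sketch; the names chained and per_group are made up here, and the 'surface' column is selected explicitly so the snippet does not depend on whether the grouping column is returned.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'surface': [np.nan, np.nan, 'round', 'square',
                np.nan, np.nan, np.nan, np.nan],
    'city': ['San Francisco', 'Miami', 'San Francisco', 'Miami',
             'Miami', 'Miami', 'San Francisco', 'Miami'],
})

by_city = df.groupby('city')['surface']

# Chained version: the ffill() here is NOT grouped, it runs over the whole column.
chained = by_city.bfill().ffill()

# Per-group version: both fills stay inside each city.
per_group = by_city.bfill().groupby(df['city']).ffill()

# Row 6 is a San Francisco row.
print(chained.iloc[6])    # square (leaked from Miami)
print(per_group.iloc[6])  # round
```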


