
Filling cell values horizontally in Pandas dataframe

I know that bfill and ffill can fill values down the rows of a single column. But how do you fill values across several specific columns of a dataframe?

Here's the example:

Initial df:

import pandas as pd
inidf = [('Prod', ['P1', 'P2']),
 ('A', ['1', '1']),
 ('1', ['', '40']),
 ('2', ['10', '60']),
 ('3', ['30', '']),
 ('B', ['1', '2']),             
 ]
df = pd.DataFrame(dict(inidf))  # DataFrame.from_items was removed in pandas 1.0
df

  Prod  A   1   2   3  B
0   P1  1      10  30  1
1   P2  1  40  60      2

Target df:

tgtdf = [('Prod', ['P1', 'P2']),
 ('A', ['1', '1']),
 ('1', ['10', '40']),
 ('2', ['10', '60']),
 ('3', ['30', '60']),
 ('B', ['1', '2']),             
 ]
df2 = pd.DataFrame(dict(tgtdf))  # DataFrame.from_items was removed in pandas 1.0
df2

  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2

In the example above, the target columns are those named 1, 2 and 3. In the first row, the first target column (1) has a missing value, which is copied from the next populated column (here, 2). In the second row, the last target column (3) has a missing value, which is copied from the previous populated column (here, 2).

You can first use replace to convert the empty strings to NaNs.

Then select the rows to back-fill and the rows to forward-fill, passing axis=1 so that each fill runs across the columns of a row:

import numpy as np

df = df.replace('', np.nan)
bfill_rows = [0]  # add more index labels if needed
ffill_rows = [1]  # add more index labels if needed

df.loc[bfill_rows] = df.loc[bfill_rows].bfill(axis=1)
df.loc[ffill_rows] = df.loc[ffill_rows].ffill(axis=1)
print (df)
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
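
The steps above can be collected into one self-contained sketch. The helper name fill_rows_horizontally is hypothetical (it is not part of pandas), and the frame is built with a plain dict because DataFrame.from_items is unavailable in recent pandas versions.

```python
import numpy as np
import pandas as pd


def fill_rows_horizontally(df, bfill_rows, ffill_rows):
    """Return a copy of df with blanks filled across columns, row by row.

    Hypothetical helper: rows listed in bfill_rows take the next populated
    value to their right; rows in ffill_rows take the previous one to their left.
    """
    out = df.replace('', np.nan)  # replace returns a new frame
    out.loc[bfill_rows] = out.loc[bfill_rows].bfill(axis=1)
    out.loc[ffill_rows] = out.loc[ffill_rows].ffill(axis=1)
    return out


df = pd.DataFrame({
    'Prod': ['P1', 'P2'],
    'A': ['1', '1'],
    '1': ['', '40'],
    '2': ['10', '60'],
    '3': ['30', ''],
    'B': ['1', '2'],
})
result = fill_rows_horizontally(df, bfill_rows=[0], ffill_rows=[1])
print(result)
```

Because the helper works on the frame returned by replace, the caller's original df keeps its empty strings.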

If necessary, you can also restrict the fill to specific columns:

df = df.replace('', np.nan)
cols = ['1','2','3']
bfill_rows = [0]
ffill_rows = [1]

df.loc[bfill_rows, cols] = df.loc[bfill_rows, cols].bfill(axis=1)
df.loc[ffill_rows, cols] = df.loc[ffill_rows, cols].ffill(axis=1)
print (df)

  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
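
To illustrate why restricting the columns can matter, here is a small sketch on a hypothetical variant of the question's data in which column B also contains a blank. Filling the whole row lets a value spill from column 3 into B, while passing cols keeps the fill inside the target columns.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: column 'B' has a blank in row 1.
df = pd.DataFrame({
    'Prod': ['P1', 'P2'],
    'A': ['1', '1'],
    '1': ['', '40'],
    '2': ['10', '60'],
    '3': ['30', ''],
    'B': ['1', ''],
}).replace('', np.nan)

cols = ['1', '2', '3']

# Forward-fill the whole of row 1: the fill continues past '3' into 'B'.
whole = df.copy()
whole.loc[[1]] = whole.loc[[1]].ffill(axis=1)

# Forward-fill only the target columns of row 1: 'B' is left untouched.
restricted = df.copy()
restricted.loc[[1], cols] = restricted.loc[[1], cols].ffill(axis=1)

print(whole.loc[1, 'B'])       # 60
print(restricted.loc[1, 'B'])  # nan
```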

Replace all blanks with NaNs, then ffill first and bfill second along axis=1 for columns '1', '2' and '3':

In [31]: df[['1','2','3']] = df[['1','2','3']].replace('', np.nan).ffill(axis=1).bfill(axis=1)

In [32]: df
Out[32]:
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
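
One thing to keep in mind with this one-liner is that the order of the two fills matters when a blank sits between two populated cells. The sketch below uses a hypothetical single-row frame (not the question's data) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical row with a gap between two populated cells.
df = pd.DataFrame([['10', np.nan, '30']], columns=['1', '2', '3'], dtype=object)

# ffill first: the gap takes the value on its left.
ffill_first = df.ffill(axis=1).bfill(axis=1)

# bfill first: the gap takes the value on its right.
bfill_first = df.bfill(axis=1).ffill(axis=1)

print(ffill_first.iloc[0].tolist())  # ['10', '10', '30']
print(bfill_first.iloc[0].tolist())  # ['10', '30', '30']
```

In the question's data the blanks are only at the edges of the target columns, so both orders give the same target frame.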

First of all, replace the empty strings with NaN values. Then ffill or bfill as needed, specifying axis=0. The axis is 0 when selecting a single row because the result of such a selection is a Series. If you were to select multiple rows (e.g. the entire dataframe), the axis would be 1.

df = df.replace('', np.nan)
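# Assign the result back: inplace=True on a row selection may act on a copy and leave df unchanged.
df.iloc[0, :] = df.iloc[0, :].bfill(axis=0)  # Backfill first row.
df.iloc[1, :] = df.iloc[1, :].ffill(axis=0)  # Forwardfill second row.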

>>> df
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
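
As a rough, self-contained illustration of the point about axes: selecting one row yields a Series whose index holds the column labels, so the fill runs along the Series' only axis. The sketch assigns the filled rows back to the frame instead of relying on inplace=True, which is an assumption about what works reliably across pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Prod': ['P1', 'P2'],
    'A': ['1', '1'],
    '1': ['', '40'],
    '2': ['10', '60'],
    '3': ['30', ''],
    'B': ['1', '2'],
}).replace('', np.nan)

row = df.iloc[0, :]        # a single row comes back as a Series
print(type(row).__name__)  # Series
print(list(row.index))     # ['Prod', 'A', '1', '2', '3', 'B']

# A Series has only axis 0, so no axis argument is needed here.
df.iloc[0, :] = row.bfill()
df.iloc[1, :] = df.iloc[1, :].ffill()
print(df)
```

Note that this fills across every column of the row, including Prod, A and B, which is harmless here only because those columns have no blanks.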


 