Fill all values in a group with the first non-null value in that group

Question

The following is the pandas DataFrame I have:

cluster Value
1         A
1        NaN
1        NaN
1        NaN
1        NaN
2        NaN
2        NaN
2         B
2        NaN
3        NaN
3        NaN
3         C
3        NaN
4        NaN
4         S
4        NaN
5        NaN
5         A
5        NaN
5        NaN

If we look into the data, cluster 1 has the Value 'A' in one row, and all the remaining rows are NA. I want to fill 'A' into all the rows of cluster 1, and similarly for all the other clusters: based on the one value present in a cluster, I want to fill the remaining rows of that cluster. The output should look like this:

cluster Value
1         A
1         A
1         A
1         A
1         A
2         B
2         B
2         B
2         B
3         C
3         C
3         C
3         C
4         S
4         S
4         S
5         A
5         A
5         A
5         A

I am new to Python and not sure how to proceed with this. Can anybody help?

Answer 1

`groupby` + `bfill`, and `ffill`

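# Back-fill, then forward-fill, both within each cluster.
# Selecting 'Value' keeps the 'cluster' column in df, and grouping the
# ffill as well stops values from leaking across cluster boundaries.
df['Value'] = df.groupby('cluster')['Value'].bfill()
df['Value'] = df.groupby('cluster')['Value'].ffill()
df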

    cluster Value
0         1     A
1         1     A
2         1     A
3         1     A
4         1     A
5         2     B
6         2     B
7         2     B
8         2     B
9         3     C
10        3     C
11        3     C
12        3     C
13        4     S
14        4     S
15        4     S
16        5     A
17        5     A
18        5     A
19        5     A

Or,

`groupby` + `transform` with `first`

df['Value'] = df.groupby('cluster').Value.transform('first')
df

    cluster Value
0         1     A
1         1     A
2         1     A
3         1     A
4         1     A
5         2     B
6         2     B
7         2     B
8         2     B
9         3     C
10        3     C
11        3     C
12        3     C
13        4     S
14        4     S
15        4     S
16        5     A
17        5     A
18        5     A
19        5     A
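
For anyone who wants to run both ideas end to end, here is a minimal, self-contained sketch. The DataFrame construction is my own reconstruction of the question's sample data, and variant 1 keeps both fills inside each cluster, which is an assumption about the intent rather than the exact code above. `transform('first')` works here because `GroupBy.first` skips null values by default.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (one non-null Value per cluster)
df = pd.DataFrame({
    "cluster": [1] * 5 + [2] * 4 + [3] * 4 + [4] * 3 + [5] * 4,
    "Value": ["A"] + [np.nan] * 4
             + [np.nan] * 2 + ["B", np.nan]
             + [np.nan] * 2 + ["C", np.nan]
             + [np.nan, "S", np.nan]
             + [np.nan, "A"] + [np.nan] * 2,
})

grouped = df.groupby("cluster")["Value"]

# Variant 1: back-fill then forward-fill, both restricted to each cluster
filled_bf = grouped.transform(lambda s: s.bfill().ffill())

# Variant 2: broadcast each cluster's first non-null value to all of its rows
filled_first = grouped.transform("first")

print(filled_first.tolist())
```

When each cluster holds a single distinct non-null value, as in the question, the two variants give the same result. They can differ if a cluster contains several different non-null values: variant 2 overwrites every row with the first one, while variant 1 only fills the gaps.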

Answer 2

Edit

The following seems better:

nan_map = df.dropna().set_index('cluster').to_dict()['Value']
df['Value'] = df['cluster'].map(nan_map)

print(df)
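
The mapping above overwrites every row, which is fine when each cluster has exactly one non-null value. As a hedged variation (not part of the original answer), the sketch below builds the mapping with `groupby(...).first()` and applies it through `fillna`, so rows that already hold a value are left untouched and a cluster with no value at all simply stays missing. The small frame is hypothetical and exists only to show those two edge cases.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: cluster 2 has two non-null values, cluster 3 has none
df = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 2, 3],
    "Value": [np.nan, "A", "B", np.nan, "X", np.nan],
})

# First non-null Value per cluster, as a Series indexed by cluster
first_by_cluster = df.groupby("cluster")["Value"].first()

# Fill only the missing rows; existing values such as "X" are left alone
df["Value"] = df["Value"].fillna(df["cluster"].map(first_by_cluster))
print(df)
```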

Original

I can't think of a better way to do this than iterating over all the rows, but one might exist. First I built your DataFrame:

import pandas as pd
import math

# Build your DataFrame
# (DataFrame.from_items was removed in pandas 1.0; a plain dict does the same job)
df = pd.DataFrame({
    'cluster': [1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,5,5,5,5],
    'Value': [float('nan') for _ in range(20)],
})
df['Value'] = df['Value'].astype(object)
df.at[ 0,'Value'] = 'A'
df.at[ 7,'Value'] = 'B'
df.at[11,'Value'] = 'C'
df.at[14,'Value'] = 'S'
df.at[17,'Value'] = 'A'

Now here's an approach that first creates a `nan_map` dict, then sets the values in `Value` as specified in the dict.

# Create a dict to map clusters to unique values
nan_map = df.dropna().set_index('cluster').to_dict()['Value']
# nan_map: {1: 'A', 2: 'B', 3: 'C', 4: 'S', 5: 'A'}

# Apply
for i, row in df.iterrows():
    df.at[i,'Value'] = nan_map[row['cluster']]

print(df)

Output:

    cluster Value
0         1     A
1         1     A
2         1     A
3         1     A
4         1     A
5         2     B
6         2     B
7         2     B
8         2     B
9         3     C
10        3     C
11        3     C
12        3     C
13        4     S
14        4     S
15        4     S
16        5     A
17        5     A
18        5     A
19        5     A

Note: This sets all values based on the cluster and doesn't check for NaN-ness. You may want to experiment with something like:

# Apply
for i, row in df.iterrows():
    if isinstance(df.at[i,'Value'], float) and math.isnan(df.at[i,'Value']):
        df.at[i,'Value'] = nan_map[row['cluster']]

to see which is more efficient (my guess is the former, without the checks).
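
One way to actually check that guess is to time both loops with the standard-library `timeit` module. This is only a rough sketch: the helper names `make_df`, `fill_all`, and `fill_missing` are made up for illustration, the measured time includes rebuilding the DataFrame on every run, and results on a 20-row frame may not carry over to larger data.

```python
import math
import timeit

import pandas as pd

def make_df():
    # Same sample data as above, rebuilt so each run starts from unfilled values
    df = pd.DataFrame({
        "cluster": [1] * 5 + [2] * 4 + [3] * 4 + [4] * 3 + [5] * 4,
        "Value": [float("nan")] * 20,
    })
    df["Value"] = df["Value"].astype(object)
    for idx, val in [(0, "A"), (7, "B"), (11, "C"), (14, "S"), (17, "A")]:
        df.at[idx, "Value"] = val
    return df

def fill_all(df, nan_map):
    # Overwrite every row from the map, with no NaN check
    for i, row in df.iterrows():
        df.at[i, "Value"] = nan_map[row["cluster"]]
    return df

def fill_missing(df, nan_map):
    # Only overwrite rows whose Value is a float NaN
    for i, row in df.iterrows():
        value = df.at[i, "Value"]
        if isinstance(value, float) and math.isnan(value):
            df.at[i, "Value"] = nan_map[row["cluster"]]
    return df

nan_map = make_df().dropna().set_index("cluster").to_dict()["Value"]

for fn in (fill_all, fill_missing):
    seconds = timeit.timeit(lambda: fn(make_df(), nan_map), number=100)
    print(f"{fn.__name__}: {seconds:.3f}s for 100 runs")
```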

Answer 1
5 2018-06-20 23:42:47

Answer 2
2 Accepted 2018-06-20 23:06:30
