
Implode rows and create new col

How can I create a unique_detail column whose value depends on whether, in the type column, fruit is directly followed by fruit -2? detail1 or detail2 can be NaN.

df type       detail1   detail2        name  
0  fruit                               apple
1  fruit -2   best      best           apple
2             yellow    yellowish      apple
3             green                    apple
4  fruit                               banana
5  sub
6  fruit -2   best      best           banana
7             yellow    orange         banana
8             green     brown          banana

Expected output

df type       detail1   detail2        name     unique_detail
0  fruit                               apple    [best, yellow, yellowish, green ]
1  fruit -2   best      best           apple    [best, yellow, yellowish, green ]
2             yellow    yellowish      apple    [best, yellow, yellowish, green ]
3             green                    apple    [best, yellow, yellowish, green brown]
4  fruit                               banana   sub: [yellow, orange, green, brown]
5  sub
6  fruit -2                            banana   sub:[yellow, orange, green, brown]
7             yellow    orange         banana   sub:[yellow, orange, green, brown]
8             green     brown          banana   sub:[yellow, orange, green, brown]

I tried:

m = df.type.eq("fruit") & df.type.shift(-1).ne("fruit -2")
df["detail"] = df.detail1 + df.detail2
df["detail"] = df.groupby("type").transform("unique")
df["detail"] = df["detail"].mask(m, "sub:"+df.detail)

The exact logic is not fully clear, but you should use a custom function with groupby.apply:

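def process(df):
    # row directly after a 'fruit' row
    m1 = df['type'].shift().eq('fruit')
    # row that is not 'fruit -2'
    m2 = df['type'].ne('fruit -2')
    # row with no type: these rows carry the details
    m3 = df['type'].isnull()
    
    # type that follows 'fruit' without being 'fruit -2' (e.g. 'sub'), else ''
    prefix = next(iter(df.loc[m1&m2, 'type']), '')
    if prefix:
        prefix += ': '
    
    # unique values across the detail columns of the untyped rows
    return prefix + str(df[m3].filter(regex='^detail').stack().unique())

# the 'sub' row has no name, so forward-fill to keep it in the banana group
group = df['name'].ffill()

s = df.groupby(group).apply(process)

# broadcast the per-group result back to every row
df['unique_detail'] = group.map(s)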
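
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The DataFrame is reconstructed from the question's table, and `unique_details` is a hypothetical name, not part of the original answer. It also departs from the answer in two ways that are my own assumptions: it builds the unique values with `to_numpy().ravel()` plus `dropna()` rather than `stack()`, so the result does not depend on how a given pandas version treats NaN in `stack`, and it formats the result as a bracketed, comma-separated string instead of the repr of a NumPy array.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "type": ["fruit", "fruit -2", np.nan, np.nan,
             "fruit", "sub", "fruit -2", np.nan, np.nan],
    "detail1": [np.nan, "best", "yellow", "green",
                np.nan, np.nan, "best", "yellow", "green"],
    "detail2": [np.nan, "best", "yellowish", np.nan,
                np.nan, np.nan, "best", "orange", "brown"],
    "name": ["apple", "apple", "apple", "apple",
             "banana", None, "banana", "banana", "banana"],
})


def unique_details(g):
    """Hypothetical helper: one summary string per name group."""
    # A label is a non-null type that directly follows 'fruit'
    # and is not 'fruit -2' (here: 'sub').
    after_fruit = g["type"].shift().eq("fruit")
    is_label = g["type"].notna() & g["type"].ne("fruit -2")
    labels = g.loc[after_fruit & is_label, "type"]
    prefix = f"{labels.iloc[0]}: " if len(labels) else ""

    # Only rows without a type contribute details, as in the answer.
    detail_rows = g.loc[g["type"].isna(), ["detail1", "detail2"]]
    flat = pd.Series(detail_rows.to_numpy().ravel())  # row by row
    values = flat.dropna().unique().tolist()
    return prefix + "[" + ", ".join(values) + "]"


# The 'sub' row has no name, so forward-fill to attach it to 'banana'.
group = df["name"].ffill()
per_group = df.groupby(group).apply(unique_details)
df["unique_detail"] = group.map(per_group)

print(df[["type", "name", "unique_detail"]])
```

With this data the apple rows get `[yellow, yellowish, green]` and the banana rows get `sub: [yellow, orange, green, brown]`. Note that `best` is excluded because it sits on a `fruit -2` row, which matches the answer's logic rather than the apple rows of the expected output.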

You can also use this as the grouper:

group = (df['type'].eq('fruit')
         &df['type'].shift(-1).ne('fruit -2')
         ).cumsum()

Output:

       type detail1    detail2    name                             unique_detail
0     fruit     NaN        NaN   apple            ['yellow' 'yellowish' 'green']
1  fruit -2    best       best   apple            ['yellow' 'yellowish' 'green']
2       NaN  yellow  yellowish   apple            ['yellow' 'yellowish' 'green']
3       NaN   green        NaN   apple            ['yellow' 'yellowish' 'green']
4     fruit     NaN        NaN  banana  sub: ['yellow' 'orange' 'green' 'brown']
5       sub     NaN        NaN    None  sub: ['yellow' 'orange' 'green' 'brown']
6  fruit -2    best       best  banana  sub: ['yellow' 'orange' 'green' 'brown']
7       NaN  yellow     orange  banana  sub: ['yellow' 'orange' 'green' 'brown']
8       NaN   green      brown  banana  sub: ['yellow' 'orange' 'green' 'brown']
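
To see what the alternative grouper based on `cumsum` evaluates to, here is a small sketch on the `type` column alone. The Series is reconstructed from the question's table, and the variable names `types` and `starts` are illustrative assumptions. A new group begins at each `fruit` row that is not directly followed by `fruit -2`, so with this data the apple rows all share group 0 and the banana rows share group 1.

```python
import numpy as np
import pandas as pd

# The 'type' column from the question's table.
types = pd.Series(["fruit", "fruit -2", np.nan, np.nan,
                   "fruit", "sub", "fruit -2", np.nan, np.nan])

# True where a 'fruit' row is NOT directly followed by 'fruit -2'.
starts = types.eq("fruit") & types.shift(-1).ne("fruit -2")

# The running count of those rows gives an integer group id per row.
group = starts.cumsum()

print(group.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Unlike `df['name'].ffill()`, this grouper does not rely on the name column at all, which may be preferable if names can repeat across separate blocks.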

