Is it possible to split column values and add a new column to a DataFrame at the same time?

Question

I have a DataFrame in which some values in the name column hold several entries separated by '|', and I need to flatten it. Example:

name  type
a      l
b      m
c|d|e  n

I want to flatten this df into:

   name type
    a    l
    b    m
    c    n
    d    n
    e    n

To do this, I used the following command:

df = df.assign(name=df.name.str.split('|')).explode('name').drop_duplicates()
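
For reference, a minimal runnable sketch of that flattening step on the sample frame above. It assumes a pandas version that provides DataFrame.explode (0.25 or later); the variable name flat and the reset_index call are additions for illustration, not part of the original command.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"name": ["a", "b", "c|d|e"], "type": ["l", "m", "n"]})

# Split each name on "|" into a list, then give every list element its own row
flat = (
    df.assign(name=df["name"].str.split("|"))
      .explode("name")
      .drop_duplicates()
      .reset_index(drop=True)  # assumption: a clean 0..n-1 index is wanted
)
print(flat)
```

Each exploded row keeps the type of the row it came from, which is why 'n' is repeated for c, d, and e.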

Now, in addition to the flattening, I want to do one more thing:

   name type  co_occur
    a    l
    b    m
    c    n    d
    c    n    e
    d    n    e

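That is, not only split 'c|d|e' into separate rows, but also create a new 'co_occur' column that records which names appear together: 'c', 'd', and 'e' each co-occur with the others.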

I don't see an easy way to do this by modifying:

df = df.assign(name=df.name.str.split('|')).explode('name').drop_duplicates()

Answer 1

I think this is what you want. Use itertools.combinations to build the pairs, then piece everything together.

from itertools import combinations
import io

import pandas as pd

data = '''name  type
a      l
b      m
c|d|e  n
j|k    o
f|g|h|i    p
'''
# columns are separated by runs of whitespace
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

# hold the new dataframes as you iterate via apply()
df_hold = []
def explode_combos(x):
    combos = list(combinations(x['name'].split('|'),2))
    # print(combos)
    # print(x['type'])
    df_hold.append(pd.DataFrame([{'name':c[0], 'type':x['type'], 'co_cur': c[1]} for c in combos]))
    return

# only apply() to those rows that need to be exploded
dft = df[df['name'].str.contains(r'\|')].apply(explode_combos, axis=1)
# concatenate the result
dfn = pd.concat(df_hold)
# add back to rows that weren't operated on (see the ~)
df_final = pd.concat([df[~df['name'].str.contains(r'\|')], dfn]).fillna('')

  name type co_cur
0    a    l
1    b    m
0    c    n      d
1    c    n      e
2    d    n      e
0    j    o      k
0    f    p      g
1    f    p      h
2    f    p      i
3    g    p      h
4    g    p      i
5    h    p      i
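
The same idea can also be sketched without appending to a module-level list from inside apply(). This is an alternative under stated assumptions, not the answer's original method: the helper name co_occurrence_rows is hypothetical, the new column is named co_occur as in the question, and rows without a '|' are kept with an empty co_occur value.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c|d|e", "j|k"],
    "type": ["l", "m", "n", "o"],
})

def co_occurrence_rows(name, type_):
    """Hypothetical helper: (name, type, co_occur) tuples for one source row."""
    parts = name.split("|")
    if len(parts) == 1:
        # nothing to pair with: keep the row and leave co_occur empty
        return [(name, type_, "")]
    # combinations() yields each unordered pair once, in input order
    return [(left, type_, right) for left, right in combinations(parts, 2)]

# Build all output rows in source order, then construct the frame once
rows = [
    row
    for name, type_ in zip(df["name"], df["type"])
    for row in co_occurrence_rows(name, type_)
]
result = pd.DataFrame(rows, columns=["name", "type", "co_occur"])
print(result)
```

Building the frame once from a list of tuples avoids creating one small DataFrame per row, and the result gets a fresh 0..n-1 index instead of the repeated index values seen in the output above.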

Solution 1
0 2021-11-06 00:43:39
