Is it possible to split a column value and add a new column at the same time in a DataFrame?

Question

I have a dataframe in which some column values are delimited with '|', and I need to flatten it. Example:

name  type
a      l
b      m
c|d|e  n

For this df, I want to flatten it to:

   name type
    a    l
    b    m
    c    n
    d    n
    e    n

To do this, I used this command:

df = df.assign(name=df.name.str.split('|')).explode('name').drop_duplicates()

Now I want to do one more thing in addition to the flatten operation above:

   name type  co_occur
    a    l
    b    m
    c    n    d
    c    n    e
    d    n    e

That is, not only split 'c|d|e' into separate rows, but also create a new 'co_occur' column that records the relationship, since 'c', 'd' and 'e' co-occur with each other.

I don't see an easy way to do this by modifying:

df = df.assign(name=df.name.str.split('|')).explode('name').drop_duplicates()

Answer 1

I think this is what you want. Use itertools.combinations to build the pairs for each row, then piece everything together:

from itertools import combinations
import io

import pandas as pd

data = '''name  type
a      l
b      m
c|d|e  n
j|k    o
f|g|h|i    p
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

# hold the new dataframes as you iterate via apply()
df_hold = []
def explode_combos(x):
    combos = list(combinations(x['name'].split('|'),2))
    # print(combos)
    # print(x['type'])
    df_hold.append(pd.DataFrame([{'name':c[0], 'type':x['type'], 'co_cur': c[1]} for c in combos]))
    return

# only apply() to those rows that need to be exploded
dft = df[df['name'].str.contains(r'\|')].apply(explode_combos, axis=1)
# concatenate the result
dfn = pd.concat(df_hold)
# add back to rows that weren't operated on (see the ~)
df_final = pd.concat([df[~df['name'].str.contains(r'\|')], dfn]).fillna('')

  name type co_cur
0    a    l
1    b    m
0    c    n      d
1    c    n      e
2    d    n      e
0    j    o      k
0    f    p      g
1    f    p      h
2    f    p      i
3    g    p      h
4    g    p      i
5    h    p      i
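
If you would rather not collect the pieces in a list through apply(), here is a minimal sketch of the same idea built on explode(). It is only one possible variant, not a replacement for the code above: the helper to_pairs and the intermediate column name pair are names I made up for illustration, and it assumes every name cell is a non-empty string. It also uses the question's column name co_occur instead of co_cur.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c|d|e"], "type": ["l", "m", "n"]})


def to_pairs(value):
    """Hypothetical helper: (name, co_occur) pairs for one '|'-delimited cell."""
    parts = value.split("|")
    if len(parts) == 1:
        # A single name has no partner; keep the row with an empty co_occur.
        return [(parts[0], "")]
    return list(combinations(parts, 2))


# One list of pairs per row, then one row per pair.
exploded = df.assign(pair=df["name"].map(to_pairs)).explode("pair")
exploded = exploded.reset_index(drop=True)

# Unpack each (name, co_occur) tuple into its own column.
exploded["name"] = [pair[0] for pair in exploded["pair"]]
exploded["co_occur"] = [pair[1] for pair in exploded["pair"]]
result = exploded[["name", "type", "co_occur"]]
print(result)
```

Because every row goes through the same helper, there is no need to split the frame into rows with and without '|' and concatenate them again afterwards.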
