Reshape a pandas DataFrame column split by a delimiter

I have the following DataFrame (read from a tab-separated file with two string columns):

id1  id2

g1   ID:05434
g1   ID:05434
g1   NaN
g1   ID:05434|ID:38720|ID:33345

After doing

df1 = df[df['id2'].notnull()]
df2 = df1.drop_duplicates(['id1','id2'])

I got df2:

id1  id2

g1   ID:05434
g1   ID:05434|ID:38720|ID:33345
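To reproduce df2 without the original tab file, here is a minimal sketch that reads the sample rows from an in-memory string. The inline data and the use of io.StringIO are assumptions standing in for the OP's file. read_csv treats the literal NaN as a missing value by default.

```python
import io

import pandas as pd

# Sample rows from the question; an in-memory string stands in for the tab file.
data = (
    "id1\tid2\n"
    "g1\tID:05434\n"
    "g1\tID:05434\n"
    "g1\tNaN\n"
    "g1\tID:05434|ID:38720|ID:33345\n"
)
df = pd.read_csv(io.StringIO(data), sep="\t")

# The same two steps as in the question: drop missing id2, then duplicate pairs.
df1 = df[df["id2"].notnull()]
df2 = df1.drop_duplicates(["id1", "id2"])
print(df2)
```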

I am aiming to expand this so that it keeps only the two columns, with one ID per row, like this:

id1  id2

g1   ID:05434
g1   ID:05434
g1   ID:38720
g1   ID:33345

Is there an expand function for this?

Thanks in advance.

Use str.split with stack; to remove the NaNs, use DataFrame.dropna first.
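To see what the split-and-stack step does on its own, here is a small sketch on a hypothetical Series `s` that holds the id2 values from the question, indexed by id1, with the NaN row already removed. str.split(..., expand=True) spreads the IDs into columns and pads the shorter rows with missing values. stack() then folds those columns back into rows. The names `s`, `wide` and `stacked` are illustrative only.

```python
import pandas as pd

# Hypothetical Series: id2 values from the question, indexed by id1, NaN already removed.
s = pd.Series(["ID:05434", "ID:05434|ID:38720|ID:33345"], index=["g1", "g1"], name="id2")

# One column per ID; the shorter first row is padded with missing values.
wide = s.str.split("|", expand=True)
print(wide)

# Fold the columns back into rows; dropna() discards any padding that remains.
stacked = wide.stack().dropna()
print(stacked)
```

The stacked Series has a two-level index (id1 plus the original column position). That is why the answer drops level 1 before turning the index back into a column.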

EDIT: Based on the OP's comment, duplicates are now removed at the end, after sorting the values:

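df2 = (df.dropna(subset=['id2'])                 # remove rows where id2 is NaN
         .set_index('id1')['id2']                # id2 as a Series indexed by id1
         .str.split('|', expand=True)            # one column per ID
         .stack()                                # fold the columns back into rows
         .dropna()                               # discard padding left by shorter rows
         .reset_index(level=1, drop=True)        # drop the column-position level
         .reset_index(name='id2')                # back to an id1/id2 DataFrame
         .sort_values(by=['id1', 'id2'])
         .drop_duplicates(['id1', 'id2']))

print(df2)
  id1       id2
0  g1  ID:05434
4  g1  ID:33345
3  g1  ID:38720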
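On pandas 0.25 or later, DataFrame.explode offers a shorter route to the same result. The following sketch swaps explode in for the answer's stack-based reshaping. It is not the answer's method, and it assumes the question's sample data is built inline rather than read from the tab file.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from the tab file.
df = pd.DataFrame({
    "id1": ["g1", "g1", "g1", "g1"],
    "id2": ["ID:05434", "ID:05434", None, "ID:05434|ID:38720|ID:33345"],
})

out = (df.dropna(subset=["id2"])                               # remove rows without an id2
         .assign(id2=lambda d: d["id2"].str.split("|"))       # each cell becomes a list of IDs
         .explode("id2")                                       # one row per list element
         .drop_duplicates(["id1", "id2"])                      # keep each (id1, id2) pair once
         .reset_index(drop=True))                              # clean 0..n-1 index
print(out)
```

To get the four-row output from the question instead, which keeps the repeated ID:05434, apply the same str.split and explode steps to df2 and skip the final drop_duplicates.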
