
Split values while iterating over an unspecified number of columns in a pandas DataFrame

I have an application that generates data frames with varying numbers of columns. Each cell contains two values separated by "|":

gene_1             gene_2             ...
ashb|ESNT00011     wsefsf|ENST0008
adecasd|ENST0001   uibib|ENST0008

How can I iterate over the columns and split each one into two columns (for example, gene_1 into gene_1_name and gene_1_ID) to get the following?

gene_1_name    gene_1_ID           gene_2_name     gene_2_ID         ...
ashb           ESNT00011           wsefsf          ENST0008
adecasd        ENST0001            uibib           ENST0008
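Taken literally, the iteration described above can be written as a plain loop over df.columns. This is a minimal sketch, not the accepted answer. It assumes every column contains at least one "|" (otherwise `expand=True` yields a single column and the renaming fails). The helper name `split_gene_columns` is hypothetical.

```python
import pandas as pd

# Sample frame reconstructed from the two columns shown in the question
df = pd.DataFrame({
    'gene_1': ['ashb|ESNT00011', 'adecasd|ENST0001'],
    'gene_2': ['wsefsf|ENST0008', 'uibib|ENST0008'],
})


def split_gene_columns(frame: pd.DataFrame, sep: str = '|') -> pd.DataFrame:
    """Hypothetical helper: split every 'name|ID' column into <col>_name and <col>_ID."""
    parts = []
    for col in frame.columns:
        # n=1 splits at the first separator only, so each column yields two parts
        split = frame[col].str.split(sep, n=1, expand=True)
        split.columns = [f'{col}_name', f'{col}_ID']
        parts.append(split)
    # Place the pieces side by side, keeping the original column order
    return pd.concat(parts, axis=1)


result = split_gene_columns(df)
print(result)
```

The loop is easy to read and keeps each gene's name/ID pair adjacent, at the cost of one `str.split` call per column.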

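Use stack and unstack: stack all columns into a single Series, split each string once, then unstack the gene level back into columns. This works for any number of columns: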

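import pandas as pd

result = (
    df.stack().str.split('|', expand=True)     # stack all columns into one Series and split each string
        .rename(columns={0: 'name', 1: 'ID'})  # name the two parts
        .unstack()                             # move the gene level back into the columns
)

# Flatten the (part, gene) column MultiIndex into 'gene_part' names
result.columns = [f'{gene}_{col}' for col, gene in result.columns]

# unstack lists all *_name columns before all *_ID columns; restore the per-gene order
result = result[[f'{gene}_{col}' for gene in df.columns for col in ('name', 'ID')]]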
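To check that the stack/unstack approach does not depend on how many columns there are, here is a self-contained sketch wrapping it in a function. The name `split_pipe_columns` is hypothetical, and the third column `gene_3` uses made-up values purely to show a frame with more columns than the question's example. It also assumes at least one cell contains "|", so the split produces two parts. On pandas 2.1+, `stack()` may emit a FutureWarning about its new implementation. The result is unaffected here.

```python
import pandas as pd


def split_pipe_columns(df: pd.DataFrame, sep: str = '|') -> pd.DataFrame:
    """Hypothetical helper: stack/unstack split for any number of 'name|ID' columns."""
    out = (
        df.stack()                                # Series indexed by (row, column)
          .str.split(sep, n=1, expand=True)       # parts 0 and 1 as two columns
          .rename(columns={0: 'name', 1: 'ID'})
          .unstack()                              # columns become (part, gene)
    )
    out.columns = [f'{gene}_{part}' for part, gene in out.columns]
    # Keep each gene's name/ID pair together, in the original column order
    return out[[f'{gene}_{part}' for gene in df.columns for part in ('name', 'ID')]]


df = pd.DataFrame({
    'gene_1': ['ashb|ESNT00011', 'adecasd|ENST0001'],
    'gene_2': ['wsefsf|ENST0008', 'uibib|ENST0008'],
    'gene_3': ['foo|ENST0009', 'bar|ENST0010'],   # made-up values for illustration
})

result = split_pipe_columns(df)
print(result)
```

Selecting the columns by an explicit list at the end also protects against lexical sorting of the unstacked level (for example, gene_10 sorting before gene_2).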
