简体   繁体   中英

if else conditions in pandas dataframe and extract column value

I have this dataframe(df), that looks like

+-----------------+-----------+----------------+---------------------+--------------+-------------+
|      Gene       | Gene name |     Tissue     |      Cell type      |    Level     | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4     | adipose tissue | adipocytes          | Low          | Approved    |
| ENSG00000001561 | ENPP4     | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | glandular cells     | Medium       | Approved    |
| ENSG00000001561 | ENPP4     | appendix       | lymphoid tissue     | Low          | Approved    |
| ENSG00000001561 | ENPP4     | bone marrow    | hematopoietic cells | Medium       | Approved    |
| ENSG00000002586 | CD99      | adipose tissue | adipocytes          | Low          | Supported   |
| ENSG00000002586 | CD99      | adrenal gland  | glandular cells     | Medium       | Supported   |
| ENSG00000002586 | CD99      | appendix       | glandular cells     | Not detected | Supported   |
| ENSG00000002586 | CD99      | appendix       | lymphoid tissue     | Not detected | Supported   |
| ENSG00000002586 | CD99      | bone marrow    | hematopoietic cells | High         | Supported   |
| ENSG00000002586 | CD99      | breast         | adipocytes          | Not detected | Supported   |
| ENSG00000003056 | M6PR      | adipose tissue | adipocytes          | High         | Approved    |
| ENSG00000003056 | M6PR      | adrenal gland  | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | glandular cells     | High         | Approved    |
| ENSG00000003056 | M6PR      | appendix       | lymphoid tissue     | High         | Approved    |
| ENSG00000003056 | M6PR      | bone marrow    | hematopoietic cells | High         | Approved    |
+-----------------+-----------+----------------+---------------------+--------------+-------------+

Expected output:


+-----------+--------+-------------------------------+
| Gene name | Level  |            Tissue             |
+-----------+--------+-------------------------------+
| ENPP4     | Low    | adipose tissue, appendix      |
| ENPP4     | High   | adrenal gland, bronchus       |
| ENPP4     | Medium | appendix, breast, bone marrow |
| CD99      | Low    | adipose tissue, appendix      |
| CD99      | High   | bone marrow                   |
| CD99      | Medium | adrenal gland                 |
| ...       | ...    | ...                           |
+-----------+--------+-------------------------------+

code used (took help from multiple if else conditions in pandas dataframe and derive multiple columns ):

def text_df(df):
    if (df[df['Level'].str.match('High')]):
        return (df.assign(Level='High') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Medium')]):
        return (df.assign(Level='Medium') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Low')]):
        return (df.assign(Level='Low') + df['Tissue'].astype(str))

df = df.apply(text_df, axis = 1)

Error: KeyError: ('Level', 'occurred at index 172') I can't understand what am I doing wrong. any suggestion?

Try:

df.groupby(['Gene name','Level'], as_index=False)['Cell type'].agg(', '.join)

Output:

|    | Gene name   | Level        | Cell type                                                                                                       |
|---:|:------------|:-------------|:----------------------------------------------------------------------------------------------------------------|
|  0 | CD99        | High         | hematopoietic cells                                                                                             |
|  1 | CD99        | Low          | adipocytes                                                                                                      |
|  2 | CD99        | Medium       | glandular cells                                                                                                 |
|  3 | CD99        | Not detected | glandular cells     ,  lymphoid tissue     ,  adipocytes                                                        |
|  4 | ENPP4       | High         | glandular cells                                                                                                 |
|  5 | ENPP4       | Low          | adipocytes          ,  lymphoid tissue                                                                          |
|  6 | ENPP4       | Medium       | glandular cells     ,  hematopoietic cells                                                                      |
|  7 | M6PR        | High         | adipocytes          ,  glandular cells     ,  glandular cells     ,  lymphoid tissue     ,  hematopoietic cells |

Update added per comments below:

(df.groupby(['Gene name','Level'], as_index=False)['Cell type']
   .agg(','.join).set_index(['Gene name','Level'])['Cell type']
   .unstack().reset_index())

Output:

| Gene name   |  High                                                                                                           |  Low                                   |  Medium                                    |  Not detected                                            |
|:------------|:----------------------------------------------------------------------------------------------------------------|:---------------------------------------|:-------------------------------------------|:---------------------------------------------------------|
| CD99        | hematopoietic cells                                                                                             | adipocytes                             | glandular cells                            | glandular cells     ,  lymphoid tissue     ,  adipocytes |
| ENPP4       | glandular cells                                                                                                 | adipocytes          ,  lymphoid tissue | glandular cells     ,  hematopoietic cells | nan                                                      |
| M6PR        | adipocytes          ,  glandular cells     ,  glandular cells     ,  lymphoid tissue     ,  hematopoietic cells | nan                                    | nan                                        | nan                                                      |

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM