
sorting multiple columns in pandas based on a single column

I have the following data frame:

df1:

Name    Tis    Exr    Name_2    Exr_2
A1FH    derm   3.4    GHJK      brn:2.4
N4RT    lng    0.1    PP2DS     Lvr:3.4;hup:2.3
GHJK    Pap    2.2    KLM3      tet:2.0
4HHR    stm    1.4    LSDR      NaN
PP2DS   skl    3.7    PMRT      van:3.7;epth:23.5
LSDR    lym    2.1    exty      NaN
2BC4    lym    4.4    NaN       NaN

Essentially columns "Tis" and "Exr" refer to column "Name", while column "Exr_2" refers to column "Name_2".

I am trying to rearrange the dataframe so that when a value in column "Name" matches a value in column "Name_2", the two end up on the same row, together with the data in their associated columns. Rows that don't match are kept, with NaN in the columns belonging to the non-matching side. I'd like the result in alphabetical order.

Desired output:

df2:

Name   Tis   Exr   Name_2   Exr_2
GHJK   Pap   2.2   GHJK     brn:2.4
LSDR   lym   2.1   LSDR     NaN
PP2DS  skl   3.7   PP2DS    Lvr:3.4;hup:2.3
2BC4   lym   4.4   NaN      NaN
4HHR   stm   1.4   NaN      NaN
A1FH   derm  3.4   NaN      NaN
NaN    NaN   NaN   exty     NaN
NaN    NaN   NaN   KLM3     tet:2.0
N4RT   lng   0.1   NaN      NaN
NaN    NaN   NaN   PMRT     van:3.7;epth:23.5

I have tried a number of different things:

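import pandas as pd

# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
df1 = pd.read_csv('dataset.csv', on_bad_lines='skip', sep='\t')

df2 = df1.sort_values(['Name', 'Name_2'], ascending=[False, True])

I also tried:

df1[df1.Name == df1.Name_2]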

I have also tried various tools on the Linux command line, but pandas seems a better fit since I am more familiar with Python.

The dataframe has over 41,000 rows.

You can split the data into two separate dataframes, one for the "Name" columns and one for the "Name_2" columns, and use an outer df.merge to match the names.

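# Left side: the columns that describe "Name", sorted alphabetically
df2 = df1[['Name', 'Tis', 'Exr']].sort_values('Name')
# Right side: the columns that describe "Name_2"
df_temp = df1[['Name_2', 'Exr_2']]
# Outer merge keeps unmatched rows from both sides and fills the gaps with NaN
df2 = df2.merge(df_temp, left_on='Name', right_on='Name_2', how='outer')
del df_temp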

print(df2)

Output

     Name   Tis  Exr Name_2              Exr_2
0    2BC4   lym  4.4    NaN                NaN
1    4HHR   stm  1.4    NaN                NaN
2    A1FH  derm  3.4    NaN                NaN
3    GHJK   Pap  2.2   GHJK            brn:2.4
4    LSDR   lym  2.1   LSDR                NaN
5    N4RT   lng  0.1    NaN                NaN
6   PP2DS   skl  3.7  PP2DS    Lvr:3.4;hup:2.3
7     NaN   NaN  NaN   KLM3            tet:2.0
8     NaN   NaN  NaN   PMRT  van:3.7;epth:23.5
9     NaN   NaN  NaN   exty                NaN
10    NaN   NaN  NaN    NaN                NaN
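
The output above still contains an all-NaN row (index 10), which comes from the row whose "Name_2" is missing, and the unmatched "Name_2" rows sit at the end instead of being interleaved alphabetically. One possible refinement, as a sketch only: drop the missing "Name_2" values before merging, then sort on a temporary key that takes whichever name is present. The `sort_key` column and the case-insensitive ordering are assumptions made for illustration, not part of the original answer, and the small `df1` below is rebuilt from the question's sample rows.

```python
import numpy as np
import pandas as pd

# Small stand-in for df1, copied from the sample rows in the question.
df1 = pd.DataFrame({
    "Name":   ["A1FH", "N4RT", "GHJK", "4HHR", "PP2DS", "LSDR", "2BC4"],
    "Tis":    ["derm", "lng", "Pap", "stm", "skl", "lym", "lym"],
    "Exr":    [3.4, 0.1, 2.2, 1.4, 3.7, 2.1, 4.4],
    "Name_2": ["GHJK", "PP2DS", "KLM3", "LSDR", "PMRT", "exty", np.nan],
    "Exr_2":  ["brn:2.4", "Lvr:3.4;hup:2.3", "tet:2.0", np.nan,
               "van:3.7;epth:23.5", np.nan, np.nan],
})

left = df1[["Name", "Tis", "Exr"]]
# Drop rows with no Name_2 so they cannot produce an all-NaN row.
right = df1[["Name_2", "Exr_2"]].dropna(subset=["Name_2"])

# Outer merge: matched names share a row, unmatched names keep their own row.
merged = left.merge(right, left_on="Name", right_on="Name_2", how="outer")

# Hypothetical helper column: whichever of the two names is present on the row.
merged["sort_key"] = merged["Name"].fillna(merged["Name_2"])

# Assumption: "alphabetical" means case-insensitive, so "exty" sorts among
# the upper-case names rather than after them.
result = (
    merged.sort_values("sort_key", key=lambda s: s.str.lower())
          .drop(columns="sort_key")
          .reset_index(drop=True)
)
print(result)
```

If matched rows should be listed first, as in the desired output of the question, passing `indicator=True` to `merge` adds a `_merge` column ("both", "left_only", "right_only") that can be used as an additional sort key.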
