How to sort column names containing special characters (., !, @, $, (, &) in a dataframe with Python

The dataframe I want to sort has more than 5000 columns whose names include letters, punctuation, numbers, dots, parentheses, etc. All these columns are duplicated 4 times, and the values are the same in the duplicated columns. A subset of the header names looks like this:

    ['I','single', 'game', 'I.1', 'Cliff', ',', 'on', 'me', 'RT', '@USER', ':', 'Texas', '(', 
     'cont', ')', 'URL', 'RT.1', '@USER.1', ':.1', '4', 'the', 'lingerie', 'party', '?????', 
     'Wednesday', 'ã\x80\x8bhave', 'a.1', 'nice', 'day', ':)', 'RT.2', '@USER.2']
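The `.1`, `.2` suffixes look like the ones pandas adds by itself when a CSV header contains repeated names: `pandas.read_csv` keeps the first occurrence unchanged and renames later ones `name.1`, `name.2`, and so on. A minimal sketch, assuming the data was loaded with `read_csv` (the tiny CSV below is made up for illustration):

```python
import io

import pandas as pd

# Hypothetical one-row CSV whose header repeats 'I' and ':'
csv_text = "I,single,game,I,:,:\n1,2,3,1,4,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# read_csv de-duplicates repeated header names by appending .1, .2, ...
print(list(df.columns))  # ['I', 'single', 'game', 'I.1', ':', ':.1']
```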

First, I need to remove the integer suffixes from all the column names: for example, 'I.1' should become 'I', and the same goes for every other suffixed name.
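One way to strip the suffixes is a regular-expression replace on the column index. This is a sketch on a small stand-in frame (the data is hypothetical); the pattern removes only a trailing dot followed by digits:

```python
import pandas as pd

# Small stand-in for the real 5000+ column frame (hypothetical data)
df = pd.DataFrame(
    [[1, 2, 3, 1, 5, 5]],
    columns=["I", "single", "game", "I.1", ":", ":.1"],
)

# Remove a trailing ".<digits>" from every column name.
# regex=True makes pandas treat the pattern as a regular expression.
df.columns = df.columns.str.replace(r"\.\d+$", "", regex=True)

print(list(df.columns))  # ['I', 'single', 'game', 'I', ':', ':']
```

One caveat: a name that genuinely ends in a dot plus digits (for example a hypothetical column '3.5') would be truncated as well, so it may be worth checking the real header list for such names. Also, once the suffixes are gone the names are no longer unique, so later reordering is safer by position (`iloc`) than by label.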

Second, all the columns are repeated four times in the same order. I need to reorder them like this:

    ['I', 'I', 'I', 'I', 'single', 'single', 'single', 'single', 'game', 'game', 'game', 'game',
     'I', 'I', 'I', 'I', 'Cliff', 'Cliff', 'Cliff', 'Cliff', ',', ',', ',', ',', 'on', 'on', 'on', 'on',
     ... and so on]

Here, the four 'I's that come before 'single' and 'game' should be grouped together, separately from the other 'I's (the ones originally suffixed like 'I.1', which come before 'Cliff'). Functions like sort_index() and reset_index() do produce a sorted order, but not the one I need.
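Sorting by name cannot produce this order, because it would merge both groups of 'I's. The requirement is really positional: assuming the full column list is exactly four consecutive copies of one block (as described above), column `p` of copy `k` sits at position `p + k * n`, where `n` is the block length, and the copies of each position just need to be listed together. A sketch using `iloc` on a hypothetical small frame:

```python
import pandas as pd

N_COPIES = 4  # each block of columns is repeated four times (from the question)

# Hypothetical block, suffixes already stripped; 'I' occurs twice on purpose
block = ["I", "single", "game", "I", "Cliff"]
df = pd.DataFrame([list(range(len(block) * N_COPIES))],
                  columns=block * N_COPIES)

n = df.shape[1] // N_COPIES  # columns per block
assert n * N_COPIES == df.shape[1], "column count must be a multiple of N_COPIES"

# For each position p in the block, take its copy from every block k.
order = [p + k * n for p in range(n) for k in range(N_COPIES)]
result = df.iloc[:, order]

print(list(result.columns))
# 'I' x4, 'single' x4, 'game' x4, then the second 'I' x4, 'Cliff' x4
```

Because selection is by position, the duplicated and unusual names never have to be compared or sorted, and the two 'I' groups stay apart.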

Any help would be appreciated.

I tried several methods, but because of the unusual characters in the names and the long list of columns with specific ordering requirements, I could not find a proper solution.

The solution I found, which worked for me, is to first transpose the dataframe. Then I create a separate index column of numbers and use it to sort the dataframe into the order I require. It might not be the perfect solution, but it lets me easily do further processing on the data.
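The transpose-and-sort idea can be sketched as follows. The answer does not say which numbers go into the helper column; this sketch assumes they are each column's position within its block (0 to n-1), and the name `pos` is hypothetical. A stable sort keeps the four copies of each position in their original relative order:

```python
import pandas as pd

N_COPIES = 4  # from the question: every column appears four times

block = ["I", "single", "game", "I", "Cliff"]  # hypothetical, suffixes already stripped
df = pd.DataFrame([list(range(len(block) * N_COPIES))], columns=block * N_COPIES)

n = df.shape[1] // N_COPIES  # columns per block

# 1. Transpose so the columns become rows.
t = df.T

# 2. Helper key: each row's position inside its block (0..n-1).
t["pos"] = [i % n for i in range(len(t))]

# 3. Stable sort on the helper key; ties keep their original order.
t = t.sort_values("pos", kind="stable")

# 4. Drop the helper column and transpose back.
result = t.drop(columns="pos").T
```

A design note: transposing a frame whose columns have mixed dtypes turns them into `object` columns, so for 5000+ columns of mixed types, computing the positional order and applying it with `df.iloc[:, order]` avoids the two transposes.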
