I'm trying to sort the columns of a .csv file. These are the names and the order of the columns:
'Unnamed: 0', 'Unnamed: 1',
'25Mg BLK', '25Mg 1', '25Mg 2',
'44Ca BLK', '44Ca 1', '44Ca 2',
'137Ba BLK', '137Ba 1', '137Ba 2',
'25Mg 3', '25Mg 4', '25Mg 5',
'44Ca 3', '44Ca 4', '44Ca 5',
'137Ba 3', '137Ba 4', '137Ba 5',
This is the order I would like to have:
'Unnamed: 0', 'Unnamed: 1',
'25Mg BLK', '25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5',
'44Ca BLK', '44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5',
'137Ba BLK', '137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5',
Currently my code looks like this:
import pandas as pd
df = pd.read_csv("real_data.csv", header=2)
df2 = df.reindex(columns=sorted(df.columns))
print(df2)
df2.to_csv("sorted.csv")
With my current code I get the following result for the order of the columns:
'137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5', '137Ba BLK',
'25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5', '25Mg BLK',
'44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5', '44Ca BLK'
I have already figured out that I need to pass a key function to sorted() to specify the sort order, but I can't work out a function that does that.
Any input is highly appreciated!
Use a helper DataFrame, sort by its columns, and then reindex by a.index:
c = df.columns
a = c[2:].to_series().str.extract(r'(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)
# convert the mass numbers to ints
a[0] = a[0].astype(int)
# convert to floats; missing numbers (e.g. 'BLK') become NaN
a[2] = pd.to_numeric(a[2], errors='coerce')
a = a.sort_values([0,1,2], na_position='first')
print (a)
             0   1    2
25Mg BLK    25  Mg  NaN
25Mg 1      25  Mg  1.0
25Mg 2      25  Mg  2.0
25Mg 3      25  Mg  3.0
25Mg 4      25  Mg  4.0
25Mg 5      25  Mg  5.0
44Ca BLK    44  Ca  NaN
44Ca 1      44  Ca  1.0
44Ca 2      44  Ca  2.0
44Ca 3      44  Ca  3.0
44Ca 4      44  Ca  4.0
44Ca 5      44  Ca  5.0
137Ba BLK  137  Ba  NaN
137Ba 1    137  Ba  1.0
137Ba 2    137  Ba  2.0
137Ba 3    137  Ba  3.0
137Ba 4    137  Ba  4.0
137Ba 5    137  Ba  5.0
df = df.reindex(columns=c[:2].tolist() + a.index.tolist())
print (df)
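Since the question asks for a function to pass to sorted(), here is a minimal sketch of such a key function. It is not part of the answer above: the name column_key is hypothetical, and it assumes every data column looks like '<mass number><element> <BLK or replicate number>', while anything else (such as the 'Unnamed' columns) should stay in front.

```python
import re

def column_key(name):
    """Sort key: mass number, element symbol, then replicate (BLK first)."""
    match = re.match(r"(\d+)([A-Za-z]+)\s+(\S+)", name)
    if match is None:
        # Columns such as 'Unnamed: 0' do not fit the pattern; keep them in front.
        # Equal keys preserve their original order because sorted() is stable.
        return (0, -1, "", -1)
    mass, element, suffix = match.groups()
    # A non-numeric suffix such as 'BLK' sorts before replicate 1, 2, ...
    replicate = int(suffix) if suffix.isdigit() else -1
    return (1, int(mass), element, replicate)

# Column names in the order given in the question.
columns = [
    "Unnamed: 0", "Unnamed: 1",
    "25Mg BLK", "25Mg 1", "25Mg 2",
    "44Ca BLK", "44Ca 1", "44Ca 2",
    "137Ba BLK", "137Ba 1", "137Ba 2",
    "25Mg 3", "25Mg 4", "25Mg 5",
    "44Ca 3", "44Ca 4", "44Ca 5",
    "137Ba 3", "137Ba 4", "137Ba 5",
]
ordered = sorted(columns, key=column_key)
```

The resulting list could then be passed to df.reindex(columns=ordered).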
See this answer, which seems to do what you want: https://stackoverflow.com/a/33555435/8239103
For clarity, I'll post the code here.
sequence = [Your sequence as a list as above]
your_dataframe = your_dataframe.reindex(columns=sequence)
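As a small illustration of reindexing with an explicit sequence, the sketch below uses a made-up four-column frame (the data values are hypothetical, not taken from the question's file). It only shows that reindex(columns=...) returns the columns in the order listed.

```python
import pandas as pd

# Hypothetical miniature frame whose columns are interleaved like the CSV's.
df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=["25Mg 1", "44Ca 1", "25Mg 2", "44Ca 2"])

# The desired column order, written out by hand.
sequence = ["25Mg 1", "25Mg 2", "44Ca 1", "44Ca 2"]

# reindex returns a new frame; each column keeps its own data.
reordered = df.reindex(columns=sequence)
```

Note that any name in the sequence that is missing from the frame would come back as a column of NaN, so the list has to match the real column names exactly.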
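from natsort import natsorted
# l1 is assumed to hold the data column names, e.g. df.columns[2:].tolist()
# temporarily replace 'BLK' with zeros so it sorts before 1, 2, ...
l1 = list(map(lambda x: x.replace('BLK', '0000000'), l1))
l1 = natsorted(l1)
# restore the 'BLK' label
l1 = list(map(lambda x: x.replace('0000000', 'BLK'), l1))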
l1
Out[1125]:
['25Mg BLK',
'25Mg 1',
'25Mg 2',
'25Mg 3',
'25Mg 4',
'25Mg 5',
'44Ca BLK',
'44Ca 1',
'44Ca 2',
'44Ca 3',
'44Ca 4',
'44Ca 5',
'137Ba BLK',
'137Ba 1',
'137Ba 2',
'137Ba 3',
'137Ba 4',
'137Ba 5']
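Then reindex the columns with df.reindex(columns=l1). Columns that are not in l1, such as the two 'Unnamed' ones, are dropped unless you prepend them to the list.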