I have tried many ways to sort a DataFrame column, but could not get it right. Please refer to the code below and let me know what additional syntax is needed to do the job.
df = pd.DataFrame({'TC': {0: '1-1.1', 1: '1-1.2', 2: '1-10.1', 3: '1-10.2', 4: '1-2.1', 5: '1-2.1', 6: '1-2.2', 7: '1-20.1', 8: '1-20.2', 9: '1-3.1'}, 'Case': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J'}})
df.sort_values(["TC"], ascending=[True])
print (df)
This code does not give the desired output. I need the DataFrame sorted numerically by TC, so that, for example, 1-2.1 comes before 1-10.1.
You can extract the numbers and form a tuple, then sort that series and use its index to reindex your original DataFrame.
>>> df.reindex(
...     df['TC'].str.extractall(r'(\d+)')
...             .unstack().astype(int)
...             .agg(tuple, 1).sort_values()
...             .index
... )
       TC Case
0   1-1.1    A
1   1-1.2    B
4   1-2.1    E
5   1-2.1    F
6   1-2.2    G
9   1-3.1    J
2  1-10.1    C
3  1-10.2    D
7  1-20.1    H
8  1-20.2    I
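To make the chain easier to follow, here is a rough step-by-step sketch of the same idea on a smaller frame. The intermediate names (matches, wide, keys, result) are only illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "TC": ["1-1.1", "1-10.1", "1-2.1", "1-3.1"],
    "Case": ["A", "C", "E", "J"],
})

# Step 1: one row per regex match, indexed by (original row, match number).
matches = df["TC"].str.extractall(r"(\d+)")

# Step 2: move the match number into the columns, giving one row per original row.
wide = matches.unstack().astype(int)

# Step 3: collapse each row into a tuple of ints, which compares numerically.
keys = wide.agg(tuple, axis=1)

# Step 4: sort the keys and reuse their index to reorder the original frame.
result = df.reindex(keys.sort_values().index)
print(result)
```

Because the keys are tuples of integers, (1, 2, 1) sorts before (1, 10, 1), which is what plain string sorting gets wrong.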
You can also use the key argument of sort_values (available since pandas 1.1):
>>> df.sort_values('TC',
...     key=lambda ser:
...         ser.str.extractall(r'(\d+)')
...            .unstack()
...            .astype(int).agg(tuple, 1)
... )
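One thing worth keeping in mind: sort_values returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name (or inplace=True passed). A small sketch, where numeric_key is just a hypothetical name for the key function:

```python
import pandas as pd

def numeric_key(ser):
    # Hypothetical helper: turn each string into a tuple of its integers.
    return ser.str.extractall(r"(\d+)").unstack().astype(int).agg(tuple, axis=1)

df = pd.DataFrame({"TC": ["1-10.1", "1-2.1", "1-1.2"], "Case": ["C", "E", "B"]})

# The sorted frame is a new object; df itself keeps its original order.
sorted_df = df.sort_values("TC", key=numeric_key)

print(sorted_df)
print(df)
```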
If there are always three parts to an ID, you can use Series.str.split on non-numeric characters with expand=True instead of extractall, which removes the need for unstack:
>>> df.sort_values('TC',
...     key=lambda series:
...         series.str.split(r'\D+', expand=True)
...               .astype(int).agg(tuple, 1)
... )
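If the IDs do not all have the same number of parts, expand=True pads the shorter rows with missing values and astype(int) is expected to fail. A minimal sketch of a per-string key that tolerates a varying number of parts, assuming a hypothetical helper called natural_key:

```python
import re

def natural_key(text):
    """Return the integers found in text as a tuple, e.g. '1-10.2' -> (1, 10, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))

ids = ["1-10.1", "1-2", "1-2.1", "1-1.2"]

# Shorter tuples sort before longer ones that share the same prefix.
ordered = sorted(ids, key=natural_key)
print(ordered)  # ['1-1.2', '1-2', '1-2.1', '1-10.1']
```

With pandas, the same helper should be usable as df.sort_values('TC', key=lambda s: s.map(natural_key)).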
Timings:
>>> %timeit df.reindex(df['TC'].str.extractall('(\d+)').unstack().astype(int).agg(tuple, 1).sort_values().index)
2.95 ms ± 40.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df.sort_values('TC', key=lambda ser: ser.str.extractall('(\d+)').unstack().astype(int).agg(tuple, 1))
2.91 ms ± 32.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df.sort_values('TC', key=lambda series:series.str.split(r'\D+', expand=True).astype(int).agg(tuple,1))
1.6 ms ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I would have done it this way. I think this would be faster.
df["range"] = df["TC"].apply(lambda x: [float(y) for y in x.split("-")])
df = df.sort_values(["range"], ascending=True).drop(["range"], axis="columns")
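A caveat with parsing the part after the dash as a float: 1.10 and 1.1 become the same number. Whether that matters depends on what the digits after the dot mean in your data. A quick sketch of the difference in plain Python, where float_key and int_key are hypothetical names:

```python
def float_key(tc):
    # Same parsing as above: split on "-" and convert each part to float.
    return [float(part) for part in tc.split("-")]

def int_key(tc):
    # Alternative: treat the part after the dot as its own integer.
    major, rest = tc.split("-")
    minor, sub = rest.split(".")
    return (int(major), int(minor), int(sub))

print(float_key("1-1.10") == float_key("1-1.1"))  # True: 1.10 and 1.1 are the same float
print(float_key("1-1.10") > float_key("1-1.2"))   # False: 1.1 < 1.2 as floats
print(int_key("1-1.10") > int_key("1-1.2"))       # True: sub-number 10 sorts after 2
```

If the last part is a counter (1, 2, ..., 10), the integer tuple is probably the safer key; if it really is a decimal fraction, the float version is fine.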
EDITED: Since you asked about the case where the format is 1_1_2 instead of 1-1.2, I would do it this way:
df["range"] = df["TC"].apply(lambda x: tuple(x.split("_")))
df["range"] = df["range"].apply(lambda x: [float(x[0]), float("{}.{}".format(x[1], x[2]))])
df = df.sort_values(["range"], ascending=True).drop(["range"], axis="columns")
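As an alternative for the underscore format, splitting on any run of non-digit characters does not care which separator is used, so one key can cover both 1_1_2 and 1-1.2. A rough sketch, assuming every ID has the same number of numeric parts; numeric_tuple_key and df_us are hypothetical names:

```python
import pandas as pd

def numeric_tuple_key(ser):
    # Split on any run of non-digits, so "-", "." and "_" are all treated alike,
    # then build one tuple of ints per row.
    return ser.str.split(r"\D+", expand=True).astype(int).agg(tuple, axis=1)

df_us = pd.DataFrame({"TC": ["1_10_1", "1_2_1", "1_1_2"], "Case": ["C", "E", "B"]})

sorted_us = df_us.sort_values("TC", key=numeric_tuple_key)
print(sorted_us)
```

This also avoids the temporary "range" column, since the key is only used for sorting.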
I have written a sort() function that should solve your problem.
import pandas as pd

df = pd.DataFrame({'TC': {0: '1-1.1', 1: '1-1.2', 2: '1-10.1', 3: '1-10.2', 4: '1-2.1', 5: '1-2.1', 6: '1-2.2', 7: '1-20.1', 8: '1-20.2', 9: '1-3.1'}, 'Case': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J'}})

def sort(df):
    # Parse the part after the "1-" prefix as a float
    # (this assumes every TC value starts with "1-").
    listTC = []
    for i in df['TC']:
        listTC.append(float(i[2:]))
    df1 = pd.DataFrame(list(zip(listTC, list(df['Case']))), columns=['TC', 'Case'])
    # Sort numerically, then rebuild the "1-x.y" strings (the index is reset).
    df_f = df1.sort_values(by=['TC'])
    listTC_final = []
    for i in df_f['TC']:
        listTC_final.append('1-' + str(i))
    df_Final = pd.DataFrame(list(zip(listTC_final, list(df_f['Case']))), columns=['TC', 'Case'])
    return df_Final

print(sort(df))
If you still have any questions, let me know. Thanks.