Create column names in specific order pandas python

I tried the following code. It runs, but the columns do not come out in the order I expected:

import pandas as pd
columns_total = [int(i/2) if i%2==0 else "id"+str(int(i/2)) for i in range(10)]+["sym"+str(i) for i in range(5)]
index_total = [i for i in  range(5)]
df = pd.DataFrame(index=index_total,columns=columns_total)

The output I got is:

     0  id0    1  id1    2  id2    3  id3    4  id4 sym0 sym1 sym2 sym3 sym4
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

The columns are not arranged the way I intended. The current order is:

0  id0    1  id1    2  id2    3  id3    4  id4 sym0 sym1 sym2 sym3 sym4  

Whereas the expected output is:

0  id0 sym0   1  id1 sym1   2  id2 sym2   3  id3 sym3   4  id4 sym4  

Please let me know how to correct the order.

Use a nested list comprehension that builds a tuple of the three names for each number and flattens the tuples into one list. The f-strings require Python 3.6+:

a = [item for x in range(5) for item in (x, f'id{x}', f'sym{x}')]
print(a)
[0, 'id0', 'sym0', 1, 'id1', 'sym1', 2, 'id2', 'sym2', 3, 'id3', 'sym3', 4, 'id4', 'sym4']

Solution for Python versions below 3.6:

a = [item for x in range(5) for item in (x, 'id{}'.format(x), 'sym{}'.format(x))]
print(a)
[0, 'id0', 'sym0', 1, 'id1', 'sym1', 2, 'id2', 'sym2', 3, 'id3', 'sym3', 4, 'id4', 'sym4']
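
To connect the list back to the question, here is a minimal sketch that passes the interleaved names to the DataFrame constructor. It assumes the same five groups and the "sym" prefix from the question; the variable n is only introduced here for illustration.

```python
import pandas as pd

n = 5  # number of column groups, as in the question (name chosen for this sketch)

# For each number, emit the integer label, the "id" label and the "sym" label,
# then flatten everything into a single list.
columns_total = [item for x in range(n) for item in (x, f"id{x}", f"sym{x}")]

# Empty DataFrame (all NaN) whose columns follow the interleaved order.
df = pd.DataFrame(index=range(n), columns=columns_total)
print(df.columns.tolist())
```

The integer labels stay integers here, so the column index has mixed types, just as in the question's original code.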

Performance for range(5):

In [216]: %timeit sorted([str(int(i/2)) if i%2==0 else "id"+str(int(i/2)) for i in range(10)]+["sym"+str(i) for i in range(5)],key=lambda x: x[-1])
13.2 µs ± 328 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [217]: %timeit [item for x in range(5) for item in (x, f'id{x}', f'sym{x}')]
3.92 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [218]: %timeit [item for x in range(5) for item in (x, 'id{}'.format(x), 'sym{}'.format(x))]
5.15 µs ± 83.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
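
Outside IPython, a comparable measurement can be sketched with the standard timeit module. The function names below are made up for this example, and the absolute numbers will differ from machine to machine, so no expected output is shown.

```python
import timeit

def with_fstring(n=5):
    # Nested comprehension with f-strings (Python 3.6+).
    return [item for x in range(n) for item in (x, f"id{x}", f"sym{x}")]

def with_format(n=5):
    # Same comprehension using str.format for older Python versions.
    return [item for x in range(n) for item in (x, "id{}".format(x), "sym{}".format(x))]

def with_sorted(n=5):
    # Build all number/id labels, append the sym labels, then sort by last character.
    labels = [str(i // 2) if i % 2 == 0 else "id" + str(i // 2) for i in range(2 * n)]
    labels += ["sym" + str(i) for i in range(n)]
    return sorted(labels, key=lambda s: s[-1])

timings = {}
for func in (with_fstring, with_format, with_sorted):
    # Total seconds for 10,000 calls of each variant.
    timings[func.__name__] = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {timings[func.__name__]:.4f} s")
```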

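You can simply wrap the list in sorted and use the last character of each name as the key, so every name ending in 0 precedes every name ending in 1, and so on. Because sorted is stable, names with the same last character keep their original relative order. Note that the integers are converted to strings so that x[-1] works: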

columns_total = sorted([str(int(i/2)) if i%2==0 else "id"+str(int(i/2)) for i in range(10)]+["sym"+str(i) for i in range(5)],key=lambda x: x[-1])
print(columns_total)

Output:

['0', 'id0', 'sym0', '1', 'id1', 'sym1', '2', 'id2', 'sym2', '3', 'id3', 'sym3', '4', 'id4', 'sym4']   

Edit:

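As Jezrael pointed out, this won't work once the numbers reach 10 or more, because only the last digit is compared. This is the solution I came up with, which sorts on the whole number extracted with a regular expression: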

import re
columns_total = sorted([str(int(i/2)) if i%2==0 else "id"+str(int(i/2)) for i in range(500)]+["sym"+str(i) for i in range(250)],key=lambda x: int(re.findall(r'\d+',x)[0]))
print(columns_total)
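
To see the difference between the two sort keys, here is a small sketch with 12 groups (a size picked only for this illustration, large enough to include two-digit numbers). It compares sorting on the last character with sorting on the full number.

```python
import re

n = 12  # hypothetical size, chosen so that the labels 10 and 11 appear

# Same construction as above: all number/id labels first, then all sym labels.
labels = [str(i // 2) if i % 2 == 0 else "id" + str(i // 2) for i in range(2 * n)]
labels += ["sym" + str(i) for i in range(n)]

# Keying on the last character mixes the 10 group into the 0 group.
by_last_char = sorted(labels, key=lambda s: s[-1])

# Keying on the whole number keeps each group together and in numeric order.
by_number = sorted(labels, key=lambda s: int(re.findall(r"\d+", s)[0]))

print(by_last_char[:6])  # ['0', 'id0', '10', 'id10', 'sym0', 'sym10']
print(by_number[-6:])    # ['10', 'id10', 'sym10', '11', 'id11', 'sym11']
```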
