
Python: Sorting the elements of a list in a pandas DataFrame

My company requires me to upload data as a list with quotation marks around each item. It's not ideal, but it is what it is. For example, if I have data that is 2 inches and 3 inches, I have to upload it as ["2 in", "3 in"].

When I try to sort the elements in the list for each row, I get this: [1, 2, , ", ", [, ], o, z], where each individual letter and number is sorted.

Example of the DF I am trying to sort:
import pandas as pd

d = {'col1': ['["3 oz","1 oz","2 oz"]', '["1.2 in","1 in","1.3 in"]', '["10 in","22 in","3.4 in"]']}
df = pd.DataFrame(data=d)

What I have tried:

import re

def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)
df['col1'].apply(lambda x: sorted_alphanumeric(x))

and 

from natsort import natsorted
df['col1'].apply(lambda x: natsorted(x))

and

df['col1'].apply(lambda x: sorted(x))

I am sure it's something simple that I am missing after staring at this for 2 days, but if you have any idea how to solve it, I would appreciate it.

Because each cell is a string, not a list, you first need to split it into its items. To do this, remove the first 2 and last 2 characters (the leading [" and the trailing "]), then split on "," to get a list of the items.
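To see why the attempts in the question sorted individual characters, here is a minimal sketch on a single cell in plain Python (the variable names cell, chars and items are only for illustration). Calling sorted on a string iterates over its characters, while slicing and splitting first yields the actual items:

```python
cell = '["3 oz","1 oz","2 oz"]'

# Sorting the raw string sorts its individual characters,
# including the brackets, quotes, commas and spaces.
chars = sorted(cell)
print(chars)

# Strip the leading [" and the trailing "], then split on ","
# to recover the items as a real list of strings.
items = cell[2:-2].split('","')
print(items)          # ['3 oz', '1 oz', '2 oz']
print(sorted(items))  # ['1 oz', '2 oz', '3 oz']
```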

Here is one way using apply:

from natsort import natsorted
(df['col1'].str[2:-2].str.split('","')
           .apply(lambda x: '["'+'","'.join(natsorted(x))+'"]')
)

output (as a Series):

0        ["1 oz","2 oz","3 oz"]
1    ["1 in","1.2 in","1.3 in"]
2    ["3.4 in","10 in","22 in"]
Name: col1, dtype: object

For explicitness, the items are strings: '["1 oz","2 oz","3 oz"]'

NB. This sorts purely on the number first and then on the unit as an alphanumeric string; it does not take the meaning of the units into account.

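You can parse each string with ast.literal_eval and then sort the result, as below:

(Why use literal_eval rather than eval: literal_eval only parses Python literals and does not execute arbitrary code.)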

>>> from ast import literal_eval
>>> df['col1'] = df['col1'].apply(lambda x: sorted(literal_eval(x)))
>>> df
                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [10 in, 22 in, 3.4 in]
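On the literal_eval versus eval point, here is a small sketch of the difference, assuming the cells might come from an untrusted source. The second input string is a made-up example of something that is not a plain literal:

```python
from ast import literal_eval

# A well-formed cell parses into a real list of strings.
parsed = literal_eval('["3 oz","1 oz","2 oz"]')
print(parsed)  # ['3 oz', '1 oz', '2 oz']

# Anything that is not a plain literal (here, a function call) is
# rejected with ValueError. eval would have executed it instead.
try:
    literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```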

from natsort import natsorted
df['col1'] = df['col1'].apply(lambda x: natsorted(eval(x)))
print(df)
                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [3.4 in, 10 in, 22 in]

You can use eval to evaluate strings:

df['col1'].apply(lambda x: sorted(eval(x)))

However, this sorts the lists in lexicographic order, so you have to write a more sophisticated function if you want them ordered by the numbers they contain.
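One possible shape for such a function, as a rough sketch only: it assumes every item starts with a plain decimal number such as "3.4 in", and it swaps eval for ast.literal_eval. The names leading_number and sort_cell are hypothetical helpers, not part of pandas or any library:

```python
import re
from ast import literal_eval

def leading_number(item):
    """Hypothetical helper: the number at the start of e.g. '3.4 in', as a float."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)', item)
    # Assumption: items without a leading number are sorted last.
    return float(match.group(1)) if match else float('inf')

def sort_cell(cell):
    """Hypothetical helper: parse one cell string and sort its items numerically."""
    return sorted(literal_eval(cell), key=leading_number)

print(sort_cell('["10 in","22 in","3.4 in"]'))  # ['3.4 in', '10 in', '22 in']
print(sort_cell('["1.2 in","1 in","1.3 in"]'))  # ['1 in', '1.2 in', '1.3 in']
```

With the DataFrame from the question, this could be applied as df['col1'].apply(sort_cell). Like the other answers, it compares only the numbers and ignores the units.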


 