Python: sorting elements in a list in a pandas DataFrame

My company requires me to upload data as a list with quotation marks around each item. It's not ideal, but it is what it is. For example, if I have data that is 2 inches and 3 inches, I have to upload it as ["2 in", "3 in"].

When I try to sort the elements in the list for each row, I get this: [1, 2, , ", ", [, ], o, z]. It sorts each individual letter and number instead of the list items.

Example of the DF I am trying to sort:
import pandas as pd
d = {'col1': ['["3 oz","1 oz","2 oz"]', '["1.2 in","1 in","1.3 in"]', '["10 in","22 in","3.4 in"]']}
df = pd.DataFrame(data=d)

What I have tried:

import re

def sorted_alphanumeric(data):
    # Split each item into digit and non-digit runs so numbers compare as ints
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split(r'([0-9]+)', key)]
    return sorted(data, key=alphanum_key)
df['col1'].apply(lambda x: sorted_alphanumeric(x))

and 

from natsort import natsorted
df['col1'].apply(lambda x: natsorted(x))

and

df['col1'].apply(lambda x: sorted(x))

I am sure it's something simple that I am missing after staring at this for 2 days, but if you have any idea how to solve it, I would appreciate it.

Because each cell is a string rather than a list, you first need to split the data into chunks. For this, remove the first 2 and last 2 characters (the leading [" and the trailing "]), then split on "," to get a list of the items.

Here is one way using apply:

from natsort import natsorted
(df['col1'].str[2:-2].str.split('","')
           .apply(lambda x: '["'+'","'.join(natsorted(x))+'"]')
)

Output (as a Series):

0        ["1 oz","2 oz","3 oz"]
1    ["1 in","1.2 in","1.3 in"]
2    ["3.4 in","10 in","22 in"]
Name: col1, dtype: object

To be explicit, each item in the result is still a string, for example: '["1 oz","2 oz","3 oz"]'
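As an alternative to slicing off the brackets by hand, here is a minimal sketch that assumes every cell is valid JSON (true for the example data, but not guaranteed for real uploads). It uses only the standard library and reuses the digit-splitting idea from the question; the names natural_key and sort_cell are made up for this illustration.

```python
import json
import re

def natural_key(text):
    """Split text into digit and non-digit runs so that 10 sorts after 3."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"([0-9]+)", text)]

def sort_cell(cell):
    """Parse a JSON-style list string, sort it, and serialize it again."""
    items = json.loads(cell)
    # separators=(",", ":") drops the space json.dumps normally adds after commas
    return json.dumps(sorted(items, key=natural_key), separators=(",", ":"))

cells = ['["3 oz","1 oz","2 oz"]',
         '["1.2 in","1 in","1.3 in"]',
         '["10 in","22 in","3.4 in"]']
sorted_cells = [sort_cell(cell) for cell in cells]
print(sorted_cells)
```

With a DataFrame, the same function could be passed to df['col1'].apply(sort_cell), which keeps each cell in the quoted upload format.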

NB. The natsorted approach sorts purely on the number first and then on the unit as an alphanumeric string; it does not take the meaning of the units into account.
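To illustrate that caveat, here is a small sketch of what unit-aware sorting might look like. The TO_INCHES table, the "ft" unit, and the helper name are assumptions made up for this example; the question's data never mixes units within one list, so this only matters if yours does.

```python
import re

# Hypothetical conversion factors to inches; not part of the original post.
TO_INCHES = {"in": 1.0, "ft": 12.0}

def length_in_inches(item):
    """Parse '<number> <unit>' and return the value converted to inches."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*", item)
    if match is None:
        raise ValueError(f"cannot parse {item!r}")
    value, unit = match.groups()
    return float(value) * TO_INCHES[unit]

items = ["1 ft", "2 in", "0.5 ft"]

# Sorting on the number alone ignores that a foot is longer than an inch.
by_number = sorted(items, key=lambda s: float(s.split()[0]))
# Sorting on the converted value respects the units.
by_size = sorted(items, key=length_in_inches)

print(by_number)
print(by_size)
```

A unit missing from the table raises KeyError here, which is deliberate: silently guessing a conversion would be worse than failing.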

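You can use ast.literal_eval to parse each string into a list and then sort it, as shown below:

(Why use literal_eval rather than eval? literal_eval only parses Python literals such as strings, numbers, and lists, so it cannot execute arbitrary code the way eval can.)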

>>> from ast import literal_eval
>>> df['col1'] = df['col1'].apply(lambda x: sorted(literal_eval(x)))
>>> df
                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [10 in, 22 in, 3.4 in]
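On the literal_eval versus eval point, the following sketch shows the difference in behaviour on input that is not a plain literal. The malicious-looking string is invented for the demonstration.

```python
from ast import literal_eval

cell = '["3 oz","1 oz","2 oz"]'
parsed = literal_eval(cell)  # a real Python list of three strings

# A string that eval would happily execute is rejected by literal_eval,
# because a function call is not a literal.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(parsed, rejected)
```

This matters when the column comes from a file or upload you do not fully control.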

from natsort import natsorted
df['col1'] = df['col1'].apply(lambda x: natsorted(eval(x)))
print(df)
                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [3.4 in, 10 in, 22 in]

You can use eval to evaluate the strings:

df['col1'].apply(lambda x: sorted(eval(x)))

However, this sorts the lists in lexicographic order, so you have to write a more sophisticated function if you want them ordered by the numbers they contain.
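One possible shape for such a function, as a sketch only: it assumes each item starts with a number followed by a unit, and it swaps eval for ast.literal_eval to avoid executing arbitrary input. The names NUMBER, leading_number, and sort_by_number are illustrative.

```python
import re
from ast import literal_eval

# Matches an optional sign followed by an integer or decimal number.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)")

def leading_number(item):
    """Return the number at the start of a string like '3.4 in' as a float."""
    match = NUMBER.match(item.strip())
    # Items without a leading number sort last (an arbitrary choice).
    return float(match.group()) if match else float("inf")

def sort_by_number(cell):
    """Parse the list literal and sort its items by numeric value."""
    return sorted(literal_eval(cell), key=leading_number)

print(sort_by_number('["10 in","22 in","3.4 in"]'))
print(sort_by_number('["1.9 in","1.10 in"]'))
```

Because the key is a float, 1.10 is treated as 1.1 and sorts before 1.9, which differs from a key that compares digit runs as separate integers. The function can be applied per row with df['col1'].apply(sort_by_number).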
