Python: sorting the elements of lists in a pandas DataFrame

Question

My company requires me to upload data as a list with quotation marks around each item. It's not ideal, but it is what it is. For example, if I have data that is 2 inches and 3 inches, I have to upload it as ["2 in", "3 in"].

When I try to sort the elements in the list for each row, I get this: [1, 2, , ", ", [, ], o, z], i.e. it sorts each individual letter and number.
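A minimal check of where that output comes from, assuming each cell holds a plain string like the ones in the example DataFrame: calling sorted() on a str iterates over its characters, so the text has to be parsed into a real list before sorting. The parse step here uses ast.literal_eval, the same function Answer 2 relies on.

```python
from ast import literal_eval

# One cell from col1 as it is actually stored: a str, not a list.
cell = '["3 oz","1 oz","2 oz"]'

# sorted() on a str iterates over its characters, which reproduces the symptom.
chars = sorted(cell)

# Parsing the str into a real list first makes sorted() work on the items.
items = sorted(literal_eval(cell))

print(chars)
print(items)  # ['1 oz', '2 oz', '3 oz']
```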

Example of the DF I am trying to sort:
import pandas as pd

d = {'col1': ['["3 oz","1 oz","2 oz"]', '["1.2 in","1 in","1.3 in"]', '["10 in","22 in","3.4 in"]']}
df = pd.DataFrame(data=d)

What I have tried:

import re

def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)
df['col1'].apply(lambda x: sorted_alphanumeric(x))

and 

from natsort import natsorted
df['col1'].apply(lambda x: natsorted(x))

and

df['col1'].apply(lambda x: sorted(x))
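For comparison, the first attempt works once it receives a list instead of a string. This sketch keeps the asker's sorted_alphanumeric unchanged and only parses the cell first with ast.literal_eval (the approach of Answer 2). Because that key only recognises integer runs of digits, it splits 3.4 into 3, "." and 4; that still orders the example rows correctly, but values such as 1.9 and 1.10 would be compared as 9 and 10.

```python
import re
from ast import literal_eval

def sorted_alphanumeric(data):
    # The asker's natural-sort key: digit runs become ints, other text is lower-cased.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)

cell = '["10 in","22 in","3.4 in"]'

# Passing the raw string would sort its characters; the parsed list sorts the items.
print(sorted_alphanumeric(literal_eval(cell)))  # ['3.4 in', '10 in', '22 in']
```

On the DataFrame this would be df['col1'].apply(lambda x: sorted_alphanumeric(literal_eval(x))); note that the result is a column of lists, not strings.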

I am sure it's something simple that I am missing after staring at this for 2 days, but if you have any idea how to solve it, I would appreciate it.

Answer 1

Because you have strings, you first need to split the data into chunks. For this, remove the first 2 and last 2 characters ([" and "]), then split on the three-character separator "," to get a list of the data.

Here is one way using apply:

from natsort import natsorted
(df['col1'].str[2:-2].str.split('","')
           .apply(lambda x: '["'+'","'.join(natsorted(x))+'"]')
)

Output (as a Series):

0        ["1 oz","2 oz","3 oz"]
1    ["1 in","1.2 in","1.3 in"]
2    ["3.4 in","10 in","22 in"]
Name: col1, dtype: object

To be explicit, each item is a string: '["1 oz","2 oz","3 oz"]'

NB. This sorts purely on the number first and then on the unit as an alphanumeric string; it does not take the meaning of the units into account.
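If the units do matter (for example a list mixing ft and in), one option is to convert each item to a common base unit before comparing. This is a sketch built on Answer 1's string slicing, not part of that answer, and it rests on several assumptions: each item is "<number> <unit>" with a single space, every unit appears in the hypothetical UNIT_TO_BASE table, a list only mixes units of the same dimension (comparing inches with ounces would be meaningless), and there are no spaces after the commas in the cell. The names UNIT_TO_BASE, magnitude and sort_cell_by_magnitude are illustrative.

```python
# Hypothetical conversion factors to a common base unit per dimension:
# lengths in inches, weights in ounces. Extend with the units you actually use.
UNIT_TO_BASE = {"in": 1.0, "ft": 12.0, "oz": 1.0, "lb": 16.0}

def magnitude(item):
    """Convert an item such as '1.5 ft' to a float in its dimension's base unit."""
    value, unit = item.split()
    return float(value) * UNIT_TO_BASE[unit]

def sort_cell_by_magnitude(cell):
    # Same string handling as Answer 1: drop '["' and '"]', split on '","'.
    items = cell[2:-2].split('","')
    items.sort(key=magnitude)
    # Re-join into the original quoted-list format.
    return '["' + '","'.join(items) + '"]'

print(sort_cell_by_magnitude('["2 ft","10 in","1.5 ft"]'))  # ["10 in","1.5 ft","2 ft"]
```

Because the function returns a string again, df['col1'].apply(sort_cell_by_magnitude) would keep the column in the upload format.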

Answer 2

You can use ast.literal_eval and then sort, like below:

(Why use literal_eval and not eval? literal_eval only evaluates Python literals such as strings, numbers and lists, so it cannot execute arbitrary code hidden in the data.)

>>> from ast import literal_eval
>>> df['col1'] = df['col1'].apply(lambda x: sorted(literal_eval(x)))
>>> df
           col1
0      [1 oz, 2 oz, 3 oz]
1      [1 in, 1.2 in, 1.3 in]
2      [10 in, 22 in, 3.4 in]
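To illustrate the literal_eval versus eval point, a small sketch: literal_eval accepts only Python literals, so a cell containing a function call is rejected rather than executed, whereas eval would run it. The helper name parses_as_literal is hypothetical.

```python
from ast import literal_eval

def parses_as_literal(text):
    """Hypothetical helper: True if literal_eval accepts text, False if it rejects it."""
    try:
        literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, names and other non-literal expressions land here.
        return False
    return True

# A list literal like the ones in col1 is accepted...
print(parses_as_literal('["3 oz","1 oz","2 oz"]'))  # True
# ...but an expression that eval() would execute is refused.
print(parses_as_literal("__import__('os').getcwd()"))  # False
```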

Answer 3

from natsort import natsorted
df['col1'] = df['col1'].apply(lambda x: natsorted(eval(x)))
print(df)

                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [3.4 in, 10 in, 22 in]
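Because the example cells are also valid JSON arrays, json.loads is a safer parser than eval here, and json.dumps turns the sorted list back into the quoted-list string the upload needs (Answer 3 leaves a list in each cell). A sketch under two assumptions: every cell is valid JSON with double-quoted items, and every item starts with a number (the key takes the float of the first word instead of using natsort). sort_cell_json is a hypothetical name.

```python
import json

def sort_cell_json(cell):
    """Hypothetical helper: parse a quoted-list cell as JSON, sort it, write it back."""
    # The example cells are valid JSON arrays, so json.loads parses them
    # without executing anything (unlike eval).
    items = json.loads(cell)
    # Assumption: each item starts with a number, e.g. "3.4 in".
    items.sort(key=lambda item: float(item.split()[0]))
    # json.dumps restores the quoted-list string; the default separators
    # add a space after each comma, as in ["2 in", "3 in"].
    return json.dumps(items)

print(sort_cell_json('["10 in","22 in","3.4 in"]'))  # ["3.4 in", "10 in", "22 in"]
```

To reproduce the compact input style without spaces, pass separators=(",", ":") to json.dumps. On the DataFrame: df['col1'] = df['col1'].apply(sort_cell_json).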

Answer 4

You can use eval to evaluate the strings:

df['col1'].apply(lambda x: sorted(eval(x)))

However, this sorts the lists in lexicographic order, so you have to write a more sophisticated function if you want them ordered by the numbers they contain.
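One possible shape for that function, sketched under the assumption that the numbers are plain non-negative decimals (no signs, exponents or thousands separators): a natural-sort key in the spirit of the question's sorted_alphanumeric, but reading 3.4 and 1.10 as single decimal numbers. natural_key and sort_cell are illustrative names, and the cells are parsed with ast.literal_eval rather than eval.

```python
import re
from ast import literal_eval

# The capturing group keeps the numbers in re.split's output; decimals stay one token.
NUMBER = re.compile(r"(\d+(?:\.\d+)?)")

def natural_key(text):
    # re.split alternates text, number, text, ...: odd positions hold numbers,
    # so each position always compares like with like (str vs str, float vs float).
    parts = NUMBER.split(text)
    return [float(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

def sort_cell(cell):
    # Parse the quoted-list string safely, then sort with the decimal-aware key.
    return sorted(literal_eval(cell), key=natural_key)

print(sort_cell('["10 in","22 in","3.4 in"]'))  # ['3.4 in', '10 in', '22 in']
print(sort_cell('["1.9 in","1.10 in"]'))        # ['1.10 in', '1.9 in']
```

With pandas this would be df['col1'] = df['col1'].apply(sort_cell).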

4 answers

Answer 1
2 votes, accepted, 2021-11-05 15:59:21

Answer 2
1 vote, 2021-11-05 16:05:12

Answer 3
1 vote, 2021-11-05 16:05:50

Answer 4
1 vote, 2021-11-05 16:06:20
