Sorting a string of numbers in a Pandas column

Question

I previously wrote a Python script that creates an author index.
To spare you the details (extracting text from a PDF was pretty hard), I created a minimal reproducible example. Currently I get one row per author with a comma-separated list of the pages on which the author appears. However, I would like to sort each list of pages in ascending order.

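import pandas as pd
import csv
words = ["Autor1","Max Mustermann","Max Mustermann","Autor1","Bertha Musterfrau","Author2"]
pages = [15,13,5,1,17,20]
# Pages are converted to strings so they can be joined with commas later
str_pages = list(map(str, pages))
df = pd.DataFrame({"Autor":words,"Pages":str_pages})
# Drop repeated (author, page) pairs and order the rows by author
df = df.drop_duplicates().sort_values(by="Autor").reset_index(drop=True)
# Collapse each author's pages into one comma-separated string
df = df.groupby("Autor")['Pages'].apply(lambda x: ','.join(x)).reset_index()
df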

This produces the desired output (except for the sorting of the pages).

               Autor Pages
0            Author2    20
1             Autor1  15,1
2  Bertha Musterfrau    17
3     Max Mustermann  13,5

I tried splitting the Pages column on the comma with the vectorized string method and applying a lambda function that is supposed to sort the resulting list.

df["Pages"] = df["Pages"].str.split(",").apply(lambda x: sorted(x))
df

However, this only worked for Autor1 but not for Max Mustermann. I can't seem to figure out why this is the case.

               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [13, 5]

Answer 1

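str.split returns lists of strings, so lambda x: sorted(x) still sorts the values as strings, not as integers. Strings are compared character by character, so "13" sorts before "5" because "1" < "5", while "1" and "15" only happen to land in numeric order.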
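To see the difference in isolation, here is a minimal sketch using plain Python lists with the page values from the question (the variable names are only for illustration):

```python
# Page numbers as they come out of str.split: strings, not integers
max_pages = ["13", "5"]
autor1_pages = ["15", "1"]

# Default sorting compares strings character by character
max_as_strings = sorted(max_pages)        # "13" < "5" because "1" < "5"
autor1_as_strings = sorted(autor1_pages)  # "1" is a prefix of "15", so it sorts first

# key=int compares the numeric values but keeps the elements as strings
max_as_numbers = sorted(max_pages, key=int)

print(max_as_strings)     # ['13', '5']
print(autor1_as_strings)  # ['1', '15']
print(max_as_numbers)     # ['5', '13']
```

This is why the attempt in the question looked right for Autor1 and wrong for Max Mustermann.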

You can try:

df['Pages'] = (df.Pages.str.split(',')
   .explode().astype(int)
   .sort_values()
   .groupby(level=0).agg(list)
)

Output:

               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [5, 13]
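The question originally stored the pages as a comma-separated string rather than a list. If that format is wanted back, one possible variation (a sketch, not part of the original answer) is to convert the sorted integers back to strings and join them per row. The DataFrame below is rebuilt by hand from the question's intermediate output so the snippet runs on its own:

```python
import pandas as pd

# Intermediate result from the question: unsorted, comma-separated pages
df = pd.DataFrame({
    "Autor": ["Author2", "Autor1", "Bertha Musterfrau", "Max Mustermann"],
    "Pages": ["20", "15,1", "17", "13,5"],
})

df["Pages"] = (
    df["Pages"].str.split(",")
    .explode()            # one row per page, original index repeated
    .astype(int)          # sort numerically, not lexicographically
    .sort_values()
    .astype(str)          # back to strings so they can be joined
    .groupby(level=0)     # regroup by the original row index
    .agg(",".join)
)

print(df)
```

The assignment aligns on the index, so each joined string ends up on the row it came from.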

Answer 2

If you want to keep your existing approach, pass a key to sorted so the strings are compared by their integer value:

df.Pages = (
    df.Pages.str.split(",")
        .apply(lambda x: sorted(x, key=int))
)

               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [5, 13]
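Another option, not taken from either answer, is to avoid the problem upstream: keep the pages as integers, sort by author and page before grouping, and only convert to strings when joining. A minimal sketch using the data from the question:

```python
import pandas as pd

words = ["Autor1", "Max Mustermann", "Max Mustermann", "Autor1",
         "Bertha Musterfrau", "Author2"]
pages = [15, 13, 5, 1, 17, 20]

# Keep Pages as integers so sort_values orders them numerically
df = pd.DataFrame({"Autor": words, "Pages": pages})

result = (
    df.drop_duplicates()
      .sort_values(by=["Autor", "Pages"])
      .groupby("Autor")["Pages"]
      .apply(lambda s: ",".join(map(str, s)))  # convert only when joining
      .reset_index()
)

print(result)
```

groupby preserves the row order within each group, so the pages stay in the ascending order set by sort_values.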

2 answers

Answer 1 · 3 · accepted · 2020-07-31 12:26:32
Answer 2 · 2 · 2020-07-31 13:01:19
