I previously wrote a Python script that builds an author index.
To spare you the details (extracting text from a PDF was pretty hard), I created a minimal reproducible example. Currently I get one row per author with a comma-separated list of the pages on which the author appears. However, I would like the list of pages to be sorted in ascending order.
import pandas as pd
import csv
words = ["Autor1","Max Mustermann","Max Mustermann","Autor1","Bertha Musterfrau","Author2"]
pages = [15,13,5,1,17,20]
str_pages = list(map(str, pages))
df = pd.DataFrame({"Autor":words,"Pages":str_pages})
df = df.drop_duplicates().sort_values(by="Autor").reset_index(drop=True)
df = df.groupby("Autor")['Pages'].apply(lambda x: ','.join(x)).reset_index()
df
This produces the desired output (except the sorting of the pages).
               Autor  Pages
0            Author2     20
1             Autor1   15,1
2  Bertha Musterfrau     17
3     Max Mustermann   13,5
I tried using the vectorized string methods on the Pages column: I split each entry on the comma and applied a lambda function that is supposed to sort the resulting list.
df["Pages"] = df["Pages"].str.split(",").apply(lambda x: sorted(x))
df
However, this only worked for Autor1 but not for Max Mustermann. I can't seem to figure out why this is the case.
               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [13, 5]
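str.split returns lists of strings, so lambda x: sorted(x) still sorts the pages as strings, not as integers. Strings are compared character by character, so "13" comes before "5" because "1" sorts before "5"; "1" and "15" only happen to end up in the right order.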
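A small sketch in plain Python may make the difference visible. It uses Max Mustermann's two pages from the example; the variable names are only for illustration.

```python
# Pages as they look after str.split(","): a list of strings
pages = ["13", "5"]

# Default sort compares the strings character by character
by_string = sorted(pages)

# key=int compares the numeric values; the elements themselves stay strings
by_int = sorted(pages, key=int)

print(by_string)  # ['13', '5']
print(by_int)     # ['5', '13']
```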
You can try:
df['Pages'] = (df.Pages.str.split(',')
.explode().astype(int)
.sort_values()
.groupby(level=0).agg(list)
)
Output:
               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [5, 13]
If you want to keep your existing approach, pass a key to sorted so that the strings are compared by their integer value:
df["Pages"] = (
    df["Pages"].str.split(",")
    .apply(lambda x: sorted(x, key=int))
)
               Autor    Pages
0            Author2     [20]
1             Autor1  [1, 15]
2  Bertha Musterfrau     [17]
3     Max Mustermann  [5, 13]
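Since the question asks for a comma-separated string rather than a list, another option is to sort before joining. The following is only a sketch under that assumption, not part of either answer above: it keeps the pages as integers, sorts them inside each group, and converts to strings only when joining. The name result is chosen here for illustration.

```python
import pandas as pd

words = ["Autor1", "Max Mustermann", "Max Mustermann", "Autor1", "Bertha Musterfrau", "Author2"]
pages = [15, 13, 5, 1, 17, 20]

# Keep the pages as integers so that sorting is numeric
df = pd.DataFrame({"Autor": words, "Pages": pages})

result = (
    df.drop_duplicates()
    .groupby("Autor")["Pages"]  # groupby sorts the author names by default
    .apply(lambda s: ",".join(map(str, sorted(s))))  # sort numerically, then join
    .reset_index()
)
print(result)
```

With the example data this should give "1,15" for Autor1 and "5,13" for Max Mustermann.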