Creating a sorted list of unique numbers from a DataFrame

I am writing keywords and their corresponding page numbers from LaTeX into text files, which I then process with Python. How can I create, for each keyword, a sorted list of its unique page numbers?

The following code gives me the unique page numbers for each keyword, but they are not sorted.

import pandas as pd

def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)

df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)

grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)

produces

   keyword page
0      foo    1
1      foo    2
2      foo    3
3      foo    3
4      foo    4
5      foo    5
6      foo    6
7      foo    7
8      bar    7
9      bar    9
10     bar   10
  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1
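
The order is lost because `unique` sorts the list first and only then builds a `set`, and a Python `set` does not preserve any particular order. A minimal sketch of one possible fix is to deduplicate first and sort afterwards. The name `unique_sorted` is hypothetical and only used here for illustration:

```python
def unique_sorted(liste):
    # Deduplicate first: a set of ints drops repeated page numbers.
    pages = {int(s) for s in liste.split(',')}
    # Sort numerically last, so the order survives, then convert back to strings.
    return ','.join(str(p) for p in sorted(pages))

print(unique_sorted('1,2,3,3,4,5,6,7'))  # 1,2,3,4,5,6,7
print(unique_sorted('7,9,10'))           # 7,9,10
```

This could be dropped in for `unique` in `grouped['page'].apply(...)` without changing the rest of the code.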
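
With pandas and NumPy, you can keep the page values as integers and let np.unique handle both the deduplication and the sorting:

import numpy as np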
import pandas as pd

df = pd.DataFrame(
    {'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], 
     "page": [1,2,3,3,4,5,6,7,7,9,10]})

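# The page column is already int here. If yours holds strings, convert it first:
# df['page'] = df['page'].astype(int)
# np.unique returns the sorted unique values of each group; join them as strings.
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str)))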

print(result)

yields

keyword
bar           7,9,10
foo    1,2,3,4,5,6,7
Name: page, dtype: object

  • np.unique returns the sorted unique values of an array. The page values should be sorted as ints, not as strings (as strings, '10' would sort before '7'), so keep the page column as ints. After calling np.unique, use astype(str) to convert the values to strings and then join them with ','.join.
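
To illustrate why the pages should stay ints until after np.unique, here is a small sketch comparing the two orderings. The sample list `pages` is made up for this example:

```python
import numpy as np

pages = [7, 9, 10, 7]

# Integers are sorted numerically.
as_ints = np.unique(pages)
# Strings are sorted lexicographically, so '10' comes before '7'.
as_strs = np.unique([str(p) for p in pages])

print(','.join(as_ints.astype(str)))  # 7,9,10
print(','.join(as_strs))              # 10,7,9
```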
