
Creating a sorted list of unique numbers from a DataFrame

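I write keywords and their corresponding page numbers to a text file via LaTeX and then process the file with Python. How can I create a sorted list of page numbers for each keyword?

The following code gives me a list of unique page numbers, but it is not sorted.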

import pandas as pd

def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)

df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)

grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)

produces

   keyword page
0      foo    1
1      foo    2
2      foo    3
3      foo    3
4      foo    4
5      foo    5
6      foo    6
7      foo    7
8      bar    7
9      bar    9
10     bar   10
  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], 
     "page": [1,2,3,3,4,5,6,7,7,9,10]})

# df['page'] = df['page'].astype(int)
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str)))

print(result)

yields

keyword
bar           7,9,10
foo    1,2,3,4,5,6,7
Name: page, dtype: object

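  • np.unique returns a sorted array of the unique values. We want the page values sorted as integers, not as strings, so keep the page values as integers. After calling np.unique, convert the result to strings with astype(str) and then join them with ','.join.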
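
To illustrate why the pages should stay integers until after sorting, here is a minimal sketch in plain Python (no NumPy or pandas), assuming the page numbers arrive as a comma-separated string as in the question. The helper name `unique_sorted` is hypothetical and is not part of the original code. It deduplicates with a set first and sorts numerically afterwards, so the set cannot undo the ordering.

```python
# Sample pages for one keyword, as in the "bar" group (with a duplicate added).
pages = [7, 9, 10, 9]

# Sorting the unique values as integers gives numeric order.
as_ints = ','.join(str(p) for p in sorted(set(pages)))

# Sorting them as strings gives lexicographic order: "10" sorts before "7".
as_strs = ','.join(sorted({str(p) for p in pages}))


def unique_sorted(liste):
    """Hypothetical variant of the question's helper: dedupe, then sort as ints."""
    numbers = {int(s) for s in liste.split(',')}   # set removes duplicates
    return ','.join(str(n) for n in sorted(numbers))  # sort last, as integers


print(as_ints)                           # 7,9,10
print(as_strs)                           # 10,7,9
print(unique_sorted('1,2,3,3,4,5,6,7'))  # 1,2,3,4,5,6,7
```

Used with the question's DataFrame, such a helper could replace `unique` in `grouped['page'].apply(...)`. The np.unique approach above gets the same result in a single step.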
