Creating a sorted list of unique numbers from a DataFrame
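I write keywords and their corresponding page numbers to a text file from LaTeX and then process the file with Python. How can I create a sorted list of page numbers for each keyword?
The following code gives me a list of unique page numbers, but it is not sorted.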
import pandas as pd

def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)

df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)
grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)
This produces:
   keyword  page
0      foo     1
1      foo     2
2      foo     3
3      foo     3
4      foo     4
5      foo     5
6      foo     6
7      foo     7
8      bar     7
9      bar     9
10     bar    10

  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1
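The order is lost in the question's helper because the values are sorted first and then passed through set(), which does not preserve order. One possible fix is to deduplicate first and sort last, while the values are still integers. The following is only a minimal sketch; the name unique_sorted is hypothetical and not part of the original code.

```python
def unique_sorted(pages: str) -> str:
    """Return the unique page numbers from a comma-separated string, sorted numerically."""
    # Build a set of integers to drop duplicates, then sort as the final step
    # so that the ordering survives. Sorting integers puts 10 after 9.
    numbers = sorted({int(p) for p in pages.split(',')})
    return ','.join(str(n) for n in numbers)


print(unique_sorted("1,2,3,3,4,5,6,7"))  # 1,2,3,4,5,6,7
print(unique_sorted("7,9,10"))           # 7,9,10
```

It could be dropped into the original script in place of unique, for example grouped['page'].apply(unique_sorted).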
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"],
"page": [1,2,3,3,4,5,6,7,7,9,10]})
# df['page'] = df['page'].astype(int)
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str)))
print(result)
yields
keyword
bar           7,9,10
foo    1,2,3,4,5,6,7
Name: page, dtype: object
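np.unique returns a sorted array of the unique values. We want the page values to be sorted as integers (not as strings), so keep the page values as integers. After calling np.unique, you can convert the result to strings with astype(str) and then join them with ','.join.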
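To see why the pages should stay integers until after np.unique has run, compare the two orderings below. This is a small illustrative sketch; the sample array and variable names are made up for the demonstration.

```python
import numpy as np

# Sample page numbers with one duplicate (illustrative data only).
pages_as_int = np.array([7, 9, 10, 7])
pages_as_str = pages_as_int.astype(str)

# Unique + sort on integers, then convert to strings for joining.
int_sorted = ','.join(np.unique(pages_as_int).astype(str))

# Unique + sort on strings compares character by character, so '10' sorts before '7'.
str_sorted = ','.join(np.unique(pages_as_str))

print(int_sorted)  # 7,9,10
print(str_sorted)  # 10,7,9
```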