
How to get pandas Series value counts sorted by count, with ties in original index order

Here is an example:

import pandas as pd

a = ['Ibrutinib', 'Ibrutinib', 'Ibrutinib',
       'Ibrutinib-containing product', 'Ibrutinib 140 MG',
       'Ibrutinib Oral Product',
       'Ibrutinib-containing product in oral dose form', 'Ibrutinib Pill',
       'Ibrutinib Oral Capsule', 'Ibrutinib 140 MG Oral Capsule',
       'Ibrutinib 140 MG [Imbruvica]',
       'Ibrutinib Oral Capsule [Imbruvica]',
       'Ibrutinib 140 MG Oral Capsule [Imbruvica]']

pd.Series(a).value_counts()

Output
Ibrutinib                                         3
Ibrutinib-containing product in oral dose form    1
Ibrutinib Pill                                    1
Ibrutinib Oral Product                            1
Ibrutinib 140 MG Oral Capsule [Imbruvica]         1
Ibrutinib 140 MG Oral Capsule                     1
Ibrutinib Oral Capsule                            1
Ibrutinib-containing product                      1
Ibrutinib 140 MG [Imbruvica]                      1
Ibrutinib 140 MG                                  1
Ibrutinib Oral Capsule [Imbruvica]                1
dtype: int64

I want to see "Ibrutinib 140 MG" in the 3rd position, because it appears earlier in the original series than the other values with the same count.

To sort by the order of the original list, convert the result to a DataFrame, then create a rank column to sort by.

import pandas as pd

a = ['Ibrutinib', 'Ibrutinib', 'Ibrutinib',
       'Ibrutinib-containing product', 'Ibrutinib 140 MG',
       'Ibrutinib Oral Product',
       'Ibrutinib-containing product in oral dose form', 'Ibrutinib Pill',
       'Ibrutinib Oral Capsule', 'Ibrutinib 140 MG Oral Capsule',
       'Ibrutinib 140 MG [Imbruvica]',
       'Ibrutinib Oral Capsule [Imbruvica]',
       'Ibrutinib 140 MG Oral Capsule [Imbruvica]']


s = pd.Series(a).value_counts()
df = s.rename_axis('value').reset_index(name='count')   # convert to dataframe
df["rank"] = df['value'].apply(lambda x : a.index(x))   # create rank column, ranked by list index 
dfsrt = df.sort_values(by='rank')                       # sort by rank
print(dfsrt[['value','count']].to_string(index=False, justify='left',  # display value and count
     formatters={'value':'{{:<{}s}}'.format(dfsrt['value'].str.len().max()).format}))

Output

 value                                           count
 Ibrutinib                                       3
 Ibrutinib-containing product                    1
 Ibrutinib 140 MG                                1
 Ibrutinib Oral Product                          1
 Ibrutinib-containing product in oral dose form  1
 Ibrutinib Pill                                  1
 Ibrutinib Oral Capsule                          1
 Ibrutinib 140 MG Oral Capsule                   1
 Ibrutinib 140 MG [Imbruvica]                    1
 Ibrutinib Oral Capsule [Imbruvica]              1
 Ibrutinib 140 MG Oral Capsule [Imbruvica]       1
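
Note that the snippet above sorts only by first appearance, which matches the desired output here because the most frequent value also happens to come first in the list. For the more general case (counts descending, ties in original order), a minimal sketch could sort on both keys. The helper name `ordered_value_counts` is hypothetical and not part of pandas; the dictionary lookup is an assumption made here to avoid calling `a.index(x)` once per row.

```python
import pandas as pd

def ordered_value_counts(values):
    """Count values; order by count (descending), ties by first appearance."""
    # Record the position of the first occurrence of each distinct value.
    first_pos = {}
    for i, v in enumerate(values):
        first_pos.setdefault(v, i)

    counts = pd.Series(values).value_counts()
    df = counts.rename_axis('value').reset_index(name='count')
    df['rank'] = df['value'].map(first_pos)

    # Primary key: count (descending); secondary key: first position (ascending).
    df = df.sort_values(by=['count', 'rank'], ascending=[False, True])
    return df.set_index('value')['count']

# Small usage example with made-up data.
print(ordered_value_counts(['b', 'a', 'c', 'a', 'd']))
```

Because every distinct value has a unique first position, the two-key sort leaves no ties, so the result does not depend on which sorting algorithm pandas uses.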

Try:

df = pd.DataFrame(a)
# sort=False keeps groups in order of first appearance; mergesort is a stable sort
df = df.groupby(0, sort=False).size()\
    .sort_values(ascending=False, kind='mergesort')

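By default, value_counts sorts with quicksort, which is not guaranteed to be stable, so values with equal counts may not keep their original order.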
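
To illustrate why a stable sort matters here, the sketch below counts with `groupby(..., sort=False)`, which keeps groups in order of first appearance, and then applies a stable sort so that equal counts keep that order. The function name `stable_value_counts` is hypothetical. Sorting the negated counts in ascending order is an assumption made here so the example does not rely on how a given pandas version treats ties when `ascending=False`.

```python
import pandas as pd

def stable_value_counts(values):
    """Counts in descending order; ties stay in first-appearance order."""
    s = pd.Series(values)
    # sort=False keeps the groups in the order they first appear.
    counts = s.groupby(s, sort=False).size()
    # A stable ascending sort of the negated counts is a descending sort
    # by count that leaves equal counts in their existing order.
    order = (-counts).sort_values(kind='mergesort').index
    return counts.loc[order]

# Small usage example with made-up data.
print(stable_value_counts(['x', 'y', 'y', 'z', 'x', 'w']))
```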

