
Count occurrences of elements in a Python list inside pandas DataFrame rows

I am trying to count the occurrences of each string in the list stored in each row.

+----+---------------------------+
| Id |           Col1            |
+----+---------------------------+
| N1 | ['a', 'b', 'c', 'a']      |
| N2 | ['b', 'b', 'b']           |
| N3 | []                        |
| N4 | ['a', 'b', 'c', 'a', 'c'] | 
| N5 | []                        |
+----+---------------------------+

As a result, I would like to get something like this:

+----+---------------------------+-----------------------+
| Id |           Col1            |         Col2          |
+----+---------------------------+-----------------------+
| N1 | ['a', 'b', 'c', 'a']      | {'a':2, 'b':1, 'c':1} |
| N2 | ['b', 'b', 'b']           | {'b':3}               |
| N3 | []                        | {} or None            |
| N4 | ['a', 'b', 'c', 'a', 'c'] | {'a':2, 'b':1, 'c':2} |
| N5 | []                        | {} or None            |
+----+---------------------------+-----------------------+

I have already tried using Counter from the collections library on the DataFrame in several different ways, but nothing seemed to work.

import pandas as pd

d = {'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
     'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [], ['a', 'b', 'c', 'a', 'c'], []]}
df = pd.DataFrame(data=d)

Check the code below, which uses Counter:

import pandas as pd 

from collections import Counter

df['Col2'] = df.apply(lambda x: Counter(x['Col1']), axis=1)

df
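
The question accepts either `{}` or `None` for empty lists. A minimal sketch of the `None` variant, assuming a plain `dict` is preferred over a `Counter` object in each cell; the helper name `count_items` is hypothetical and not part of the original answer:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [],
             ['a', 'b', 'c', 'a', 'c'], []],
})


def count_items(items):
    # Hypothetical helper: plain dict of counts, or None for an empty list.
    return dict(Counter(items)) if items else None


df['Col2'] = df['Col1'].apply(count_items)
```

Dropping the `if items else None` part would give `{}` for the empty rows instead.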


How about a one-liner:

df['Col2'] = df['Col1'].apply(lambda x: pd.Series(x).value_counts().to_dict())

Output:

   Id             Col1                      Col2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'c': 2, 'b': 1}
4  N5               []                        {}
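
As a sanity check, the `value_counts` one-liner and `collections.Counter` should give equal dictionaries for every row. A small sketch on the sample data; passing `dtype=object` to `pd.Series` is an addition made here, intended to keep the empty-list rows from triggering a dtype warning in some pandas versions:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [],
             ['a', 'b', 'c', 'a', 'c'], []],
})

# Per-row counts via a temporary Series and value_counts.
via_value_counts = df['Col1'].apply(
    lambda x: pd.Series(x, dtype=object).value_counts().to_dict()
)
# Per-row counts via Counter, converted to a plain dict.
via_counter = df['Col1'].map(lambda x: dict(Counter(x)))

# Dict equality ignores key order, so only keys and counts are compared.
same = [a == b for a, b in zip(via_value_counts, via_counter)]
```

Building a `Series` for every row adds overhead, so `Counter` is likely the lighter choice on large frames.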

Quite simple:

from collections import Counter

df['col_2'] = df.Col1.map(Counter)

>>> df
'''
   Id             Col1                     col_2
0  N1     [a, b, c, a]  {'a': 2, 'b': 1, 'c': 1}
1  N2        [b, b, b]                  {'b': 3}
2  N3               []                        {}
3  N4  [a, b, c, a, c]  {'a': 2, 'b': 1, 'c': 2}
4  N5               []                        {}
'''

from numpy import nan
import pandas as pd

Solution:

col = []
for i, row in df.iterrows():
    col.append(
        {elem : row['Col1'].count(elem) for elem in set(row['Col1'])} # set removes duplicates
    )

df = df.join(pd.Series(col, name='Col2'))
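
Note that `df.join` aligns on the index, so the loop above relies on `df` having the default 0..n-1 index. A sketch of a variant that assigns by position instead, shown with `Id` as the index (an assumption made here purely to illustrate the point). It also counts each list in a single pass with `Counter` rather than calling `list.count` once per distinct element:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Id': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Col1': [['a', 'b', 'c', 'a'], ['b', 'b', 'b'], [],
             ['a', 'b', 'c', 'a', 'c'], []],
}).set_index('Id')  # non-default index, for illustration only

# A list assigned to a column is matched by position, not by index label.
df['Col2'] = [dict(Counter(items)) for items in df['Col1']]
```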

Alternative solution:

def add_value_counts(col):
    col['Col2'] = pd.Series(col['Col1']).value_counts().to_dict()
    return col

df = df.T.apply(add_value_counts, axis=0).T # transposes df and iterates over rows

