Count how many times an object occurs in the lists of a DataFrame column

Question

Say I have a DataFrame df2 with a column called 'elements', where each row contains a list of objects as shown below:

print(df2['elements'])

0       [Element B, Element Cr, Element Re]
1       [Element B, Element Rh, Element Sc]
2        [Element B, Element Mo, Element Y]
3       [Element Al, Element B, Element Lu]
4       [Element B, Element Dy, Element Os]

Name: elements, Length: 1763, dtype: object

I would like to count how many times each string occurs within the whole column. In the example above, the count for the string 'Element B' is 5 and the count for 'Element Mo' is 1.

I have tried setting up the dictionary below, but this just counts each list instead of the strings that are in them.

elements_count_dict = {}
for entry in df2['elements']:
    for object in entry:
        if object in elements_count_dict:
            elements_count_dict[object] += 1
        else:
            elements_count_dict[object] = 1  # the first occurrence counts as 1

However, done this way, the dictionary tracks each individual character instead of the strings, i.e. [ = 5 and ] = 5, and even after converting the column to a string using df2['elements'].to_string() it still doesn't work.

Answer 1

First use np.ravel to flatten the nested list returned by df.elements.to_list(), then use collections.Counter instead of a loop:

import numpy as np
from collections import Counter

ravel = np.ravel(df.elements.to_list())
Counter(ravel)
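
Note that np.ravel only flattens cleanly when every row's list has the same length (three items here). If the lists might differ in length, a minimal sketch using itertools.chain.from_iterable from the standard library avoids that assumption. The sample DataFrame below is made up for illustration, and it assumes the column holds real Python lists rather than strings:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample data: rows hold real lists of unequal length.
df = pd.DataFrame({
    'elements': [
        ['Element B', 'Element Cr', 'Element Re'],
        ['Element B', 'Element Mo'],
        ['Element Al', 'Element B', 'Element Lu', 'Element Y'],
    ]
})

# chain.from_iterable walks each inner list in turn, so ragged rows are fine.
element_counts = Counter(chain.from_iterable(df['elements']))

print(element_counts['Element B'])   # 3
print(element_counts['Element Mo'])  # 1
```

Counter is a dict subclass, so element_counts can be used anywhere the hand-written dictionary from the question would have been used.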

Answer 2

Try as follows:

First, use Series.str.replace to replace [ and ] in your strings with '' (regex: r'\[|\]').
Second, use Series.str.split to split each string on a comma followed by whitespace (regex: r',\s').
Third, use Series.explode to put each item on its own row.
Finally, apply Series.value_counts to return a pd.Series with a count for each item in your lists (sorted by count, highest first).

import pandas as pd

data = {'elements': {0: '[Element B, Element Cr, Element Re]',
  1: '[Element B, Element Rh, Element Sc]',
  2: '[Element B, Element Mo, Element Y]',
  3: '[Element Al, Element B, Element Lu]',
  4: '[Element B, Element Dy, Element Os]'}}

df = pd.DataFrame(data)

counts = df.elements.str.replace(r'\[|\]', '', regex=True)\
    .str.split(r',\s').explode().value_counts()

print(counts)

Element B     5
Element Cr    1
Element Re    1
Element Rh    1
Element Sc    1
Element Mo    1
Element Y     1
Element Al    1
Element Lu    1
Element Dy    1
Element Os    1
Name: elements, dtype: int64

# to turn that into a dict, simply use:
# d = counts.to_dict()
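
The steps above are needed because each cell in this data is a string that merely looks like a list. If the column instead holds real Python lists, as the question describes, the replace and split steps can probably be skipped. A minimal sketch under that assumption, with made-up sample data:

```python
import pandas as pd

# Hypothetical sample data where each cell is a real Python list.
df = pd.DataFrame({
    'elements': [
        ['Element B', 'Element Cr', 'Element Re'],
        ['Element B', 'Element Rh', 'Element Sc'],
        ['Element B', 'Element Mo', 'Element Y'],
    ]
})

# explode() puts each list item on its own row; value_counts() tallies them.
counts = df['elements'].explode().value_counts()
counts_dict = counts.to_dict()

print(counts_dict['Element B'])  # 3
```

Checking type(df['elements'].iloc[0]) first shows which case applies: str means the string-cleaning steps are needed, list means explode() can be applied directly.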

Answer 3

Here is one way to do it. It may not be very elegant, but it works:

df['col1'].str.replace(r'\[|\]','', regex=True).str.split(',').explode().str.strip().to_frame().groupby('col1').value_counts()

col1
Element Al    1
Element B     5
Element Cr    1
Element Dy    1
Element Lu    1
Element Mo    1
Element Os    1
Element Re    1
Element Rh    1
Element Sc    1
Element Y     1
dtype: int64

Data used:

import pandas as pd

data = {'col1': {0: '[Element B, Element Cr, Element Re]',
  1: '[Element B, Element Rh, Element Sc]',
  2: '[Element B, Element Mo, Element Y]',
  3: '[Element Al, Element B, Element Lu]',
  4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)
df
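
The to_frame().groupby('col1').value_counts() tail of the one-liner appears to be interchangeable with a plain value_counts() followed by sort_index(), which yields the same alphabetical ordering as the output shown earlier. A sketch of that variant on the same data, offered as an alternative rather than as the answer's own method:

```python
import pandas as pd

data = {'col1': {0: '[Element B, Element Cr, Element Re]',
                 1: '[Element B, Element Rh, Element Sc]',
                 2: '[Element B, Element Mo, Element Y]',
                 3: '[Element Al, Element B, Element Lu]',
                 4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)

counts = (
    df['col1']
    .str.replace(r'\[|\]', '', regex=True)  # drop the square brackets
    .str.split(',')                         # split on literal commas
    .explode()                              # one item per row
    .str.strip()                            # trim the leftover spaces
    .value_counts()                         # tally each item
    .sort_index()                           # alphabetical order by item
)

print(counts)
```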

Answer 1
1 2022-08-31 21:46:42

Answer 2
1 (accepted) 2022-08-31 21:59:43

Answer 3
0 2022-08-31 21:52:30
