
Count how many times an object occurs in a list of a list within a DataFrame column

Say I have a DataFrame df2 with a column called 'elements', where each row holds a list of objects, as shown below:

print(df2['elements'])

0       [Element B, Element Cr, Element Re]
1       [Element B, Element Rh, Element Sc]
2        [Element B, Element Mo, Element Y]
3       [Element Al, Element B, Element Lu]
4       [Element B, Element Dy, Element Os]

Name: elements, Length: 1763, dtype: object

I would like to count how many times each string occurs within the whole column. In the example above, the count for the string 'Element B' is 5 and for 'Element Mo' it is 1.

I have tried building a dictionary as shown below, but this just counts each list instead of the strings inside them.

elements_count_dict = {}
for entry in df2['elements']:
    for item in entry:
        if item in elements_count_dict:
            elements_count_dict[item] += 1
        else:
            # the first occurrence counts as 1, not 0
            elements_count_dict[item] = 1

However, done this way the dictionary tracks each individual character instead of the strings, i.e. [ = 5 and ] = 5, and even after converting the column to a string using df2['elements'].to_string() it still doesn't work.
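
One quick way to see why the loop ends up counting characters is to check what type the cells actually hold. The sketch below rebuilds two rows as plain strings (an assumption about how the column is stored, based on the [ = 5 symptom) and shows that iterating over a string yields single characters.

```python
import pandas as pd
from collections import Counter

# Assumption: the cells are strings that merely look like lists.
df2 = pd.DataFrame({'elements': ['[Element B, Element Cr, Element Re]',
                                 '[Element B, Element Rh, Element Sc]']})

first_cell = df2['elements'].iloc[0]
print(type(first_cell))  # <class 'str'>, not list

# Iterating over a string walks it character by character, so the
# counter is keyed by '[', 'E', 'l', ... instead of element names.
char_counts = Counter(ch for cell in df2['elements'] for ch in cell)
print(char_counts['['], char_counts[']'])
```

If type(first_cell) reports list instead, the original loop would already see whole strings and the problem lies elsewhere.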

First use np.ravel to flatten the nested list returned by df.elements.to_list(), then use collections.Counter instead of a loop:

import numpy as np
from collections import Counter

ravel = np.ravel(df.elements.to_list())
Counter(ravel)
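
Note that np.ravel only flattens cleanly when every cell is a real Python list and all rows have the same length. As a hedged alternative, assuming the column holds genuine lists (possibly of different lengths), itertools.chain.from_iterable can do the flattening. The sample frame below is made up for illustration.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical data: real lists, with rows of unequal length.
df = pd.DataFrame({'elements': [['Element B', 'Element Cr', 'Element Re'],
                                ['Element B', 'Element Rh'],
                                ['Element B', 'Element Mo', 'Element Y']]})

# chain.from_iterable concatenates the row lists lazily,
# so ragged rows are not a problem.
counts = Counter(chain.from_iterable(df['elements']))
print(counts['Element B'])   # 3
print(counts['Element Mo'])  # 1
```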

Try as follows:

  1. First, use Series.str.replace to replace [ and ] in your strings with '' (regex: r'\[|\]').
  2. Second, use Series.str.split to split the string on a comma followed by whitespace (i.e. r',\s').
  3. Third, use Series.explode to put each item on its own row.
  4. Finally, apply Series.value_counts to return a pd.Series with a count for each item in your lists (sorted by count, highest first).

import pandas as pd

data = {'elements': {0: '[Element B, Element Cr, Element Re]',
  1: '[Element B, Element Rh, Element Sc]',
  2: '[Element B, Element Mo, Element Y]',
  3: '[Element Al, Element B, Element Lu]',
  4: '[Element B, Element Dy, Element Os]'}}

df = pd.DataFrame(data)

# raw strings keep the backslashes in both regex patterns intact
counts = df.elements.str.replace(r'\[|\]', '', regex=True)\
    .str.split(r',\s').explode().value_counts()

print(counts)

Element B     5
Element Cr    1
Element Re    1
Element Rh    1
Element Sc    1
Element Mo    1
Element Y     1
Element Al    1
Element Lu    1
Element Dy    1
Element Os    1
Name: elements, dtype: int64

# to turn that into a dict, simply use:
# d = counts.to_dict()
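
To make the four steps easier to follow, here is the same chain split into intermediate variables on a two-row sample. The variable names are illustrative only and do not come from the answer.

```python
import pandas as pd

s = pd.Series(['[Element B, Element Cr]', '[Element B, Element Mo]'],
              name='elements')

# Step 1: drop the square brackets.
no_brackets = s.str.replace(r'\[|\]', '', regex=True)

# Step 2: split each string on a comma followed by whitespace,
# giving one Python list per row.
as_lists = no_brackets.str.split(r',\s')

# Step 3: one item per row (the original index label is repeated).
exploded = as_lists.explode()

# Step 4: count each distinct item and convert to a plain dict.
counts = exploded.value_counts().to_dict()
print(counts)
```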

Here is one way to do it. It may not be very elegant, but it works:

df['col1'].str.replace(r'\[|\]','', regex=True).str.split(',').explode().str.strip().to_frame().groupby('col1').value_counts()
col1
Element Al    1
Element B     5
Element Cr    1
Element Dy    1
Element Lu    1
Element Mo    1
Element Os    1
Element Re    1
Element Rh    1
Element Sc    1
Element Y     1
dtype: int64

Data used:

import pandas as pd

data = {'col1': {0: '[Element B, Element Cr, Element Re]',
  1: '[Element B, Element Rh, Element Sc]',
  2: '[Element B, Element Mo, Element Y]',
  3: '[Element Al, Element B, Element Lu]',
  4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)
df
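
If the extra groupby feels heavy, a shorter chain should give the same counts: strip the brackets with Series.str.strip, split on the comma, trim whitespace, and sort the result of value_counts by label. This is a sketch against the sample data above, not a change to the answer's method.

```python
import pandas as pd

data = {'col1': {0: '[Element B, Element Cr, Element Re]',
                 1: '[Element B, Element Rh, Element Sc]',
                 2: '[Element B, Element Mo, Element Y]',
                 3: '[Element Al, Element B, Element Lu]',
                 4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)

counts = (
    df['col1']
    .str.strip('[]')   # remove the leading and trailing brackets
    .str.split(',')    # split on the literal comma
    .explode()         # one item per row
    .str.strip()       # trim the space left after each comma
    .value_counts()
    .sort_index()      # alphabetical order by element name
)
print(counts)
```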


