Count how many times an object occurs in a list of lists within a DataFrame column
Say I have a DataFrame df2 with a column called 'elements', where each row contains a list of objects, as shown below:
print(df2['elements'])
0 [Element B, Element Cr, Element Re]
1 [Element B, Element Rh, Element Sc]
2 [Element B, Element Mo, Element Y]
3 [Element Al, Element B, Element Lu]
4 [Element B, Element Dy, Element Os]
Name: elements, Length: 1763, dtype: object
I would like to count how many times each string occurs within the whole column. In the example above, the count for the string 'Element B' is 5 and the count for 'Element Mo' is 1.
I have tried setting up a dictionary as shown below, but this just counts each list instead of the strings that are in them.
elements_count_dict = {}
for entry in df2['elements']:
    for item in entry:
        if item in elements_count_dict:
            elements_count_dict[item] += 1
        else:
            # the first occurrence counts as 1, not 0
            elements_count_dict[item] = 1
However, done this way, the dictionary tracks each individual character instead of the strings, i.e. [ = 5 and ] = 5. Even after converting the column to a string using df2['elements'].to_string(), it still doesn't work.
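Counting single characters such as [ and ] suggests that each cell holds a string rather than a real Python list. A minimal sketch for checking this, and for parsing such strings back into lists, might look as follows. The sample frame and the helper name parse_listlike are made up for illustration, and the parsing assumes the bracketed, comma-separated layout shown above.

```python
import pandas as pd

# Hypothetical sample: one cell is a string that only looks like a list,
# the other is a real Python list.
sample = pd.DataFrame({
    "elements": [
        "[Element B, Element Cr, Element Re]",
        ["Element B", "Element Rh", "Element Sc"],
    ]
})

# Show which Python types the column actually contains.
type_names = sample["elements"].map(lambda cell: type(cell).__name__)
print(type_names.tolist())


def parse_listlike(cell):
    """Return a list of items; parse '[a, b, c]' strings, pass lists through."""
    if isinstance(cell, str):
        inner = cell.strip().strip("[]")
        return [part.strip() for part in inner.split(",") if part.strip()]
    return list(cell)


parsed = sample["elements"].map(parse_listlike)
print(parsed.tolist())
```

Once every cell is a real list, a nested loop or any of the counting approaches in the answers iterates over whole strings instead of characters.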
First use np.ravel to flatten the nested list coming from df.elements.to_list(), and then use collections.Counter instead of a loop:
import numpy as np
from collections import Counter
ravel = np.ravel(df.elements.to_list())
Counter(ravel)
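Note that np.ravel only flattens cleanly when every row's list has the same length. As a hedged alternative that also handles rows of different lengths, the sketch below flattens with itertools.chain.from_iterable before counting. The sample data and the name rows are made up for illustration.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample with rows of different lengths.
rows = pd.DataFrame({
    "elements": [
        ["Element B", "Element Cr", "Element Re"],
        ["Element B", "Element Mo"],
        ["Element Al", "Element B", "Element Lu", "Element Y"],
    ]
})

# chain.from_iterable walks each inner list in turn, so no rectangular
# array is required; Counter then tallies every string it yields.
element_counts = Counter(chain.from_iterable(rows["elements"]))

print(element_counts["Element B"])   # 3
print(element_counts["Element Mo"])  # 1
```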
Try as follows:
First, use Series.str.replace to replace [ and ] in your strings with '' (regex: r'\[|\]').
Second, use Series.str.split to split each string on a comma followed by whitespace (i.e. the regex ,\s).
Third, use Series.explode to put each item on its own row.
Finally, apply Series.value_counts to return a pd.Series with a count for each item in your lists (sorted by count, highest first).
import pandas as pd
data = {'elements': {0: '[Element B, Element Cr, Element Re]',
1: '[Element B, Element Rh, Element Sc]',
2: '[Element B, Element Mo, Element Y]',
3: '[Element Al, Element B, Element Lu]',
4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)
counts = df.elements.str.replace(r'\[|\]', '', regex=True)\
    .str.split(r',\s').explode().value_counts()
print(counts)
Element B 5
Element Cr 1
Element Re 1
Element Rh 1
Element Sc 1
Element Mo 1
Element Y 1
Element Al 1
Element Lu 1
Element Dy 1
Element Os 1
Name: elements, dtype: int64
# to turn that into a dict, simply use:
# d = counts.to_dict()
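If the column already holds real Python lists rather than strings, the replace and split steps are unnecessary. A minimal sketch under that assumption follows; the frame name lists_df and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose cells are real lists, not strings.
lists_df = pd.DataFrame({
    "elements": [
        ["Element B", "Element Cr", "Element Re"],
        ["Element B", "Element Rh", "Element Sc"],
        ["Element Al", "Element B", "Element Lu"],
    ]
})

# explode() puts each list item on its own row; value_counts() tallies them.
list_counts = lists_df["elements"].explode().value_counts()

print(list_counts["Element B"])

# Convert the resulting Series to a plain dict if that is more convenient.
counts_dict = list_counts.to_dict()
```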
Here is one way to do it. It may not be very elegant, but it works:
df['col1'].str.replace(r'\[|\]','', regex=True).str.split(',').explode().str.strip().to_frame().groupby('col1').value_counts()
col1
Element Al 1
Element B 5
Element Cr 1
Element Dy 1
Element Lu 1
Element Mo 1
Element Os 1
Element Re 1
Element Rh 1
Element Sc 1
Element Y 1
dtype: int64
Data used:
data={'col1': {0: '[Element B, Element Cr, Element Re]',
1: '[Element B, Element Rh, Element Sc]',
2: '[Element B, Element Mo, Element Y]',
3: '[Element Al, Element B, Element Lu]',
4: '[Element B, Element Dy, Element Os]'}}
df=pd.DataFrame(data)
df