Value counts of items inside a Pandas DataFrame column that contains lists of strings as values

I want to count the occurrences of the items inside the lists stored in a column of my dataset. The column is called tags, and its data is in the following format:

tags
-----------
['symfony' 'assestic']
['java' 'containers' 'kubernetes']
['python' 'pelican']
['python' 'api' 'oath' 'python-requests']
['google-api' 'google-cloud-storage']

The lists seem to be stored as strings, too. I am not able to convert each string into a list without all the items inside it being concatenated.

# Check the type of the tags in the first 5 rows
for i, tags in enumerate(df.tags.head(5)):
    print('list', i, 'is class', type(tags))

Output:

list 0 is class <class 'str'>
list 1 is class <class 'str'>
list 2 is class <class 'str'>
list 3 is class <class 'str'>
list 4 is class <class 'str'>

I tried two methods. Method 1:

def clean_tags_list(list_):
    list_ = list_.replace("\"['" , '[')
    list_ = list_.replace("']\"", ']')
    list_ = list_.replace("'","")
    return list_
df['tags'] = df['tags'].apply(clean_tags_list)

Output:

   tags                              
   ----------------------------------
   [symfony assestic]                 
   [java containers kubernetes]      
   [python pelican]                  
   [python api oath python-requests]  
   [google-api google-cloud-storage]  

But value_counts() does not work on the resulting Series, because each row is still a single string. It gives the following output:

[symfony assestic]                 1                
[java containers kubernetes]       1      
[python pelican]                   1                 
[python api oath python-requests]  1   
[google-api google-cloud-storage]  1

Method 2: I tried using replace, strip, and ast.literal_eval().

Question: how can I get output in the following format?

python 2
symfony 1
assestic 1

You can flatten the column so that each list element is in a separate row, then just use .value_counts(). However, since the data is actually strings that look like lists, you'll have to convert them to actual lists first.

Here's an example:

import ast

import pandas as pd

df = pd.DataFrame({
    "tags": [
        "['symfony', 'assestic']",
        "['java', 'containers', 'kubernetes']",
        "['python', 'pelican']",
        "['python', 'api', 'oath', 'python-requests']",
        "['google-api', 'google-cloud-storage']",
    ]
})

(
    df["tags"]
    .apply(ast.literal_eval)        # convert strings to lists
    .apply(lambda x: pd.Series(x))  # convert lists to series
    .stack()                        # flatten the multiple series into a single series
    .value_counts()                 # get value counts
)

With result:

python                  2
java                    1
oath                    1
google-cloud-storage    1
api                     1
assestic                1
kubernetes              1
pelican                 1
symfony                 1
python-requests         1
google-api              1
containers              1
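
As a possible alternative for the flattening step, recent pandas versions (0.25 and later) provide Series.explode(), which puts each list element on its own row directly. The following is only a sketch on a shortened copy of the sample data, not part of the original answer:

```python
import ast

import pandas as pd

# Shortened sample data: strings that look like lists (with commas)
df = pd.DataFrame({
    "tags": [
        "['symfony', 'assestic']",
        "['python', 'pelican']",
        "['python', 'api']",
    ]
})

counts = (
    df["tags"]
    .apply(ast.literal_eval)  # convert strings to real lists
    .explode()                # one row per list element
    .value_counts()           # count each tag
)
print(counts)
```

This avoids building an intermediate DataFrame with one column per list position, which may matter on larger datasets.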

Note that if the data you're working with is composed of lists rather than strings that look like lists, the approach is the same, just without the .apply(ast.literal_eval) line.
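
One caveat: the sample in the question shows the tags without commas between them (for example ['symfony' 'assestic']). If the stored strings really look like that, Python treats adjacent string literals as a single string, so ast.literal_eval returns a one-element list with the tags joined together, which would explain the concatenation described in the question. Below is a minimal sketch for that case, assuming every tag is wrapped in single quotes and contains no single quote itself; parse_tags is a hypothetical helper name, not something from pandas.

```python
import ast
import re

import pandas as pd

# Sample data in the comma-less format shown in the question
df = pd.DataFrame({
    "tags": [
        "['symfony' 'assestic']",
        "['python' 'pelican']",
        "['python' 'api' 'oath' 'python-requests']",
    ]
})

# Adjacent string literals are concatenated, so literal_eval merges the tags
print(ast.literal_eval("['symfony' 'assestic']"))  # ['symfonyassestic']

def parse_tags(raw):
    """Hypothetical helper: return every single-quoted token in the string."""
    return re.findall(r"'([^']*)'", raw)

counts = (
    df["tags"]
    .apply(parse_tags)  # convert each string to a list of tags
    .explode()          # one row per tag
    .value_counts()     # count each tag
)
print(counts)
```

If the tags could contain quotes or the column mixes formats, this regular expression would need adjusting.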
