I want to count the occurrences of the items inside the lists stored in one column of my dataset. The tags column contains data in the following format:
tags
-----------
['symfony' 'assestic']
['java' 'containers' 'kubernetes']
['python' 'pelican']
['python' 'api' 'oath' 'python-requests']
['google-api' 'google-cloud-storage']
The lists appear to be stored as strings. I haven't been able to convert each string into a list without all of its items being concatenated together.
# Check the type of the first 5 values in the tags column
for i, l in enumerate(df.tags):
    print('list', i, 'is class', type(l))
    if i == 4:
        break
Output will be
list 0 is class <class 'str'>
list 1 is class <class 'str'>
list 2 is class <class 'str'>
list 3 is class <class 'str'>
list 4 is class <class 'str'>
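One way to check the whole column at once, rather than only the first five rows, is to map `type` over it and count the results. This is a minimal sketch; the two-row DataFrame is a hypothetical stand-in for the question's df:

```python
import pandas as pd

# Hypothetical stand-in for the question's df: tags stored as strings
df = pd.DataFrame({"tags": ["['python' 'pelican']", "['java' 'containers' 'kubernetes']"]})

# Count how many values of each Python type the column holds
type_counts = df["tags"].map(type).value_counts()
print(type_counts)
```

If every value is a string, the result has a single entry for `str`.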
I tried two methods. Method 1:
def clean_tags_list(list_):
    # Strip the quote characters around the list and around each item
    list_ = list_.replace("\"['", '[')
    list_ = list_.replace("']\"", ']')
    list_ = list_.replace("'", "")
    return list_

df['tags'] = df['tags'].apply(clean_tags_list)
Output will be
tags
----------------------------------
[symfony assestic]
[java containers kubernetes]
[python pelican]
[python api oath python-requests]
[google-api google-cloud-storage]
But value_counts() doesn't work with the above Series: it treats each whole string as a single value and gives the following output:
[symfony assestic] 1
[java containers kubernetes] 1
[python pelican] 1
[python api oath python-requests] 1
[google-api google-cloud-storage] 1
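Method 1 is close. Its output strings only need the brackets stripped and a split on whitespace to become real lists, which can then be flattened and counted. Below is a minimal sketch. It assumes pandas 0.25 or later for `Series.explode()` and uses the cleaned strings shown above as hypothetical stand-in data:

```python
import pandas as pd

# Stand-in for the Series produced by clean_tags_list (Method 1)
cleaned = pd.Series([
    "[symfony assestic]",
    "[java containers kubernetes]",
    "[python pelican]",
    "[python api oath python-requests]",
    "[google-api google-cloud-storage]",
])

counts = (
    cleaned.str.strip("[]")  # drop the surrounding brackets
    .str.split()             # split on whitespace -> list of tags per row
    .explode()               # one tag per row
    .value_counts()          # count each tag
)
print(counts)
```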
Method 2: I tried using replace(), strip() and ast.literal_eval().
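The concatenation seen with ast.literal_eval() is expected for this data. The strings have no commas between items, and Python joins adjacent string literals into one, so the parsed "list" ends up with a single merged element. A short demonstration:

```python
import ast

# No commas: the adjacent literals 'python' 'pelican' are joined into one string
no_commas = ast.literal_eval("['python' 'pelican']")
print(no_commas)

# With commas, the same call yields separate items
with_commas = ast.literal_eval("['python', 'pelican']")
print(with_commas)
```

The first call gives `['pythonpelican']` and the second gives `['python', 'pelican']`.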
Question: how can I get output in the following format?
python 2
symfony 1
assestic 1
You can flatten the column so that each list element is in a separate row, then just use .value_counts(). However, since the data is actually strings that look like lists, you'll have to convert them to actual lists first.
Here's an example:
import ast
import pandas as pd

df = pd.DataFrame({
    "tags": [
        "['symfony', 'assestic']",
        "['java', 'containers', 'kubernetes']",
        "['python', 'pelican']",
        "['python', 'api', 'oath', 'python-requests']",
        "['google-api', 'google-cloud-storage']",
    ]
})

# Parentheses allow a comment on each step (a comment after a trailing backslash is a SyntaxError)
counts = (
    df["tags"]
    .apply(ast.literal_eval)        # convert strings to lists
    .apply(lambda x: pd.Series(x))  # convert lists to series
    .stack()                        # flatten the multiple series into a single series
    .value_counts()                 # get value counts
)
print(counts)
With result:
python 2
java 1
oath 1
google-cloud-storage 1
api 1
assestic 1
kubernetes 1
pelican 1
symfony 1
python-requests 1
google-api 1
containers 1
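On pandas 0.25 or later, Series.explode() does the flattening in one step and can replace the apply(pd.Series)/stack() pair. Here is a sketch of the same pipeline with that swap, using the example data from above:

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "tags": [
        "['symfony', 'assestic']",
        "['java', 'containers', 'kubernetes']",
        "['python', 'pelican']",
        "['python', 'api', 'oath', 'python-requests']",
        "['google-api', 'google-cloud-storage']",
    ]
})

counts = (
    df["tags"]
    .apply(ast.literal_eval)  # convert strings to lists
    .explode()                # one list element per row (pandas >= 0.25)
    .value_counts()           # count each tag
)
print(counts)
```

The counts are the same. The order of tags with equal counts may vary between pandas versions.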
Note that if the data you're working with is composed of lists rather than strings that look like lists, the approach is the same, just without the .apply(ast.literal_eval) line.
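The example above relies on the strings containing commas. The strings in the question do not (e.g. ['python' 'pelican']), so ast.literal_eval would merge their items. The following is a stdlib-only sketch that handles both forms. It assumes no tag contains a single quote, extracts each quoted item with a regular expression, and tallies the tags with collections.Counter. The helper name `parse_tags` is hypothetical:

```python
import re
from collections import Counter

# Captures the text between each pair of single quotes;
# assumes tags never contain a quote character themselves
QUOTED = re.compile(r"'([^']*)'")

def parse_tags(s):
    """Hypothetical helper: list-like string -> list of tags."""
    return QUOTED.findall(s)

# Strings in the question's format (no commas), plus one with commas
raw = [
    "['symfony' 'assestic']",
    "['java' 'containers' 'kubernetes']",
    "['python' 'pelican']",
    "['python' 'api' 'oath' 'python-requests']",
    "['google-api', 'google-cloud-storage']",
]

counts = Counter(tag for s in raw for tag in parse_tags(s))
for tag, n in counts.most_common(3):
    print(tag, n)
```

This prints python 2, symfony 1 and assestic 1, matching the requested format. Tags with equal counts keep the order in which they were first seen. With pandas, df["tags"].apply(parse_tags) could stand in for the .apply(ast.literal_eval) step.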