Title_1                 Title_2                                Type
He heard it from space  A quick story about sounds from space  Fiction
The end of all time     A sad poem about the end of time       Non-Fiction
The perfect beginning   A story about friendship               Non-Fiction
I am trying to count how many rows there are of each Type (Fiction, Non-Fiction) and to total the number of words in Title_1 and Title_2 for each Type.
My desired output would be:
Type         Count  Num-Words
Non-Fiction  2      20
Fiction      1      12
This is what I have so far:
fopen = open(file_name, 'r')
fhand = csv.reader(fopen)
next(fhand)
category_sum = dict()
for row in fhand:
    col_0 = len(row[0].split())
    col_1 = len(row[1].split())
    print(col_0 + col_1)
    if row[2] in category_sum.keys():
        category_sum[row[2]] += 1
    else:
        category_sum[row[2]] = 1
I can get the total for the types in a nice dictionary, but I can't seem to figure out how to assign the word count to the appropriate type as a value in the dictionary.
Any ideas?
You can store a dictionary as the value, where one of the keys is Count and the other is Num-Words. Your dictionary value assignment may then look like:
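# num_of_words =
if row[2] in category_sum.keys():
    category_sum[row[2]]['Count'] += 1
    category_sum[row[2]]['Num-Words'] += num_of_words
else:
    # First time this Type is seen: create the inner dict with its starting values.
    category_sum[row[2]] = {'Count': 1, 'Num-Words': num_of_words}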
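The nested-dictionary idea above can be put together into a small runnable sketch. It assumes the file is comma-separated with the three columns from the question. The sample rows are inlined through io.StringIO so the snippet runs without a file, and the function name summarize is made up for illustration.

```python
import csv
import io

# Sample rows copied from the question; the comma delimiter is an assumption.
SAMPLE = """Title_1,Title_2,Type
He heard it from space,A quick story about sounds from space,Fiction
The end of all time,A sad poem about the end of time,Non-Fiction
The perfect beginning,A story about friendship,Non-Fiction
"""


def summarize(lines):
    """Return {type: {'Count': n, 'Num-Words': m}} for an iterable of CSV lines."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    category_sum = {}
    for title_1, title_2, book_type in reader:
        num_of_words = len(title_1.split()) + len(title_2.split())
        # setdefault creates the inner dict the first time a type is seen
        entry = category_sum.setdefault(book_type, {'Count': 0, 'Num-Words': 0})
        entry['Count'] += 1
        entry['Num-Words'] += num_of_words
    return category_sum


result = summarize(io.StringIO(SAMPLE))
print(result)
```

To read from a real file instead, an open file object can be passed in place of the StringIO, for example summarize(open(file_name, newline='')).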
You could do it like this:
import csv

file_name = 'book_titles.csv'
with open(file_name, 'r', newline='') as fopen:
    reader = csv.reader(fopen)
    next(reader)  # Skip header.
    category_sum = {}
    for row in reader:
        category_sum[row[2]] = category_sum.get(row[2], 0) + 1

print(category_sum)  # -> {'Fiction': 1, 'Non-Fiction': 2}
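The snippet above only counts the types. One possible way to extend the same idea to the word totals is to keep two collections.Counter objects, since a Counter treats a missing key as 0. This is only a sketch: the inlined sample data and its comma delimiter are assumptions, and the names counts and words are illustrative.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the question; the comma delimiter is an assumption.
SAMPLE = """Title_1,Title_2,Type
He heard it from space,A quick story about sounds from space,Fiction
The end of all time,A sad poem about the end of time,Non-Fiction
The perfect beginning,A story about friendship,Non-Fiction
"""

counts = Counter()  # number of rows per type
words = Counter()   # total words in Title_1 + Title_2 per type

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
for row in reader:
    counts[row[2]] += 1
    words[row[2]] += len(row[0].split()) + len(row[1].split())

print("Type Count Num-Words")
# most_common() yields the types from highest to lowest row count
for book_type, count in counts.most_common():
    print(book_type, count, words[book_type])
```

With the sample rows this lists Non-Fiction before Fiction, which matches the order in the desired output.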
With pandas: split the titles to count the words, groupby on Type, then aggregate with the count and sum functions. Use reset_index and rename to get the exact form desired.
import pandas as pd
# read the file in
df = pd.read_csv('file.csv')
Title_1                 Title_2                                Type
He heard it from space  A quick story about sounds from space  Fiction
The end of all time     A sad poem about the end of time       Non-Fiction
The perfect beginning   A story about friendship               Non-Fiction
# count the words in Title_1 & Title_2
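# label-based access; positional x[0] on a labelled Series is deprecated in recent pandas
df['num_words'] = df[['Title_1', 'Title_2']].apply(lambda x: len(f"{x['Title_1']} {x['Title_2']}".split()), axis=1)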
Title_1                 Title_2                                Type         num_words
He heard it from space  A quick story about sounds from space  Fiction      12
The end of all time     A sad poem about the end of time       Non-Fiction  13
The perfect beginning   A story about friendship               Non-Fiction  7
# create your desired output
test = df[['Type', 'num_words']].groupby('Type')['num_words'].agg(['count', 'sum']).reset_index().rename(columns={'count': 'Count', 'sum': 'Num-words'})
Type         Count  Num-words
Fiction      1      12
Non-Fiction  2      20
To get the result as a dict:
test.to_dict('list')
# -> {'Type': ['Fiction', 'Non-Fiction'], 'Count': [1, 2], 'Num-words': [12, 20]}
This is what I ended up using:
fhand = csv.reader(fopen)
next(fhand)
category_sum = dict()
word_sum = dict()
for row in fhand:
    num_words = len(row[0].split(" ")) + len(row[1].split(" "))
    if row[2] in category_sum.keys():
        category_sum[row[2]] += 1
        word_sum[row[2]] += num_words
    else:
        category_sum[row[2]] = 1
        word_sum[row[2]] = num_words
combined = {key: [category_sum[key], word_sum[key]] for key in category_sum}
#print(combined)
print("Category | # Titles | # of Words\n---------------------------------")
for key in combined:
    print("{} | {} | {} ".format(key, combined[key][0], combined[key][1]))