CSV to Multi-Value Dictionary?

Question

                Title_1                                Title_2         Type
 He heard it from space  A quick story about sounds from space      Fiction
    The end of all time       A sad poem about the end of time  Non-Fiction
  The perfect beginning               A story about friendship  Non-Fiction

I am trying to count how many rows there are of each Type (Fiction, Non-Fiction), and to total the number of words in Title_1 and Title_2 for each Type.

My desired output would be:

Type         Count  Num-Words  
Non-Fiction   2       20
Fiction       1       12

This is what I have so far:

fopen =  open(file_name, 'r')
fhand = csv.reader(fopen)
next(fhand)
category_sum = dict()
for row in fhand:
    col_0 = len(row[0].split())
    col_1 = len(row[1].split())
    print(col_0 + col_1)  # words in both titles for this row
    if row[2] in category_sum.keys():
        category_sum[row[2]]+=1
    else:
        category_sum[row[2]]=1

I can get the total for the types in a nice dictionary, but I can't seem to figure out how to assign the word count to the appropriate type as a value in the dictionary.

Any ideas?

Answer 1

You can store a dictionary as the value, with one key for Count and another for Num-Words. The assignment inside your loop could then look like this:

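num_of_words = len(row[0].split()) + len(row[1].split())
if row[2] in category_sum:
    category_sum[row[2]]['Count'] += 1
    category_sum[row[2]]['Num-Words'] += num_of_words
else:
    # First row of this type: create the inner dict with its initial values.
    category_sum[row[2]] = {'Count': 1, 'Num-Words': num_of_words}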
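
For reference, here is a minimal self-contained sketch of this nested-dictionary idea. It assumes the file is comma-separated with the three columns from the question; the sample rows are read from an in-memory string instead of a file, and the names `SAMPLE` and `summarize` are made up for illustration.

```python
import csv
import io

# Sample rows from the question, assumed to be comma-separated on disk.
SAMPLE = """Title_1,Title_2,Type
He heard it from space,A quick story about sounds from space,Fiction
The end of all time,A sad poem about the end of time,Non-Fiction
The perfect beginning,A story about friendship,Non-Fiction
"""

def summarize(lines):
    """Return {type: {'Count': n, 'Num-Words': m}} for rows of (Title_1, Title_2, Type)."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    category_sum = {}
    for row in reader:
        num_of_words = len(row[0].split()) + len(row[1].split())
        # setdefault creates the inner dict the first time a type is seen.
        entry = category_sum.setdefault(row[2], {'Count': 0, 'Num-Words': 0})
        entry['Count'] += 1
        entry['Num-Words'] += num_of_words
    return category_sum

result = summarize(io.StringIO(SAMPLE))
print(result)
```

With a real file you would pass the open file object instead of `io.StringIO(SAMPLE)`.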

Answer 2

You could count the types like this, using dict.get with a default of 0:

import csv

file_name = 'book_titles.csv'

with open(file_name, 'r', newline='') as fopen:
    reader = csv.reader(fopen)
    next(reader)  # Skip header.
    category_sum = {}
    for row in reader:
        category_sum[row[2]] = category_sum.get(row[2], 0) + 1

print(category_sum)  # -> {'Fiction': 1, 'Non-Fiction': 2}
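
The snippet above only counts the types. One possible way to carry the word total along with the same `dict.get` pattern is to store a `(count, words)` tuple per type. This is a sketch, not necessarily what the answerer had in mind; the in-memory sample and the name `count_types_and_words` are assumptions for illustration.

```python
import csv
import io

# Sample rows from the question, assumed to be comma-separated on disk.
SAMPLE = """Title_1,Title_2,Type
He heard it from space,A quick story about sounds from space,Fiction
The end of all time,A sad poem about the end of time,Non-Fiction
The perfect beginning,A story about friendship,Non-Fiction
"""

def count_types_and_words(lines):
    """Return {type: (row_count, word_count)} using dict.get with a tuple default."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    totals = {}
    for title_1, title_2, book_type in reader:
        count, words = totals.get(book_type, (0, 0))
        row_words = len(title_1.split()) + len(title_2.split())
        totals[book_type] = (count + 1, words + row_words)
    return totals

totals = count_types_and_words(io.StringIO(SAMPLE))
for book_type, (count, words) in totals.items():
    print(book_type, count, words)
```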

Answer 3

Use `pandas`:

- Create the dataframe.
- Combine both titles, split on whitespace, and count the words in the resulting list.
- `groupby` on `Type`, then aggregate with the `count` and `sum` functions.
- `reset_index` and `rename` to get the exact form desired.

import pandas as pd

# read the file in
df = pd.read_csv('file.csv')

                Title_1                                Title_2         Type
 He heard it from space  A quick story about sounds from space      Fiction
    The end of all time       A sad poem about the end of time  Non-Fiction
  The perfect beginning               A story about friendship  Non-Fiction

# count the words in Title_1 & Title_2
df['num_words'] = df[['Title_1', 'Title_2']].apply(lambda x: len(f"{x['Title_1']} {x['Title_2']}".split()), axis=1)

                Title_1                                Title_2         Type  num_words
 He heard it from space  A quick story about sounds from space      Fiction         12
    The end of all time       A sad poem about the end of time  Non-Fiction         13
  The perfect beginning               A story about friendship  Non-Fiction          7

# create your desired output
test = df[['Type', 'num_words']].groupby('Type')['num_words'].agg(['count', 'sum']).reset_index().rename(columns={'count': 'Count', 'sum': 'Num-words'})

        Type  Count  Num-words
     Fiction      1         12
 Non-Fiction      2         20

That's 3 lines of code to get the desired output.
With the data in a dataframe, you can more easily perform other types of text analysis if desired (e.g. "Text analysis: finding the most common word in a column using python").

Getting output in a `dict`:

test.to_dict('list')

>>> {'Type': ['Fiction', 'Non-Fiction'], 'Count': [1, 2], 'Num-words': [12, 20]}

Answer 4

This is what I ended up using:

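import csv

fopen = open(file_name, 'r', newline='')  # file_name: path to the CSV, as in the question
fhand = csv.reader(fopen)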
next(fhand)
category_sum = dict()
word_sum = dict()

for row in fhand:
    # split() with no argument also copes with repeated spaces
    num_words = len(row[0].split()) + len(row[1].split())
    if row[2] in category_sum:
        category_sum[row[2]]+=1
        word_sum[row[2]]+=num_words
    else:
        category_sum[row[2]]=1
        word_sum[row[2]]=num_words



combined = {key:[category_sum[key],word_sum[key]] for key in category_sum}   
#print(combined)
print("Category | # Titles | # of Words\n---------------------------------")
for key in combined:
    print("{}   |   {}  |   {}  ".format(key,combined[key][0],combined[key][1]))
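
If aligned columns like the desired output in the question matter, format-spec widths can do the padding. This is only a sketch: `format_table` is a hypothetical helper, and `combined` is hard-coded here with the values the loop above would produce for the sample data.

```python
def format_table(combined):
    """Render {type: [count, words]} as left/right-aligned text columns."""
    # Width of the first column: the longest type name, or the header if longer.
    width = max([len("Type")] + [len(key) for key in combined])
    lines = [f"{'Type':<{width}}  {'Count':>5}  {'Num-Words':>9}"]
    for key, (count, words) in combined.items():
        lines.append(f"{key:<{width}}  {count:>5}  {words:>9}")
    return "\n".join(lines)

# Values matching the question's sample data.
combined = {'Fiction': [1, 12], 'Non-Fiction': [2, 20]}
table = format_table(combined)
print(table)
```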
