Title seems confusing, but let's say I'm working with the following CSV file ('names.csv').
name1,name2,name3
Bob,Jane,Joe
Megan,Tom,Jane
Jane,Joe,Rob
My question is: how would I go about writing code that returns the string that occurs at least 3 times? The output should be 'Jane', because that occurs at least 3 times. I'm really confused here; perhaps some sample code would help me better understand?
So far I have:
import csv
reader = csv.DictReader(open("names.csv"))
for row in reader:
    names = [row['name1'], row['name2'], row['name3']]
    print names
This returns:
['Bob', 'Jane', 'Joe']
['Megan', 'Tom', 'Jane']
['Jane', 'Joe', 'Rob']
Where do I go from here? Or am I going about this wrong? I'm really new to Python (well, programming altogether), so I have close to no clue what I'm doing..
Cheers
I'd do it like this:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> rows = [['Bob', 'Jane', 'Joe'],
...         ['Megan', 'Tom', 'Jane'],
...         ['Jane', 'Joe', 'Rob']]
>>> for row in rows:
...     for name in row:
...         d[name] += 1
...
>>> filter(lambda x: x[1] >= 3, d.iteritems())
[('Jane', 3)]
It uses a dict with a default value of 0 to count how many times each name occurs in the file, and then filters the dict's (name, count) pairs, keeping only those that satisfy the condition (count >= 3).
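On Python 3 the session above needs two small changes, because `dict.iteritems()` no longer exists and `filter()` returns a lazy iterator instead of a list. Here is a minimal sketch of the same count-then-filter idea. The function name `names_at_least` and its `threshold` parameter are made up for illustration and are not part of the original answer:

```python
from collections import defaultdict


def names_at_least(rows, threshold=3):
    """Return {name: count} for names seen at least `threshold` times."""
    counts = defaultdict(int)  # missing keys start at 0
    for row in rows:
        for name in row:
            counts[name] += 1
    # A dict comprehension stands in for filter() + iteritems() from Python 2.
    return {name: n for name, n in counts.items() if n >= threshold}


rows = [['Bob', 'Jane', 'Joe'],
        ['Megan', 'Tom', 'Jane'],
        ['Jane', 'Joe', 'Rob']]
result = names_at_least(rows)
print(result)  # {'Jane': 3}
```

Lowering `threshold` to 2 would also pick up 'Joe', which appears twice in the sample data.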
Putting it all together (and showing proper csv.reader usage):
import csv
import collections
d = collections.defaultdict(int) # name -> number of occurrences
with open("names.csv", "rb") as f: # Python 3.x: use newline="" instead of "rb"
    reader = csv.reader(f)
    reader.next() # ignore useless heading row
    for row in reader:
        for name in row:
            name = name.strip()
            if name: # skip empty cells
                d[name] += 1
# keep only the names seen at least 3 times, most frequent first
morethan3 = [(name, count) for name, count in d.iteritems() if count >= 3]
morethan3.sort(key=lambda x: x[1], reverse=True)
for name, count in morethan3:
    print name, count
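For Python 3 readers, roughly the same program can be written with `collections.Counter`, which handles the counting and the sort by frequency. This is a sketch under a few assumptions: the helper name `count_names` is invented here, and an `io.StringIO` holding the question's sample data stands in for the real file so the example runs on its own.

```python
import csv
import io
from collections import Counter


def count_names(f):
    """Count every non-empty cell below the heading row of a CSV file object."""
    reader = csv.reader(f)
    next(reader, None)  # skip the heading row (Python 3 spelling of reader.next())
    counts = Counter()
    for row in reader:
        # strip whitespace and ignore empty cells, as in the code above
        counts.update(cell.strip() for cell in row if cell.strip())
    return counts


# Stand-in for: open("names.csv", newline="")
sample = io.StringIO(
    "name1,name2,name3\n"
    "Bob,Jane,Joe\n"
    "Megan,Tom,Jane\n"
    "Jane,Joe,Rob\n"
)
counts = count_names(sample)
# most_common() yields (name, count) pairs in descending order of count
for name, count in counts.most_common():
    if count >= 3:
        print(name, count)  # prints: Jane 3
```

With a real file, pass the object returned by `open("names.csv", newline="")` to `count_names` instead of `sample`.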
Update in response to comment:
You need to read through the whole CSV file whether you use the DictReader approach or not. If you want to ignore, e.g., the 'name2' column (not row), then ignore it. You don't need to save all the data, which is what your use of the variable name "rows" suggests you intend. Here is code for a more general approach that doesn't rely on the column headings being in a particular order and allows selection or rejection of particular columns.
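# csv and d (the defaultdict of counts) are set up as in the code above
with open("names.csv", "rb") as f: # Python 3.x: use newline="" instead of "rb"
    reader = csv.DictReader(f) # heading row supplies the dict keys
    required_columns = ['name1', 'name3'] #### adjust this line as needed ####
    for row in reader:
        for col in required_columns:
            name = row[col].strip()
            if name:
                d[name] += 1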
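A self-contained Python 3 sketch of the column-selection idea follows. The function name `count_selected` is hypothetical, and `io.StringIO` again stands in for the opened file so the example can be run as-is:

```python
import csv
import io
from collections import defaultdict


def count_selected(f, required_columns):
    """Count non-empty values in the chosen columns, looked up by heading."""
    counts = defaultdict(int)
    for row in csv.DictReader(f):  # the first row supplies the keys
        for col in required_columns:
            # DictReader fills missing trailing cells with None, hence `or ""`
            name = (row[col] or "").strip()
            if name:
                counts[name] += 1
    return dict(counts)


# Stand-in for: open("names.csv", newline="")
sample = io.StringIO(
    "name1,name2,name3\n"
    "Bob,Jane,Joe\n"
    "Megan,Tom,Jane\n"
    "Jane,Joe,Rob\n"
)
selected = count_selected(sample, ['name1', 'name3'])
print(selected)
```

Note that with 'name2' left out, 'Jane' is counted only twice in the sample data, so it would no longer pass a count >= 3 filter.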