I'm trying to complete an online course, and one question asks me to count the number of occurrences of the word "fantastic" in a large file. When an occurrence is found, the first element of that line (the id) needs to be stored, to build up a list of the ids of the lines containing the word. So far I have the code below, which reads the lines correctly, but I can't figure out how to check whether "fantastic" appears somewhere in a line, in upper or lower case. I've tried using row.count('fantastic'), which didn't work, as I'm not sure how the csv reader stores the lines. If I can get the counting to work, I can just add the id to a list and print it at the end whenever one or more occurrences are found in a line.
#!/usr/bin/python
import sys
import csv

def main():
    f = open("test_file.txt", 'rt')
    filereader = csv.reader(f, delimiter=' ', quotechar='"')
    for row in filereader:
        print row[0]
        print row.count('fantastic')

if __name__ == "__main__":
    main()
Below is a very small sample set where I've thrown in a few fantastic's.
"6361" "When will unit 2 be online? fantastic" "cs101 unit2" "100003292" "<p>When will unit 2 be online?</p>" "question" "\N" "\N" "2012-02-26 15:47:12.522262+00" "0" "(closed)" "51919" "100003292" "2012-03-03 10:12:27.41521+00" "21196" "\N" "\N" "186" "t"
"7185" "Hungarian group" "cs101 hungarian nationalities" "100003268" "<p>Hi there! This is FANTASTIC</p>
<p>Any Hungarians doing the course? We could form a group!<br>
;)</p>" "question" "\N" "\N" "2012-02-27 15:09:11.184434+00" "0" "" "\N" "100003268" "2012-02-27 15:09:11.184434+00" "9322" "\N" "\N" "106" "f"
"26454" "Course Application." "cs101 application." "100003192" "<p>Please tell about the Course Application. How to use the Course for higher education and jobs?</p>" "question" "\N" "\N" "2012-03-08 08:34:06.704674+00" "-1" "" "\N" "100003192" "2012-03-08 08:34:06.704674+00" "34477" "\N" "\N" "73" "f"
I would expect the output to be 6361, 7185
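You are close.
First, check whether the fields are separated by tabs rather than spaces.
Second, csv returns each row as a list of strings, so you need to check each string in the list. (row.count('fantastic') only counts fields that are exactly equal to 'fantastic', which is why it finds nothing here.) You can either use any() to test the fields one by one, or join() them into a single string and test that.
Third, you need to call lower() on the text before comparing, since 'FANTASTIC' is not the same as 'fantastic'.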
import csv

def main():
    f = open("test_file.txt", 'rt')
    filereader = csv.reader(f, delimiter='\t')
    for row in filereader:
        if any('fantastic' in e.lower() for e in row[1:]):
            print row[0]
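The join alternative mentioned above is not shown in the answer, so here is a minimal sketch of it. It is written for Python 3 (the code above is Python 2), and the sample data, the function name ids_with_word and the variable matching_ids are made up for illustration. It assumes the file is tab-delimited.

```python
import csv
import io

# Hypothetical two-row, tab-delimited sample standing in for test_file.txt.
SAMPLE = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"cs101 unit2"\n'
    '"26454"\t"Course Application."\t"cs101 application."\n'
)

def ids_with_word(lines, word="fantastic"):
    """Return the first field of every row whose other fields contain word."""
    ids = []
    for row in csv.reader(lines, delimiter="\t"):
        # Join all fields after the id into one string, then compare in lowercase.
        text = " ".join(row[1:]).lower()
        if word in text:
            ids.append(row[0])
    return ids

matching_ids = ids_with_word(io.StringIO(SAMPLE))
print(matching_ids)
```

Joining with a space keeps the check simple, at the cost of building one extra string per row; any() stops at the first matching field instead.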
To gather all the rows into a list, you might do something like:
def main():
    result = []
    with open("/tmp/so.csv", 'rt') as f:
        filereader = csv.reader(f, delimiter='\t', quotechar='"')
        for row in filereader:
            if any('fantastic' in e.lower() for e in row[1:]):
                result.append(row[0])
    print result
The default quote character is already ", so you don't need to specify it. If you've got a tab-delimited file, though, passing '\t' (a single tab character) as the delimiter will correctly interpret the columns.
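To see what the tab delimiter and the default quote character do together, here is a small Python 3 sketch. The sample text is hypothetical and assumes tab-separated fields; its second record has a quoted field that spans two lines, like the "7185" record in the question.

```python
import csv
import io

# Hypothetical tab-delimited sample; the second record contains a line break
# inside a quoted field.
SAMPLE = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"question"\n'
    '"7185"\t"<p>This is FANTASTIC</p>\n<p>Any Hungarians?</p>"\t"question"\n'
)

# newline="" leaves line endings untouched so the csv module can handle
# line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(SAMPLE, newline=""), delimiter="\t"))

for row in rows:
    print(len(row), row[0])
```

The reader returns two rows of three fields each, with the quotes stripped and the embedded line break kept inside the second field of the "7185" row.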
What you can do is build a generator to filter rows based on whether the substring 'fantastic' appears in any column after the ID, then use a list comprehension to extract the IDs, e.g.:
with open('test_file.txt') as fin:
    csvin = csv.reader(fin, delimiter='\t')
    has_fantastic = (row for row in csvin if any('fantastic' in col.lower() for col in row[1:]))
    ids = [row[0] for row in has_fantastic]
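The question also asks for the number of occurrences, which the snippets above do not compute. One possible Python 3 sketch follows; the function name find_word and the sample data are hypothetical, and it again assumes a tab-delimited file.

```python
import csv
import io

# Hypothetical tab-delimited sample based on the rows in the question.
SAMPLE = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"question"\n'
    '"7185"\t"Hungarian group"\t"<p>Hi there! This is FANTASTIC</p>"\n'
    '"26454"\t"Course Application."\t"question"\n'
)

def find_word(lines, word="fantastic"):
    """Return (ids of rows containing word, total number of occurrences)."""
    ids = []
    total = 0
    for row in csv.reader(lines, delimiter="\t"):
        # str.count counts substring occurrences within each field.
        hits = sum(col.lower().count(word) for col in row[1:])
        if hits:
            ids.append(row[0])
            total += hits
    return ids, total

ids, total = find_word(io.StringIO(SAMPLE))
print(ids, total)
```

With a real file, the same function could be called as find_word(open("test_file.txt", newline="")), ideally inside a with block so the file is closed afterwards.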