I'm following along with the examples in Wes McKinney's "Python for Data Analysis".
In Chapter 2, we are asked to count the number of times each time zone appears in the 'tz' position, where some entries do not have a 'tz'.
McKinney's count of "America/New_York" comes out to 1251 (it appears twice in the first 10 of the 3440 lines, as you can see below), whereas mine comes out to 1. I am trying to figure out why it shows 1.
I am using Python 2.7, installed from Enthought (epd-7.3-1-win-x86_64.msi) as McKinney instructs in the text. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02 . In case you can't tell from the title of the book, I am new to Python, so please provide instructions on how to get any info I have not provided.
import json
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']
The last line here shows 'America/New_York'; the analogous call for records[1] shows 'America/Denver'.
#count unique time zones rating movies
#NOTE: NOT every JSON entry has a tz, so first line won't work
time_zones = [rec['tz'] for rec in records]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]
This shows the first ten time zone entries, where 8-10 are blank...
#counting using a dict to store counts
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
counts = get_counts(time_zones)
counts['America/New_York']
This returns 1, but it should be 1251.
len(time_zones)
This returns 3440, as it should.
The 'America/New_York' timezone occurs 1251 times in the input:
import json
from collections import Counter
with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251
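To see how `.get('tz')` treats records that have no 'tz' key, here is a minimal self-contained sketch. The sample lines are made up for illustration and are not taken from the bit.ly file. Records without the key are counted under `None` instead of raising a `KeyError`, and blank time zones are counted under the empty string.

```python
import json
from collections import Counter

# Hypothetical sample lines standing in for the real data file.
sample_lines = [
    '{"tz": "America/New_York"}',
    '{"tz": "America/Denver"}',
    '{"tz": "America/New_York"}',
    '{"tz": ""}',                  # blank time zone
    '{"a": "no tz key here"}',     # record without a 'tz' key
]

# dict.get returns None when the key is missing, so nothing raises here.
tz_counts = Counter(json.loads(line).get('tz') for line in sample_lines)

print(tz_counts['America/New_York'])  # entries for New York
print(tz_counts[None])                # records with no 'tz' key
print(tz_counts[''])                  # records with a blank 'tz'
```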
It is not clear why the count is 1 for your code. Perhaps the code indentation is not correct:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break`
    return counts
See the question "Why does python use 'else' after for and while loops?"
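As a small illustration of what a `for`/`else` does when the `else` is indented one level too shallow, the sketch below runs such a function on a short made-up list. The function name and the sample list are hypothetical, and this is only a guess at what the mis-indented code might look like.

```python
def get_counts_misindented(sequence):
    counts = {}
    for x in sequence:
        if x in counts:       # never true: nothing is ever added inside the loop
            counts[x] += 1
    else:                     # belongs to the `for`, not to the `if`
        counts[x] = 1         # runs once, after the loop ends, with the last x
    return counts

result = get_counts_misindented(['a', 'b', 'a', 'b', 'a'])
print(result)  # only the last element is recorded, with a count of 1
```

With this layout the dictionary ends up holding a single key, the last item of the sequence, with a count of 1.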
The correct indentation should be:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
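One way to sanity-check a hand-written counter is to compare its result with `collections.Counter` on a small input. The sketch below does that with a made-up sample list; only the comparison is new here, and the function is the correctly indented version.

```python
from collections import Counter

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1  # first time this value is seen
    return counts

# Hypothetical sample, not taken from the real data file.
sample = ['America/New_York', 'America/Denver', 'America/New_York', '']

hand_counts = get_counts(sample)
matches = hand_counts == dict(Counter(sample))
print(matches)  # True when both approaches agree
```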
Check that you do not mix spaces and tabs for indentation. Run your script using python -tt to find out.
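If you would rather inspect the source text directly, a rough sketch like the following lists the lines whose indentation contains a tab. The helper name `find_tab_indented_lines` and the example source are hypothetical. It assumes the rest of the file is indented with spaces, so any tab it reports is a candidate for the mix-up.

```python
def find_tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), 1):
        # Slice off everything from the first non-whitespace character on.
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if '\t' in indent:
            flagged.append(number)
    return flagged

# Made-up source text: line 3 starts with a tab followed by spaces.
example = "def f(x):\n    if x:\n\t    return 1\n    return 0\n"
print(find_tab_indented_lines(example))
```

To check a real script, read its text with `open(...).read()` and pass it to the function.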