Example from Chapter 2 of "Python for Data Analysis"

Question

I'm following along with the examples in Wes McKinney's "Python for Data Analysis".

In Chapter 2, we are asked to count the number of times each time zone appears in the 'tz' field, where some entries do not have a 'tz'.

McKinney's count of "America/New_York" comes out to 1251 (there are 2 in the first 10 of the 3440 lines, as you can see below), whereas mine comes out to 1. I am trying to figure out why it shows 1.

I am using Python 2.7, installed from Enthought (epd-7.3-1-win-x86_64.msi) as McKinney instructs in the text. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02 . In case you can't tell from the title of the book, I am new to Python, so please provide instructions on how to get any info I have not provided.

import json

path = 'usagov_bitly_data2012-03-16-1331923249.txt'

open(path).readline()

records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']

The last line here prints 'America/New_York'; the analogous line for records[1] prints 'America/Denver'.

#count unique time zones rating movies
#NOTE: NOT every JSON entry has a tz, so first line won't work
time_zones = [rec['tz'] for rec in records]

time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]

This shows the first ten time zone entries, where entries 8-10 are blank.

#counting using a dict to store counts
def get_counts(sequence):
    counts = {}
        for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
        return counts

counts = get_counts(time_zones)
counts['America/New_York']

This returns 1, but it should be 1251.

len(time_zones)

This returns 3440, as it should.

Answer 1

The 'America/New_York' timezone occurs 1251 times in the input:

import json
from collections import Counter

with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251

It is not clear why the count is 1 for your code. Perhaps the code indentation is not correct:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break` 
    return counts

See "Why does python use 'else' after for and while loops?"
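To make the effect of the misplaced else concrete, here is a minimal sketch run on a made-up three-element list (the sample data and the function name get_counts_misindented are illustrative, not from the book). Since counts starts empty, the if branch never fires, and the else clause of the for loop runs once after the loop finishes, using whatever x was last:

```python
def get_counts_misindented(sequence):
    counts = {}
    for x in sequence:
        if x in counts:       # never true: nothing is ever added inside the loop
            counts[x] += 1
    else:                     # attached to the `for`, so it runs once after the loop
        counts[x] = 1         # x is still bound to the last element of the sequence
    return counts

# Hypothetical sample data, not the real bit.ly file
sample = ['America/New_York', 'America/Denver', 'America/New_York']
result = get_counts_misindented(sample)
print(result)  # {'America/New_York': 1}
```

If this is what happened with the real data, the dict would hold a single key (the last time zone in the list) with the value 1, and looking up any other time zone would raise a KeyError.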

The correct indentation should be:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else: 
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
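As a possible variation, the if/else pair can be avoided altogether with dict.get, which leaves less room for an indentation slip. The sketch below uses a small made-up sample (not the real file) and checks the result against collections.Counter:

```python
from collections import Counter

def get_counts(sequence):
    counts = {}
    for x in sequence:
        # dict.get returns 0 when x has not been seen yet
        counts[x] = counts.get(x, 0) + 1
    return counts

# Hypothetical sample data; empty strings stand in for blank 'tz' values
sample = ['America/New_York', '', 'America/Denver', 'America/New_York', '']
counts = get_counts(sample)

# The hand-written counter should agree with Counter on the same input
assert counts == dict(Counter(sample))
print(counts['America/New_York'])  # 2
```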

Check that you do not mix spaces and tabs for indentation; run your script with python -tt to find out.
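If you would rather inspect the file yourself, a rough check can be sketched in a few lines. The helper name find_mixed_indentation and the example source string are hypothetical, and this check is cruder than what python -tt does: it only flags lines whose own leading whitespace contains both a tab and a space.

```python
def find_mixed_indentation(source_text):
    """Return 1-based numbers of lines whose leading whitespace mixes tabs and spaces."""
    flagged = []
    for lineno, line in enumerate(source_text.splitlines(), 1):
        # Leading whitespace is whatever lstrip would remove from the front
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if ' ' in indent and '\t' in indent:
            flagged.append(lineno)
    return flagged

# Hypothetical source: line 3 is indented with a tab followed by a space
example = "def f():\n    a = 1\n\t b = 2\n\treturn a\n"
flagged = find_mixed_indentation(example)
print(flagged)  # [3]
```

To use it on a real script, read the file contents into a string and pass that in.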

Answer 1 (accepted; score shown as 0), 2014-05-23 02:53:06
