Example from “Python for Data Analysis”, Chapter 2

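I am following along with the examples in Wes McKinney's “Python for Data Analysis”.

In Chapter 2 we are asked to count the number of times each time zone appears in the 'tz' field, where some entries do not have a 'tz'.

McKinney's count for 'America/New_York' comes out to 1251 (there are 2 in the first 10 of 3440 lines, see below), whereas mine comes out to 1. I am trying to figure out why it shows 1.

I am using Python 2.7, installed per McKinney's instructions in the text from Enthought (epd-7.3-1-win-x86_64.msi). The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02. In case you could not tell from the title of the book, I am new to Python, so please provide instructions on how to get any information I have not provided.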

import json

path = 'usagov_bitly_data2012-03-16-1331923249.txt'

open(path).readline()

records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']

The last line here prints 'America/New_York'; the analogous line for records[1] prints 'America/Denver'.

#count unique time zones rating movies
#NOTE: NOT every JSON entry has a tz, so first line won't work
time_zones = [rec['tz'] for rec in records]

time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]

This shows the first ten time zone entries, where entries 8-10 are blank...

#counting using a dict to store counts
def get_counts(sequence):
    counts = {}
        for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
        return counts

counts = get_counts(time_zones)
counts['America/New_York']

This returns 1, but it should be 1251.

len(time_zones)

This returns 3440, as it should.

The 'America/New_York' time zone occurs 1251 times in the input:

import json
from collections import Counter

with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251
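As a small illustration of how the `Counter` one-liner above treats entries without a 'tz' key, here is a sketch on made-up input. The four JSON lines are hypothetical stand-ins, not taken from the real data file. Because `.get('tz')` returns `None` when the key is absent, such records are tallied under `None` instead of raising a `KeyError`.

```python
import json
from collections import Counter

# Hypothetical sample lines (NOT from the real bitly data file)
lines = [
    '{"tz": "America/New_York"}',
    '{"tz": ""}',                  # tz present but blank
    '{"a": "no tz here"}',         # tz missing entirely
    '{"tz": "America/New_York"}',
]

# .get('tz') yields None for records that lack the key
c = Counter(json.loads(line).get('tz') for line in lines)

print(c['America/New_York'])  # count for one time zone
print(c[None])                # records with no 'tz' key
print(c[''])                  # records with a blank 'tz'
```

Blank and missing time zones end up under separate keys (`''` and `None`), so neither affects the count for a named zone.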

It is unclear why your code gives a count of 1. Perhaps the code indentation is incorrect:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break` 
    return counts

See “Why does Python use 'else' after for and while loops?”

The correct indentation should be:
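To make the effect of the misplaced `else` concrete, here is a minimal sketch run on a tiny made-up list. The function name `get_counts_for_else` and the sample data are assumptions for illustration only. The `else` clause is attached to the `for` loop, so it runs once, after the loop ends without a `break`, and only the last item gets an entry.

```python
def get_counts_for_else(sequence):
    counts = {}
    for x in sequence:
        if x in counts:        # never true: nothing is ever added inside the loop
            counts[x] += 1
    else:                      # belongs to the for loop, not to the if
        counts[x] = 1          # runs once, after the loop, with the last x
    return counts

# Hypothetical sample, not the real data
sample = ['America/New_York', 'America/New_York', 'America/Denver']
result = get_counts_for_else(sample)
print(result)
```

With this indentation the dictionary holds a single key, the last element of the sequence, with a count of 1, regardless of how often anything occurred.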

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else: 
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
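Another indentation slip that would produce a count of exactly 1 is a `return` placed inside the loop body. This is only a guess at what might have happened, not a confirmed diagnosis, and the function name `get_counts_early_return` and the sample list below are hypothetical. In that case the function returns during the first iteration, so only the first item is counted; the question states that the first record's time zone is 'America/New_York'.

```python
def get_counts_early_return(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
        return counts  # assumed mis-indentation: returns during the first iteration

# Hypothetical sample whose first item matches the question's first record
sample = ['America/New_York', 'America/Denver', 'America/New_York']
early = get_counts_early_return(sample)
print(early)
```

Dedenting `return counts` by one level, so that it lines up with `for`, lets the loop run to completion.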

Check that you are not mixing spaces and tabs for indentation; run your script with python -tt to find out.
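If running with `-tt` is inconvenient, a rough manual check is also possible. The sketch below uses a hypothetical helper, `find_tab_indented_lines`, which is not part of any library; it reports the lines whose leading whitespace contains a tab. In a file that is otherwise indented with spaces, those lines are the likely culprits.

```python
def find_tab_indented_lines(source):
    """Return 1-based numbers of lines whose indentation contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), 1):
        stripped = line.lstrip(' \t')
        indent = line[:len(line) - len(stripped)]  # leading whitespace only
        if '\t' in indent:
            flagged.append(number)
    return flagged

# Hypothetical source text: line 3 is indented with a tab, line 2 with spaces
snippet = "def f(xs):\n    for x in xs:\n\tprint(x)\n"
flagged_lines = find_tab_indented_lines(snippet)
print(flagged_lines)
```

To scan a real script, read its contents into a string and pass that to the helper; tabs that appear after the first non-whitespace character are ignored.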
