Failing to load data in IPython, following the example from “Python for Data Analysis” ch. 2

I am just starting to learn Python from Wes McKinney's book, "Python for Data Analysis". I installed Python using Enthought Canopy 1.5.2-win-64, since Enthought no longer seems to distribute EPD Free, which is what the book recommends.

I am stuck on Wes's first example, which prevents me from doing the rest of the chapter. The example reads the first line of a text file available at https://github.com/pydata/pydata-book/tree/master/ch02. Here is the code:

ipython --pylab
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()

I just get a newline, '\n', as output, whereas the output in the book is:

'{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11
(KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk":1,
"tz": "America\\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l":
"orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r":
"http:\\/\\/www.facebook.com\\/l\\/7AQEFzjSi\\/1.usa.gov\\/wfLQtf", "u":
"http:\\/\\/www.ncbi.nlm.nih.gov\\/pubmed\\/22415991", "t":1331923247, "hc":
1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'

Unfortunately, I do not know any JSON yet, but the file provided in Wes McKinney's GitHub repository does not seem to be exactly the same as the one in the book. I am not sure whether that could be the source of my problem.

I am new to Python, so any help would be greatly appreciated!

You need to use readlines() to get a list of all the lines:

open(path).readlines()  # readlines() returns a list of all the lines

readline() reads only a single line: the next one, including its trailing newline.
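A small sketch of the difference (Python 3), using io.StringIO as a stand-in for the real file so it runs without the dataset; the three sample lines are made up:

```python
import io

# io.StringIO behaves like an open text file; the contents are hypothetical.
f = io.StringIO('{"a": 1}\n{"a": 2}\n{"a": 3}\n')

first = f.readline()    # reads only up to and including the first "\n"
rest = f.readlines()    # reads every remaining line into a list

print(repr(first))      # '{"a": 1}\n'
print(rest)             # ['{"a": 2}\n', '{"a": 3}\n']
```

Once the end of the file is reached, another readline() call returns an empty string ''. This is different from the '\n' that a genuinely empty line produces.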

You can also iterate over each line:

with open(path) as f:  # the with block closes the file automatically
    for line in f:
        print(line.rstrip('\n'))  # each line already ends in '\n'; strip it to avoid blank lines

Iterating over each line, you should get:

{ "a": "Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.11 (KHTML, like Gecko) Chrome\/17.0.963.78 Safari\/535.11", "c": "US", "nk": 1, "tz": "America\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l": "orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r": "http:\/\/www.facebook.com\/l\/7AQEFzjSi\/1.usa.gov\/wfLQtf", "u": "http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22415991", "t": 1331923247, "hc": 1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }
{ "a": "GoogleMaps\/RochesterNY", "c": "US", "nk": 0, "tz": "America\/Denver", "gr": "UT", "g": "mwszkS", "h": "mwszkS", "l": "bitly", "hh": "j.mp", "r": "http:\/\/www.AwareMap.com\/", "u": "http:\/\/www.monroecounty.gov\/etc\/911\/rss.php", "t": 1331923249, "hc": 1308262393, "cy": "Provo", "ll": [ 40.218102, -111.613297 ] }
{ "a": "Mozilla\/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident\/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)", "c": "US", "nk": 1, "tz": "America\/New_York", "gr": "DC", "g": "xxr3Qb", "h": "xxr3Qb", "l": "bitly", "al": "en-US", "hh": "1.usa.gov", "r": "http:\/\/t.co\/03elZC4Q", "u": "http:\/\/boxer.senate.gov\/en\/press\/releases\/031612.cfm", "t": 1331923250, "hc": 1331919941, "cy": "Washington", "ll": [ 38.900700, -77.043098 ] }
  ...............
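Each of those lines is a self-contained JSON object. As a minimal sketch (Python 3, standard-library json module), here is the second record from the output above parsed into a dict. Note that the JSON escape \/ decodes to a plain /:

```python
import json

# One line from the data file, copied verbatim (raw string keeps the "\/" escapes).
line = r'{ "a": "GoogleMaps\/RochesterNY", "c": "US", "nk": 0, "tz": "America\/Denver", "gr": "UT", "g": "mwszkS", "h": "mwszkS", "l": "bitly", "hh": "j.mp", "r": "http:\/\/www.AwareMap.com\/", "u": "http:\/\/www.monroecounty.gov\/etc\/911\/rss.php", "t": 1331923249, "hc": 1308262393, "cy": "Provo", "ll": [ 40.218102, -111.613297 ] }'

# json.loads turns the text into a dict; "\/" becomes "/".
record = json.loads(line)
print(record["tz"])   # America/Denver
print(record["ll"])   # [40.218102, -111.613297]
```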

You must have an empty line at the start of the file; otherwise you would at least have gotten the first line.
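To illustrate: if the first line of a file is empty, readline() returns just '\n', which matches the symptom in the question. The sketch below (Python 3) reproduces that with io.StringIO and adds a hypothetical first_nonblank_line helper, not from the book, that skips leading blank lines:

```python
import io

def first_nonblank_line(f):
    """Return the first line that is not just whitespace, or '' at end of file.

    Hypothetical helper for illustration only.
    """
    for line in f:
        if line.strip():
            return line
    return ''

# A file whose first line is empty: readline() returns only the newline.
f = io.StringIO('\n{"a": 1}\n')
print(repr(f.readline()))            # '\n'

f.seek(0)                            # rewind and try again, skipping blank lines
print(repr(first_nonblank_line(f)))  # '{"a": 1}\n'
```

Skipping blank lines only hides the symptom, though. If the file is not the data file at all, the non-blank line will not be JSON either.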

What is the actual content of that file on disk? Note that the path you pass to open(path).readline() is relative to the current working directory, which is the directory you were in when you started ipython --pylab. However, you didn't get a "file not found" error, so I assume a file exists in the right place.
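To check which file a relative path actually resolves to, a sketch along these lines may help (Python 3; describe_path is a hypothetical helper, and in IPython the %pwd magic also shows the current directory):

```python
import os

def describe_path(path):
    """Report where a (possibly relative) path points. Hypothetical helper."""
    # Relative paths are resolved against the current working directory,
    # i.e. the directory IPython was started from unless it was changed since.
    absolute = os.path.abspath(path)
    is_file = os.path.isfile(absolute)
    return {
        "cwd": os.getcwd(),
        "absolute": absolute,
        "exists": is_file,
        "size_bytes": os.path.getsize(absolute) if is_file else None,
    }

info = describe_path('ch02/usagov_bitly_data2012-03-16-1331923249.txt')
for key, value in info.items():
    print(key, value)
```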

How did you retrieve the file to use it locally? The book isn't specific. Did you go to the GitHub page and download the Zip package? Use Git to clone the whole repository? Right-click in the browser to save the file? Did you make sure you downloaded the raw file and not the HTML page that displays it?
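One quick way to tell the two apart is to look at the first bytes of the local file. The raw data starts with "{", while a page saved from the browser starts with HTML markup, possibly after some blank lines, which would also be consistent with readline() returning '\n'. A heuristic sketch in Python 3; looks_like_html and the 512-byte probe are illustrative choices, not a standard API:

```python
def looks_like_html(path, probe_bytes=512):
    """Heuristic: does the file begin like an HTML page rather than data?

    Hypothetical helper; 512 bytes is an arbitrary probe size.
    """
    with open(path, 'rb') as f:           # binary mode avoids encoding issues
        head = f.read(probe_bytes)
    if head.startswith(b'\xef\xbb\xbf'):  # drop a UTF-8 byte-order mark, if any
        head = head[3:]
    head = head.lstrip().lower()          # ignore leading blank lines and case
    return head.startswith((b'<!doctype html', b'<html'))
```

For example, looks_like_html('ch02/usagov_bitly_data2012-03-16-1331923249.txt') returning True would point to a browser-saved page rather than the data file.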

Edit: The OP confirmed that the file had been saved from the browser with a right-click, so it was actually an HTML page rather than the raw JSON file. The problem was fixed by downloading the whole repository as a Zip from its front page and working from within that extracted folder.
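Once the real file is in place, the book goes on to parse it line by line with the json module. A slightly more defensive sketch (Python 3; load_records is a hypothetical name, not from the book) skips blank lines and reports the first line that is not JSON, so an HTML file like the one above would be flagged immediately:

```python
import json

def load_records(path):
    """Parse a file with one JSON object per line into a list of dicts.

    Hypothetical helper: blank lines are skipped, and a line that is not valid
    JSON (e.g. HTML markup) raises ValueError naming the line number.
    """
    records = []
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                records.append(json.loads(line))
            except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
                raise ValueError('line %d is not JSON: %s' % (lineno, exc))
    return records
```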
