
Numpy genfromtxt: python 2 vs 3 differences in reading line breaks

I am reading a CSV file using Numpy's genfromtxt. Everything works as expected in Python 2.7 (Numpy 1.11.3, anaconda distribution), but it fails completely in Python 3.4.3 (Numpy 1.12.0, installed through Ubuntu package manager).

The expected result in Python 2.7 (all of the data was read correctly):

>>> a = np.genfromtxt('data1.csv', delimiter=',', skip_header=1)
>>> a.shape
(1460, 3)

But in Python 3.4, the operation returns nothing:

>>> np.genfromtxt('data1.csv', delimiter=',', skip_header=1)
__main__:1: UserWarning: genfromtxt: Empty input file: "data1.csv"
array([], dtype=float64)
>>> a.shape
(0,)

If I don't skip the header, I get (some?) data from the file as a single 1-D array, and most of the values are nan:

>>> a.shape
(2923,)
>>> a
array([     nan,      nan,      nan, ...,      nan,    1256.,  147500.])

The first few lines of the CSV file are...

Id,GrLivArea,SalePrice
1,1710,208500
2,1262,181500
3,1786,223500
4,1717,140000
5,2198,250000
6,1362,143000

I don't see any other questions regarding this on this site or Google... am I missing something? The commands are identical. I know the numpy versions are a bit different, but this is as close as I can (easily) get between the two Python distributions.

I am on Linux (Ubuntu) now, but I have also recreated the problem on Windows.

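Here was the problem: for some reason my CSV file contained only carriage returns (\r) at the line ends; there were no line feeds (\n) [I think Excel can be thanked for this]. That fits the symptoms: with no \n in the file, Python 3 apparently treated the whole file as one line, so skip_header=1 skipped everything, and without the skip the 1461 lines (2 commas each) became a single row of 2923 comma-separated fields. I replaced the carriage returns with line feeds and it worked.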
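For anyone who wants to check for this and fix it without opening an editor, here is a minimal sketch using only the standard library. The helper names (count_line_endings, normalize_newlines) and the temporary file names are made up for illustration, and the demo file is just the first rows from the question joined with \r. It reads the whole file into memory, which should be fine for a CSV of this size.

```python
import os
import tempfile


def count_line_endings(path):
    """Return counts of CRLF, lone CR, and lone LF sequences in a file."""
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # CRs not followed by LF
        "lf": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def normalize_newlines(src, dst):
    """Copy src to dst, rewriting CRLF and lone CR line endings as LF."""
    with open(src, "rb") as f:
        data = f.read()
    # Replace CRLF first so the CR in each pair is not converted twice.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dst, "wb") as f:
        f.write(data)


# Demo on a small CR-only file (hypothetical paths in a temp directory).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "data1_cr.csv")
dst = os.path.join(tmpdir, "data1_lf.csv")
with open(src, "wb") as f:
    f.write(b"Id,GrLivArea,SalePrice\r1,1710,208500\r2,1262,181500\r")

before = count_line_endings(src)
normalize_newlines(src, dst)
after = count_line_endings(dst)
print(before, after)
```

If the "cr" count is large and "lf" is zero, the file has the same problem as mine. After normalizing, pointing np.genfromtxt at the rewritten file with delimiter=',' and skip_header=1 should read one row per line.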
