Importing CSV into Python

Question

I have a CSV dataset that looks like this:

FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark

I'm trying to load it into Python 3.6 with this code:

>>> from numpy import genfromtxt

>>> my_data = genfromtxt('first.csv', delimiter=',')
>>> print(my_data)

Output:

 [[             nan              nan              nan              nan
               nan              nan]
 [  4.10000000e+01   4.10000000e+01              nan              nan
    1.13764000e+05              nan]
 [  5.30000000e+01   4.30000000e+01              nan              nan
    1.45963000e+05              nan]
 ..., 
 [  2.10000000e+01   3.00000000e+01              nan              nan
    1.19929000e+05              nan]
 [  6.90000000e+01   6.40000000e+01              nan              nan
    1.52667000e+05              nan]
 [  2.00000000e+01   1.90000000e+01              nan              nan
    1.05077000e+05              nan]]

I've looked at the NumPy docs and I don't see anything about this.

Answer 1

Go with pandas; it will save you the trouble:

import pandas as pd

df = pd.read_csv('first.csv')
print(df)
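
If you still need a NumPy array afterwards, something along these lines might work. This is only a sketch: the CSV text is an inline copy of the sample rows from the question (read through io.StringIO so no file is needed), and the choice to keep only the numeric columns is my assumption, not part of the original answer.

```python
import io

import pandas as pd

# Inline copy of the sample rows from the question, so the sketch runs without a file.
CSV_TEXT = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

# read_csv takes the header from the first line and infers a type per column.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep only the numeric columns and convert them to a plain NumPy array.
numeric = df.select_dtypes(include="number").to_numpy()

print(list(df.columns))
print(numeric.shape)
```

The text columns stay available in the DataFrame, e.g. df["NAME"], so nothing is lost by converting only the numbers.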

Answer 2

An alternative to pandas is the csv library:

import csv
import numpy as np

# open the file in a with block so it is closed after reading
with open('first.csv', 'r', newline='') as f:
    ls = list(csv.reader(f))
val_array = np.array(ls)[1:]  # exclude the first row (column names)
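
Note that the csv module returns every field as a string. A rough sketch of separating the header and converting the numeric fields by hand is below; the inline CSV text copies the question's sample, and the int_columns set is a hypothetical choice of which columns to convert.

```python
import csv
import io

# Inline copy of the sample rows from the question, so the sketch runs without a file.
CSV_TEXT = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)   # the first row holds the column names
rows = list(reader)     # the remaining rows are data, all as strings

# Hypothetical choice: the columns that should be treated as integers.
int_columns = {"FirstAge", "SecondAge", "Income"}

# Build one dict per row, converting only the chosen columns.
records = [
    {name: int(value) if name in int_columns else value
     for name, value in zip(header, row)}
    for row in rows
]

print(records[0])
```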

Answer 3

You could use the dtype argument:

import numpy as np

output = np.genfromtxt("main.csv", delimiter=',', skip_header=1, dtype='f, f, |S6, |S6, f, |S6')

print(output)

Output:

[( 41.,  41., b'USA', b'UK',  113764., b'John')
 ( 53.,  43., b'USA', b'USA',  145963., b'Fred')
 ( 47.,  37., b'USA', b'UK',   42857., b'Dan')
 ( 47.,  44., b'UK', b'USA',   95352., b'Mark')]
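
If you would rather get str values than bytes, a variation on the same idea is sketched below. It assumes a NumPy version whose genfromtxt accepts the encoding argument, uses U6 (unicode) in place of |S6, and reads an inline copy of the question's sample instead of a file. With a comma-separated dtype string the fields get the default names f0, f1, and so on.

```python
import io

import numpy as np

# Inline copy of the sample rows from the question, so the sketch runs without a file.
CSV_TEXT = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

# One type per column: floats for the numbers, unicode strings (max 6 chars) for text.
output = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    delimiter=',',
    skip_header=1,
    dtype='f8,f8,U6,U6,f8,U6',
    encoding='utf-8',
)

print(output.dtype.names)   # default field names: f0 ... f5
print(output['f2'])         # third column (FirstCountry)
print(output['f5'])         # sixth column (NAME)
```

The fixed width matters: as far as I know, values longer than the declared length are cut off, so pick a width that fits your longest string.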

Answer 4

With a few general parameters, genfromtxt can read this file (Python 3 here):

In [100]: data = np.genfromtxt('stack43444219.txt', delimiter=',', names=True, dtype=None)
In [101]: data
Out[101]: 
array([(41, 41, b'USA', b'UK', 113764, b'John'),
       (53, 43, b'USA', b'USA', 145963, b'Fred'),
       (47, 37, b'USA', b'UK',  42857, b'Dan'),
       (47, 44, b'UK', b'USA',  95352, b'Mark')], 
      dtype=[('FirstAge', '<i4'), ('SecondAge', '<i4'), ('FirstCountry', 'S3'), ('SecondCountry', 'S3'), ('Income', '<i4'), ('NAME', 'S4')])

This is a structured array. Two fields are integers, two are strings (byte strings by default), followed by another integer and another string.

By default genfromtxt reads all lines as data. I used names=True so that the first line is used as the field names.

It also tries to read every value as a float (the default dtype). The string columns therefore load as nan.

All of this is in the genfromtxt docs. Admittedly they are long, but they aren't hard to find.

Access fields by name: data['FirstAge'], data['NAME'], etc.
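
To make the difference concrete, here is a small sketch that reads the same text twice, once with the default dtype and once with names=True, dtype=None. The inline CSV text copies the question's sample, and the encoding='utf-8' argument is my addition (it assumes a NumPy version that supports it) so the string fields come back as str rather than bytes.

```python
import io

import numpy as np

# Inline copy of the sample rows from the question, so the sketch runs without a file.
CSV_TEXT = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

# Default dtype is float: the header row and the text columns cannot be parsed
# as numbers, so they come back as nan.
as_float = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=',', encoding='utf-8')

# names=True takes the field names from the first line;
# dtype=None lets genfromtxt infer a type for each column.
data = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    delimiter=',',
    names=True,
    dtype=None,
    encoding='utf-8',
)

print(as_float.shape)          # a plain 2-D float array, header row included
print(data.dtype.names)        # field names taken from the header
print(data['FirstAge'])        # one column, selected by name
print(data['Income'].mean())   # numeric fields behave like ordinary arrays
```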

Using the csv reader gives a 2d array of strings:

In [102]: ls =list(csv.reader(open('stack43444219.txt','r')))
In [103]: ls
Out[103]: 
[['FirstAge', 'SecondAge', 'FirstCountry', 'SecondCountry', 'Income', 'NAME'],
 ['41', '41', 'USA', 'UK', '113764', 'John'],
 ['53', '43', 'USA', 'USA', '145963', 'Fred'],
 ['47', '37', 'USA', 'UK', '42857', 'Dan'],
 ['47', '44', 'UK', 'USA', '95352', 'Mark']]
In [104]: arr=np.array(ls)
In [105]: arr
Out[105]: 
array([['FirstAge', 'SecondAge', 'FirstCountry', 'SecondCountry', 'Income',
        'NAME'],
       ['41', '41', 'USA', 'UK', '113764', 'John'],
       ['53', '43', 'USA', 'USA', '145963', 'Fred'],
       ['47', '37', 'USA', 'UK', '42857', 'Dan'],
       ['47', '44', 'UK', 'USA', '95352', 'Mark']], 
      dtype='<U13')
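
From a string array like that you can slice out the numeric columns and convert them with astype. A minimal sketch, with the array typed in by hand from the output above rather than read from the file:

```python
import numpy as np

# The same 2-D array of strings as in the session above, entered by hand.
arr = np.array([
    ['FirstAge', 'SecondAge', 'FirstCountry', 'SecondCountry', 'Income', 'NAME'],
    ['41', '41', 'USA', 'UK', '113764', 'John'],
    ['53', '43', 'USA', 'USA', '145963', 'Fred'],
    ['47', '37', 'USA', 'UK', '42857', 'Dan'],
    ['47', '44', 'UK', 'USA', '95352', 'Mark'],
])

header, body = arr[0], arr[1:]     # split the column names from the data rows

ages = body[:, :2].astype(int)     # FirstAge and SecondAge as integers
income = body[:, 4].astype(int)    # Income as integers
names = body[:, 5]                 # NAME stays a string column

print(ages)
print(income)
print(names)
```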

Answer 5

I think the issue you are running into is that the data you are trying to parse is not all numeric, which can cause unexpected behavior.

One way to handle this is to identify each value's type before it is added to your array. For example:

numeric_values = []
for obj in my_data:
    if isinstance(obj, int):
        # process the value or add it to your NumPy array
        numeric_values.append(obj)
    else:
        # cast or discard the data
        continue
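
Since values read from a CSV file arrive as strings, a type check alone will not find the numbers. One possible way to do the detection is to try the conversions in turn, as in this rough sketch; parse_cell is a hypothetical helper name, and the sample row is copied from the question.

```python
def parse_cell(text):
    """Return an int or float when the text looks numeric, otherwise the stripped string."""
    text = text.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue   # not this type, try the next one
    return text


# One data row from the question, as the csv module would return it.
row = ['41', '41', 'USA', 'UK', '113764', 'John']
parsed = [parse_cell(cell) for cell in row]

print(parsed)
```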

5 answers

solution1
2 2017-04-17 02:13:32

solution2
1 2017-04-17 02:30:25

solution3
1 2017-04-17 02:38:02

solution4
1 2017-04-17 02:56:38

solution5
-1 ACCEPTED 2017-04-17 02:15:06
