I've downloaded the dataset "Age at 1st marriage (women)" from http://www.gapminder.org/data in Excel/CSV format. The first row of the dataset is a header, and the first column contains the names of countries.
To read the data, I am using the code below:
import numpy as np
# Raw string so the backslash in the Windows path is never treated as an escape
source = open(r"D:\FirstMarriage.csv")
data = np.genfromtxt(source, dtype=None, delimiter=",", skip_header=1)
print(data)
After running this code (in the Spyder IDE), I get this error:
ValueError: Some errors were detected !
Line #37 (got 118 columns instead of 117)
Line #38 (got 118 columns instead of 117)
Line #72 (got 118 columns instead of 117)
Line #87 (got 118 columns instead of 117)
Line #97 (got 118 columns instead of 117)
Line #98 (got 118 columns instead of 117)
Line #184 (got 118 columns instead of 117)
When I open the CSV file in Notepad++ and look at the indicated lines, I find that these rows contain country names with a comma in them. Moreover, these are the only names enclosed in quotation marks, probably to indicate that each one is a single full name. However, that doesn't help me. Please see the example below (I am showing only the first column):
China
Colombia
"Congo, Dem. Rep."
"Congo, Rep."
Costa Rica
Is there an easy way to clean this data so that a name in quotation marks is treated as a single string?
I use Python 2.7 (Anaconda) on Windows 10.
Thanks in advance!
In my opinion, the best way to read a CSV file, or any other character-delimited file, is to use the DataFrame class from pandas. You won't have to deal with the embedded commas, because pandas.read_csv handles quoted fields the way the common CSV conventions require.
import pandas as pd
# read_csv understands quoted fields, so "Congo, Dem. Rep." stays a single value
data = pd.read_csv(r"D:\FirstMarriage.csv")
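A self-contained sketch of the same call, assuming Python 3 and using a small in-memory stand-in for FirstMarriage.csv. The header and the numbers are made up for illustration; only the country names come from the question. index_col=0 is an optional extra, not part of the answer above, that turns the country names into row labels.

```python
import io
import pandas as pd

# Hypothetical stand-in for FirstMarriage.csv: a header row, country names in the
# first column, and two quoted names that contain commas (the values are made up)
sample = io.StringIO(
    "country,2000,2005\n"
    "China,22.1,22.6\n"
    "Colombia,21.4,21.8\n"
    '"Congo, Dem. Rep.",20.0,20.3\n'
    '"Congo, Rep.",21.5,22.0\n'
    "Costa Rica,23.0,23.4\n"
)

# index_col=0 makes the country column the row labels; the quoted names are
# parsed as single values, so every row ends up with the same number of columns
df = pd.read_csv(sample, index_col=0)
print(df.shape)                            # (5, 2)
print(df.loc["Congo, Dem. Rep.", "2000"])  # 20.0
```

Note that the year columns keep their header text as labels, so they are looked up as strings ("2000"), not integers.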
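NumPy's genfromtxt is not quote-aware: it splits each line on every delimiter, including the commas inside the quoted names, which is why those lines end up with one extra column.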
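To see the difference, here is a minimal Python 3 sketch comparing a plain split on commas (roughly what happens to each line inside genfromtxt) with the standard library's csv module, which follows the double-quote convention. The row is hypothetical.

```python
import csv
import io

# A hypothetical row shaped like the problem lines in FirstMarriage.csv
line = '"Congo, Rep.",21.5,22.0\n'

# Splitting on every comma breaks the quoted name into two fields
naive_fields = line.rstrip("\n").split(",")
print(naive_fields)        # ['"Congo', ' Rep."', '21.5', '22.0']

# csv.reader honours the double quotes, keeps the name whole and drops the quotes
quote_aware_fields = next(csv.reader(io.StringIO(line)))
print(quote_aware_fields)  # ['Congo, Rep.', '21.5', '22.0']
```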
There are two ways around this:
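1. Use the pandas library:
import pandas
# quotechar='"' is already the default; .values replaces as_matrix(), which was removed in pandas 1.0
data = pandas.read_csv(r"D:\FirstMarriage.csv", quotechar='"').values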
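A sketch of the pandas route ending in NumPy arrays, again on hypothetical in-memory data and assuming Python 3. Reading the country column as the index (index_col=0, an addition not in the answer) keeps the remaining columns purely numeric, so .values gives a float array instead of a mixed one.

```python
import io
import pandas as pd

# Hypothetical in-memory sample; the numbers are made up
sample = io.StringIO(
    "country,2000,2005\n"
    "Colombia,21.4,21.8\n"
    '"Congo, Dem. Rep.",20.0,20.3\n'
    '"Congo, Rep.",21.5,22.0\n'
)

# quotechar='"' is the default and is spelled out only for clarity
df = pd.read_csv(sample, quotechar='"', index_col=0)

countries = df.index.values  # the country names, one per row
values = df.values           # numeric data only, one row per country
print(list(countries))
print(values.dtype, values.shape)  # float64 (3, 2)
```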
2. Use two CSV files. First, create an intermediate file from your data in which the fields are separated by a different delimiter, say ;, and the double quotes are removed, so the commas inside the country names no longer act as separators. Then use the generated file as the input to a NumPy array, passing ; as the delimiter. For more details see https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html; note that its deletechars parameter only removes characters from field names, not from the data values, so the quotes are best stripped while writing the intermediate file.
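The two-file approach could look roughly like this in Python 3. Here io.StringIO buffers stand in for the original file and the intermediate ;-separated file; with real files you would open both with newline="" as the csv module documentation recommends. Using the csv module to strip the quotes is an assumption of this sketch, and the sample data is hypothetical.

```python
import csv
import io
import numpy as np

# Hypothetical stand-in for FirstMarriage.csv (the values are made up)
original = io.StringIO(
    "country,2000,2005\n"
    "Colombia,21.4,21.8\n"
    '"Congo, Dem. Rep.",20.0,20.3\n'
    '"Congo, Rep.",21.5,22.0\n'
)

# Step 1: rewrite the data with ";" as the delimiter. csv.reader has already
# removed the quotes, and QUOTE_NONE stops the writer from adding any back.
intermediate = io.StringIO()
writer = csv.writer(intermediate, delimiter=";", quoting=csv.QUOTE_NONE,
                    lineterminator="\n")
for row in csv.reader(original):
    writer.writerow(row)
intermediate.seek(0)  # rewind so genfromtxt reads from the start

# Step 2: read the ";"-separated text; the commas inside the names are now
# ordinary characters. dtype=None gives a structured array with fields f0, f1, f2.
data = np.genfromtxt(intermediate, dtype=None, delimiter=";",
                     skip_header=1, encoding="utf-8")
print(data["f0"])  # country names
print(data["f1"])  # first numeric column
```

Because the intermediate file uses ;, this only works as long as no field contains a semicolon; with QUOTE_NONE the csv writer raises an error in that case instead of silently producing a broken file.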