简体   繁体   中英

Importing CSV file with Multiple Delimiters in Python

I am trying to import a data file into a notebook using Python.

Here is the actual data: https://drive.google.com/file/d/1Fr5urzbuGx7QIg_2ueMXAAlDM9xU5e4P/view?usp=sharing

This is kind of how the csv file is formatted:

"AwardNumber","Title","NSFOrganization","Program(s)","StartDate","LastAmendmentDate","PrincipalInvestigator","State","Organization","AwardInstrument","ProgramManager","EndDate","AwardedAmountToDate","Co-PIName(s)","PIEmailAddress","OrganizationStreet","OrganizationCity","OrganizationState","OrganizationZip","OrganizationPhone","NSFDirectorate","ProgramElementCode(s)","ProgramReferenceCode(s)","ARRAAmount","Abstract"
"1624943","Testing the Impact of Race on Jury Evaluations of Informants","SES","Sociology, Social Psychology, LSS-Law And Social Sciences","08/15/2016","07/17/2017","Mona Lynch","CA","University of California-Irvine","Standard Grant","Reggie Sheehan","06/30/2019","$353,747.00","","lynchm@uci.edu","141 Innovation Drive, Ste 250","Irvine","CA","926173213","9498247295","SBE","1331, 1332, 1372","9251","$0.00","An important body of legal scholarship has emerged about the justice risks associated with the use of informants, who provide information to law enforcement officials about criminal activity usually in exchange for leniency consideration or dismissal on a pending criminal charge. Despite the increasing concern, there has been very little empirical research on the use of informants as witnesses."
"1917573","States and Security: Border Orientation in the Modern World","SES","Political Science","08/15/2019","08/26/2019","Beth Simmons","PA","University of Pennsylvania","Standard Grant","Brian Humes","07/31/2021","$476,137.00","Michael Kenwick","simmons3@law.upenn.edu","Research Services","Philadelphia","PA","191046205","2158987293","SBE","1371","","$0.00","Border security is one of the most significant policy issues of our time. How do states benefit from globalization, while at the same time protecting a national space from unwanted influences, people, goods and activities?"
"1931871","CPS: Medium: A Secure, Trustworthy, and Reliable Air Quality Monitoring System for Smart and Connected Communities","SES","CPS-Cyber-Physical Systems","10/01/2019","10/24/2019","Haofei Yu","FL","University of Central Florida","Standard Grant","Sara Kiesler","09/30/2022","$1,198,111.00","Xinwen Fu, Deliang Fan, Haofei Yu, Kelly Stevens, Thomas Bryer","Haofei.Yu@ucf.edu","4000 CNTRL FLORIDA BLVD","Orlando","FL","328168005","4078230387","SBE","7918","7924, 9150","$0.00","A critical application of smart technologies is a smart, connected, and secured environmental monitoring network that can help administrators and researchers find better ways to incorporate evidence and data into public decision-making related to the environment."
"1922424","Standard Research: Consensus, Democracy, and the Public Understanding of Science","SES","STS-Sci, Tech & Society","09/01/2019","09/07/2019","James Weatherall","CA","University of California-Irvine","Continuing grant","Frederick Kronz","08/31/2022","$431,892.00","Cailin O'Connor","weatherj@uci.edu","141 Innovation Drive, Ste 250","Irvine","CA","926173213","9498247295","SBE","7603","1353","$0.00","This award supports a research project that studies how changing social networks influence public belief about science; it will focus specifically on how false beliefs can persist and spread even in evidence-rich environments, and how these beliefs in turn feed back into collective decision-making through democratic institutions."

The problem that I'm running into is instead of the values being separated only by columns, they're also enclosed in quotes, which is necessary because one of the columns contains a large amount of string text.

Here is how I would normally import it, but I am getting an error.

import pandas as pd
import numpy as np

award = pd.read_csv('ses_awards.csv')
award.head()

Thanks in advance for the help!

You need to use quotechar='"' argument with the pd.read_csv() function like this:

import pandas as pd
import numpy as np

award = pd.read_csv('ses_awards.csv', quotechar='"')
award.head()

Pandas documentation about read_csv() :

quotechar : str (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

I tried the file you provided, and it was actually giving me an encoding error.

Try the following encoding:

pd.read_csv('ses_awards.csv', encoding = 'ISO-8859-1')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM