
pandas read csv with extra commas and quotations in column

I'm reading a basic CSV file where the columns are separated by commas. However, the body column is a string that may contain commas and quotation marks. For example, some cells look like "Bahamas\\", The" and "Germany, West".

I have tried text = pd.read_table("input.txt", encoding = 'utf-16', quotechar='"', sep = ',') and text = pd.read_table("input.txt", encoding = 'utf-16', quotechar='"', delimiter = ','), but neither works.

Is there a way to work around this problem?

Are you able to regenerate the CSV? If so, change the delimiter to a pipe, i.e. | . If not, you may be forced to take the long route, because code has no general way to tell which characters are delimiting or quoting and which are part of the value when both commas and quotes appear inside the value.
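As a rough illustration of the pipe-delimiter suggestion, here is a minimal sketch that uses only Python's standard csv module. The sample rows and the in-memory buffer are made up for the example; a real export would write to a file instead.

```python
import csv
import io

# Hypothetical sample rows; the second column holds the troublesome values.
rows = [
    ["1", 'Bahamas", The', "2020"],
    ["2", "Germany, West", "2021"],
]

# Write the data with "|" as the delimiter instead of ",".
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|")
writer.writerows(rows)

# Read it back with the same delimiter; embedded commas no longer split fields.
buffer.seek(0)
parsed = list(csv.reader(buffer, delimiter="|"))
```

If the file can be regenerated this way, reading it in pandas should then be a matter of something like pd.read_csv("input.txt", sep="|", encoding="utf-16"), assuming the same file name and encoding as in the question.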

A workaround could rely on the position of the column where the problem occurs: first isolate the columns to the left of the troubled column, then isolate the columns to the right; whatever characters remain are the troubled column. Can you post a few example rows? It would help to see a few rows that have this issue and a few that parse fine.
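The position-based workaround could be sketched roughly as follows. The helper name split_row and its parameters are hypothetical, and the sketch assumes that only one column ever contains stray commas or quotes, and that the number of clean columns on each side of it is known.

```python
def split_row(line, n_left, n_right, sep=","):
    """Split a line in which only one column may contain stray separators.

    n_left / n_right are the numbers of well-behaved columns to the left
    and right of the troubled column (assumed known in advance).
    """
    line = line.rstrip("\r\n")

    # Peel off the clean columns on the left.
    left = line.split(sep, n_left)
    if len(left) <= n_left:
        raise ValueError("too few columns: " + line)
    head, rest = left[:n_left], left[n_left]

    # Peel off the clean columns on the right; the remainder is the troubled column.
    if n_right:
        right = rest.rsplit(sep, n_right)
        if len(right) <= n_right:
            raise ValueError("too few columns: " + line)
        middle, tail = right[0], right[1:]
    else:
        middle, tail = rest, []

    return head + [middle] + tail
```

Applied to every line of the file, this yields a list of rows that could be handed to pd.DataFrame(rows, columns=[...]). Any quote characters stay in the troubled column untouched and can be cleaned up afterwards.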
