简体   繁体   中英

How to read a comma separated `csv` file into pandas dataframe if it contains a comma separated elements in a column?

I have the following test2.csv file:

id,name,city,country
1,David, Johnson,London,UK,
2,Mary, Gueta,NY,USA,
3,Matthias, Nile,Munich,Germany,

I want to read it into a pandas dataframe. Using this code

df = pd.read_csv('test2.csv')

I get the following df:

在此处输入图像描述

But I want to store the first name and last name together in the column name . id column should store the numbers. city and country should also store the appropriate values.

Thank you in advance!

import csv
file = open(`temp.csv')
csvreader = csv.reader(file)
header = []
header = next(csvreader)
new_rows = []
for row in csvreader:
    new_rows.append([row[0], row[1]+','+row[2], row[3], row[4]])
file.close()
df = pd.DataFrame(new_rows, columns =header )
df

step1: combined with 'id' & 'name' columns named a new column 'full_name'

df['full_name'] = df['id'].map(str) + '.' + df['name'].map(str)

step2: sorted the 'id' column

df.loc[:,'id'] = list(range(df.shape[0]))

step3: delete 'name' column

del df['name']

There is a useful pattern to distinguish between values, space appears after comma in names, so you can substitute delimiter "," with "|" that is more efficient to avoid this type of confusion

import csv
import pandas as pd
import re


#substitute comma not followed by space for vertical bar in the original file and save it as a new file
with open('test2.csv', 'r') as infile, open('test2_vb.csv', 'w') as outfile:
    for line in infile:
        line = re.sub(',(?! )', '|', line)
        outfile.write(line)

file = open('test2_vb.csv', 'r')
reader = csv.reader(file , delimiter='|')
#load the data into a dataframe
df = pd.DataFrame(reader)
print(df) 

this way using a regular expression the file can be corrected (though at the end of a line should not be separator)

You can change 1,David, Johnson,London,UK to 1,"David, Johnson",London,UK then load it using pd.DataFrame .

This happens because pandas think it is another value and to avoid that you need use quotations.

If you have line breaks in your value you will also need to use quotations.

you can find basic rules of csv from here

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM