How to read a CSV file that has both a column delimiter and a record delimiter

Question

My CSV file has 3 columns (Name, Age and Sex), and the sample data is:

AlexÇ39ÇM
#Ç#SheebaÇ35ÇF
#Ç#RiyaÇ10ÇF

The column delimiter is 'Ç' and the record delimiter is '#Ç#'. Note that the first record doesn't have the record delimiter (#Ç#), but all other records do. Could you please tell me how to read this file and store it in a DataFrame?

Answer 1

Both the csv module and pandas support reading CSV files directly. However, since you need to modify the file contents line by line before further processing, I suggest reading the file line by line, modifying each line as needed, and storing the processed data in a list for further handling.

The necessary steps include:

open the file
read the file line by line
remove the newline character (which is part of each line when using readlines())
remove the record delimiter (since each record already corresponds to a line)
split each line at the column delimiter

Since .split() returns a list of strings, we end up with a list of lists, where each sub-list represents the data of one line/record. Data in this shape can be read by pandas.DataFrame.from_records(), which comes in quite handy at this point:

import pandas as pd

with open('myData.csv') as file:
    # `.strip()` removes the newline character from each line
    # `.replace('#;#', '')` removes the record delimiter '#;#' from each line
    # `.split(';')` splits at the column delimiter and returns a list of strings
    # `if line.strip()` skips blank lines, which would otherwise produce a spurious row
    lines = [line.strip().replace('#;#', '').split(';') for line in file if line.strip()]

df = pd.DataFrame.from_records(lines, columns=['Name', 'Age', 'Sex'])

print(df)
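
The list comprehension above assumes that every record sits on its own line. As a minimal sketch using the original delimiters from the question, the same idea can also be written so that the text is split at the record delimiter first and at the column delimiter second, which does not depend on line breaks. The function name `parse_records` and the inline sample string are assumptions made for illustration; they are not part of the question.

```python
COLUMN_DELIMITER = "Ç"
RECORD_DELIMITER = "#Ç#"


def parse_records(text,
                  record_delimiter=RECORD_DELIMITER,
                  column_delimiter=COLUMN_DELIMITER):
    """Split raw text into records first, then each record into fields."""
    records = []
    # Split on the record delimiter first: it contains the column delimiter,
    # so splitting on 'Ç' first would tear the record delimiter apart.
    for chunk in text.split(record_delimiter):
        chunk = chunk.strip()  # drop surrounding newlines and spaces
        if chunk:  # skip empty pieces, e.g. after a trailing delimiter
            records.append(chunk.split(column_delimiter))
    return records


# Sample data copied from the question (hypothetical in-memory stand-in for the file)
sample = "AlexÇ39ÇM\n#Ç#SheebaÇ35ÇF\n#Ç#RiyaÇ10ÇF\n"
records = parse_records(sample)
print(records)
```

The resulting list of lists can be passed to pd.DataFrame.from_records(records, columns=['Name', 'Age', 'Sex']) in the same way as above.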

Remarks:

I changed Ç to ; because that worked better for me due to encoding issues. The basic idea of the algorithm remains the same.
Reading data manually like this can become quite resource-intensive, which might be a problem with larger files. There might be more elegant ways that I am not aware of. If you run into performance problems, try reading the file in chunks or look for a more efficient implementation.
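
One way to follow the chunk-reading hint is to avoid readlines(), which loads all lines at once, and to iterate over the file object instead, which yields lines lazily. The sketch below is an illustration under assumptions: the helper name `iter_records` is hypothetical, the file is assumed to be UTF-8 encoded (a file written as Latin-1 or cp1252 would need that encoding passed instead), and the demo file is a temporary one created only for this example.

```python
import os
import tempfile


def iter_records(path, encoding="utf-8",
                 record_delimiter="#Ç#", column_delimiter="Ç"):
    """Yield one list of fields per non-empty line of the file."""
    with open(path, encoding=encoding) as file:
        for line in file:  # file objects yield one line at a time
            line = line.strip().replace(record_delimiter, "")
            if line:  # skip blank lines
                yield line.split(column_delimiter)


# Demo: write the sample data from the question to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="utf-8",
                                 delete=False) as tmp:
    tmp.write("AlexÇ39ÇM\n#Ç#SheebaÇ35ÇF\n#Ç#RiyaÇ10ÇF\n")
    tmp_path = tmp.name

rows = list(iter_records(tmp_path))
os.remove(tmp_path)  # clean up the demo file
print(rows)
```

Passing the collected rows to pd.DataFrame.from_records(rows, columns=['Name', 'Age', 'Sex']) still holds all rows in memory, so this mainly saves the intermediate list of raw lines that readlines() creates.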

solution1
0 ACCEPTED 2018-10-11 12:46:27
