I am trying to create a DataFrame from the sample CSV below, but read_csv fails with "Error tokenizing data. C error: EOF inside string starting at line 0". I haven't had very much practise with treating bad lines, but would really like to learn the best way to handle something like this. I have tried many different read_csv options, such as error_bad_lines=False, but none of them worked.
CParserError: Error tokenizing data. C error: EOF inside string starting at line 0
I am guessing that the ," at the end of each line is causing the issue, and that the best way is to loop through the lines and process each one. With some outside help I came up with the generator below, and I was hoping I am close. I would also really like to learn how to use a generator and yield for this.
Sample data:
"USNC3255","27","US","NC","LANDS END","72305006","KNJM","KNCA","KNKT","T72305006","","","NCC031","NCZ095","","545","28594","America/New_York","34.65266","-77.07661","7","RDU","893727","
"USNC3256","27","US","NC","LANDSDOWN","72314058","KEHO","KAKH","KIPJ","T72314058","","","NCC045","NCZ068","sc007","517","28150","America/New_York","35.29374","-81.46537","797","CLT","317845","
I have written the code below, which strips the trailing ," and the newline from each line, but I am not sure how to produce a DataFrame from the lines:
def big_table_generator(filename):
    with open(filename, 'rt') as f:
        for line in f:
            yield line[:-3]

gen = big_table_generator('../data/test_sun_file.csv')
pd.DataFrame(gen)
I had a similar error and fixed it by passing quoting=csv.QUOTE_NONE to read_csv (this needs import csv).
For example:
df = pd.read_csv(csvfile, header = None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')
Some info about why in the second comment here: https://github.com/pydata/pandas/issues/5500
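The likely reason this helps with the sample in the question: each line ends in ," and that last quote opens a quoted field that is never closed, so the parser keeps reading until end of file. With quoting=csv.QUOTE_NONE the quote characters are treated as ordinary text, so the rows split cleanly on commas, but the quotes then have to be stripped by hand. Below is a minimal sketch of that idea. The two rows are an abbreviated version of the question's sample (fewer columns, same shape), and the comma delimiter, dtype=str, and the cleanup steps are my assumptions, not part of the original answer.

```python
import csv
import io

import pandas as pd

# Abbreviated stand-in for the question's sample: every line ends in ,"
SAMPLE = (
    '"USNC3255","27","US","NC","LANDS END","","893727","\n'
    '"USNC3256","27","US","NC","LANDSDOWN","sc007","317845","\n'
)

# QUOTE_NONE makes the parser treat '"' as a normal character,
# so the unterminated quote at the end of each line no longer matters.
raw = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    quoting=csv.QUOTE_NONE,
    dtype=str,
)

# The last column holds only the stray '"' from each line, so drop it.
df = raw.iloc[:, :-1]

# The quotes are now part of the data; strip them column by column.
df = df.apply(lambda col: col.str.strip('"'))

print(df)
```

For a real file, the io.StringIO(SAMPLE) argument would be replaced with the file path.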
Here is the solution I came up with. I really wanted to avoid building a list with append and take advantage of a generator instead, but I am not yet comfortable enough working with generators.
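import csv
import numpy as np
import pandas as pd

def parse_file(filename):
    rows = []
    # Text mode with newline='' is what the csv module expects on Python 3
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, quoting=csv.QUOTE_NONE)
        for row in reader:
            # Drop the trailing lone '"' field and strip the literal quotes
            rows.append([s.strip('"') for s in row[:-1]])
    df = pd.DataFrame(rows)
    # Turn empty strings into NaN
    df = df.replace('', np.nan).astype(object)
    return df

df = parse_file(filename)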
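For the generator part of the question, one possible sketch is to move the per-row cleanup into a function that yields one cleaned row at a time and hand that generator straight to pd.DataFrame. The names iter_clean_rows and load_frame are made up for illustration, and the sample rows are an abbreviated version of the data in the question. Note that pandas still collects all rows in memory before building the frame, so the generator mainly saves writing the explicit list and append.

```python
import csv
import io

import pandas as pd


def iter_clean_rows(lines):
    """Yield one list of cleaned fields per input line (hypothetical helper)."""
    # QUOTE_NONE: split on commas only and treat '"' as plain text.
    reader = csv.reader(lines, quoting=csv.QUOTE_NONE)
    for row in reader:
        # row[:-1] drops the lone '"' left by the trailing ," on each line.
        yield [field.strip('"') for field in row[:-1]]


def load_frame(filename):
    """Build a DataFrame from a file on disk (hypothetical helper)."""
    with open(filename, 'r', newline='') as f:
        # The generator is fully consumed here, while the file is still open.
        return pd.DataFrame(iter_clean_rows(f))


# Abbreviated stand-in for the question's sample data.
SAMPLE = (
    '"USNC3255","27","US","NC","LANDS END","","893727","\n'
    '"USNC3256","27","US","NC","LANDSDOWN","sc007","317845","\n'
)

# Any iterable of lines works, so an in-memory buffer is enough for a demo.
df = pd.DataFrame(iter_clean_rows(io.StringIO(SAMPLE)))
print(df)
```

The function body runs only as rows are requested: each yield hands one cleaned row to the consumer and pauses until the next one is asked for, which is why the file has to stay open until the DataFrame has been built.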