How to import .txt data into a pandas dataframe?

I am trying to import the data from the file at https://drive.google.com/file/d/1leOUk4Z5xp9tTiFLpxgk_7KBv3xwn5eW/view into a pandas dataframe. I have tried using

    data = pd.read_csv('data_engineering_assignment.txt',sep="|")

but I got an error saying "ParserError: Error tokenizing data. C error: Expected 9 fields in line 231, saw 10". I don't want to use 'error_bad_lines=False' and skip lines of data.

Kindly help.

The problem is in your dataset: the description_text field sometimes contains a | character. For example, the row with id 5d0c7c4c312ff75188d84954 has "of A|X design" in its description, so pandas treats the text after the | as an extra column. That is why you get the message "Expected 9 fields in line 231, saw 10". I hope this helps you understand the problem.
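To see which rows are affected before choosing a fix, one option is to count the separators on each line. This is a minimal sketch, not part of the original answer: the helper name `find_bad_lines` and the three-column sample are made up for illustration, and it assumes the first line has the expected number of fields.

```python
import io


def find_bad_lines(handle, sep="|"):
    """Return (expected_fields, [(line_number, field_count), ...]).

    Assumes the first line has the expected number of fields.
    Line numbers are 1-based. Quoting is ignored: this is a plain
    count of separators, not a full CSV parse.
    """
    expected = None
    bad = []
    for lineno, line in enumerate(handle, start=1):
        n_fields = line.rstrip("\n").count(sep) + 1
        if expected is None:
            expected = n_fields
        elif n_fields != expected:
            bad.append((lineno, n_fields))
    return expected, bad


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "_id|name|description_text\n"
    "1|chair|plain wooden chair\n"
    "2|lamp|of A|X design\n"
)
expected, bad = find_bad_lines(sample)
print(expected, bad)
```

For the real file you would pass an open file handle instead of the sample, for example `open('data_engineering_assignment.txt', encoding='utf-8')` (the encoding is an assumption).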

You can specify the column names, declaring 10 columns instead of 9:

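    import pandas as pd

    # Declare a 10th column ('other') so rows with an extra '|' still parse.
    # With names=, the first line is read as data; if it is a header row, drop it with dataframe.iloc[1:].
    cols = ['_id','name','price','website_id','sku','url','brand','media','description_text','other']
    dataframe = pd.read_csv('./data_engineering_assignment.txt', names=cols, sep='|')
    # Rejoin only the split rows; 'other' is NaN elsewhere, and str + NaN would give NaN.
    extra = dataframe['other'].notna()
    split = dataframe.loc[extra]
    dataframe.loc[extra, 'description_text'] = split['description_text'].map(str) + '|' + split['other'].map(str)
    dataframe = dataframe.drop(columns='other')
    dataframe.to_csv('./data_engineering_assignment_v2.txt', index=False, sep=',')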

You may get a DtypeWarning about mixed types, because pandas has to guess each column's data type. It is harmless here; passing low_memory=False or an explicit dtype to read_csv silences it.
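If the stray `|` only ever appears in the last column, another option is to split each line yourself with a maximum number of splits, so everything after the last expected separator stays in `description_text`. This is a sketch under that assumption (it will not help if an earlier column contains `|`); `read_pipe_file` is a hypothetical helper and the three-column sample is invented for illustration.

```python
import io

import pandas as pd


def read_pipe_file(handle, columns, sep="|"):
    """Parse sep-delimited lines, keeping extra separators in the last column.

    Assumes no header row, that every line has at least len(columns)
    fields, and that only the last column can contain `sep`.
    """
    max_splits = len(columns) - 1
    rows = [
        line.rstrip("\n").split(sep, max_splits)
        for line in handle
        if line.strip()  # skip blank lines
    ]
    # All values stay as strings, so pandas does not have to guess dtypes.
    return pd.DataFrame(rows, columns=columns)


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "1|chair|plain wooden chair\n"
    "2|lamp|of A|X design\n"
)
df = read_pipe_file(sample, ["_id", "name", "description_text"])
print(df)
```

For the real file you would pass the nine column names from the answer above (without 'other') and an open file handle. If the file starts with a header row, skip it first with `next(handle)`. Numeric columns such as price would then need an explicit conversion, for example with `pd.to_numeric`.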

