Why is my pandas DataFrame displaying column names as 'Unnamed: 1', 'Unnamed: 2', ..., 'Unnamed: n'?

Question

Issue: I received a CSV file (with delimiter ~) from a third party. It has about 4000 records and 150 columns with real column names such as FirstName~LastName~OrderID~City~... But when I load the file into a pandas DataFrame df and run print(list(df.columns)), it displays the column names as follows (simplified for brevity):

['ÿþA', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', ..., 'Unnamed: 49']

Question: What am I doing wrong, and how can I fix this so that df shows the real column names? I'm using the latest version of Python. I've seen some related articles, such as this one, but they all deal with a single column.

Remark: It's a UTF-16 LE file with a BOM. I discovered the issue when my code referenced a column as df['OrderID'] and I got the well-known KeyError, which means you are referencing a column that does not exist.

Code:

import pandas as pd

df = pd.read_csv('/dbfs/FileStore/tables/MyDataFile.txt', sep='~', low_memory=False, quotechar='"', header='infer', encoding='cp1252')

print(df['OrderID'])

MyDataFile.txt sample:

FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................

Answer 1

Are you sure you have the right encoding?

I see your data file starts with ÿþ when read with the cp1252 encoding. That looks like a UTF-16 byte order mark (BOM). Wikipedia has a table of these, and if you look at it, you'll see it matches UTF-16 LE (little-endian).
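To see where the ÿþ comes from, here is a small standard-library sketch. The header string is a hypothetical stand-in for the real file's first line: it is encoded as UTF-16 LE with a BOM, then decoded both as cp1252 (what the question's code does) and as 'utf-16'. Besides the ÿþ, the cp1252 decode leaves a NUL character after every ASCII character, so every field after a ~ starts with NUL. That is a plausible reason pandas ends up with empty header fields and labels them 'Unnamed: n', although the exact behavior depends on pandas' parser internals.

```python
import codecs

# Hypothetical header line, modeled on the sample in the question.
header = "FirstName~LastName~OrderID~City"

# A UTF-16 LE file with a BOM starts with the bytes FF FE,
# followed by two bytes per character (low byte first).
raw = codecs.BOM_UTF16_LE + header.encode("utf-16-le")
print(raw[:8])  # b'\xff\xfeF\x00i\x00r\x00'

# Decoding those bytes as cp1252 turns the BOM into 'ÿþ' and
# leaves a NUL character after every ASCII character.
misread = raw.decode("cp1252")
print(repr(misread[:8]))  # 'ÿþF\x00i\x00r\x00'

# The 'utf-16' codec reads the byte order from the BOM, drops the BOM,
# and returns the original header text.
print(raw.decode("utf-16"))  # FirstName~LastName~OrderID~City
```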

Once you've figured out the right encoding, you can tell pandas which encoding to use by calling pd.read_csv(..., encoding='...'). To figure out what to put in the encoding argument, you can consult this table. If you want UTF-16 LE, that's 'utf_16_le'.
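A minimal sketch of the fix, assuming the file really is UTF-16 with a BOM. It reads an in-memory copy of the question's sample instead of the real file, and sniff_bom_encoding is a hypothetical helper (not part of pandas) that maps a leading BOM to a Python codec name. It returns 'utf-16' rather than 'utf_16_le' because Python's 'utf-16' codec detects the byte order from the BOM and consumes it, whereas 'utf_16_le' decodes the BOM as a U+FEFF character, so whether that character reaches the first column name then depends on pandas.

```python
import codecs
import io

import pandas as pd


def sniff_bom_encoding(head: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: guess a codec name from a leading byte order mark.

    The UTF-32 BOMs are checked first because the UTF-32 LE BOM
    (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    """
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if head.startswith(bom):
            return name
    return default


# Hypothetical in-memory stand-in for MyDataFile.txt:
# '~'-separated text, encoded as UTF-16 LE with a leading BOM.
text = (
    "FirstName~LastName~OrderID~City\n"
    "Kim~Doe~1234~New York\n"
    "Bob~Mason~456~Seattle\n"
)
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

encoding = sniff_bom_encoding(data[:4])  # 'utf-16' for this sample
df = pd.read_csv(io.BytesIO(data), sep="~", encoding=encoding)

print(list(df.columns))  # ['FirstName', 'LastName', 'OrderID', 'City']
print(df["OrderID"].tolist())  # [1234, 456]
```

For the real file, you would read the first four bytes with open(path, 'rb'), pass them to the helper, and then call pd.read_csv('/dbfs/FileStore/tables/MyDataFile.txt', sep='~', encoding=encoding).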

More information:

Pandas docs on read_csv

What is this "ÿþA"? This is the same question, but about R instead of Python.

Answer 2

You can't use that column name directly because, as far as I understand, it doesn't exist in the DataFrame. One way around this is to rename the columns. Try using:

df.rename(columns={'Unnamed: 0':'new name0','Unnamed: 1':'new name1'}, inplace=True)
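With 150 columns, writing each 'Unnamed: n' pair by hand is tedious. Here is a sketch that builds the rename mapping by position, assuming you have the real column names in file order (the toy DataFrame and the names below are hypothetical). This only relabels the columns: if the file was decoded with the wrong encoding, the cell values are misread as well, so fixing the encoding is the more complete fix.

```python
import pandas as pd

# Hypothetical DataFrame whose header came out as 'Unnamed: n' labels.
df = pd.DataFrame([[1, 2, 3]], columns=["ÿþA", "Unnamed: 1", "Unnamed: 2"])

# Hypothetical list of the real column names, in the same order as in the file.
real_names = ["FirstName", "LastName", "OrderID"]

# zip() pairs old and new labels by position; rename() applies the mapping.
df = df.rename(columns=dict(zip(df.columns, real_names)))
print(list(df.columns))  # ['FirstName', 'LastName', 'OrderID']
```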

2 answers

solution1: score 1, accepted, 2022-06-18 00:12:28

solution2: score 0, 2022-06-17 20:45:33