
pd.read_csv creates a multi-index dataframe if I have blank header entries

I have a csv where not all column headers are specified.

temp.csv contains:

a, b
1, 2, 3, 4
5, 6, 7, 8

When I try to read this with pandas, I get a DataFrame with a MultiIndex.

pd.read_csv('temp.csv')

produces this output:

        a   b
1   2   3   4
5   6   7   8

What I want is for the column containing [1, 5] to be named 'a' and the column containing [2, 6] to be named 'b'. Explicitly setting index_col=None does not fix the problem. Any ideas?
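For context, this is pandas' implicit index inference: the read_csv documentation for index_col says that when the data rows contain more fields than the header row, the leading extra fields are used as the index. A minimal reproduction, assuming the file contents are inlined through io.StringIO (with the spaces after the commas dropped so the header parses as plain 'a' and 'b'):

```python
import io
import pandas as pd

# Same layout as temp.csv: 2 header fields, 4 data fields per row
csv_text = "a,b\n1,2,3,4\n5,6,7,8\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.index)             # MultiIndex built from the first two fields
print(df.columns.tolist())  # ['a', 'b']
```

Because two fields are left over, the index has two levels, which is why the result is a MultiIndex rather than a plain index.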

Edit: Thanks, ALollz. I modified your answer slightly so that each file is read only once, since I'll be reading a lot of files.

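import pandas as pd

df = pd.read_csv('temp.csv')
names = df.columns.tolist()   # the names from the header row
df.reset_index(inplace=True)  # move the inferred index levels back into columns
df.columns = names + list(range(df.shape[1] - len(names)))  # extra columns become 0, 1, ...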
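The same single-read idea can be wrapped in a small function. This is only a sketch: read_csv_short_header and its prefix parameter are hypothetical names, the extra columns are named col1, col2, ... (as in the header=0 answer) instead of 0, 1, ..., and it assumes that a default RangeIndex means the header already matched the data.

```python
import io
import pandas as pd


def read_csv_short_header(buf, prefix="col"):
    """Hypothetical helper: read a CSV whose header row has fewer fields
    than its data rows, parsing the input only once."""
    df = pd.read_csv(buf)
    if isinstance(df.index, pd.RangeIndex):
        return df  # header already matched the data; nothing to fix
    header = df.columns.tolist()
    df = df.reset_index()  # inferred index levels become the leading columns
    extra = df.shape[1] - len(header)
    df.columns = header + [f"{prefix}{i}" for i in range(1, extra + 1)]
    return df


df = read_csv_short_header(io.StringIO("a,b\n1,2,3,4\n5,6,7,8\n"))
print(df)
```

Working from a buffer rather than a path makes it easy to try on inline data; passing a filename such as 'temp.csv' works the same way.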

You can ignore the broken header by combining header=0 with the names you want to use:

pd.read_csv('temp.csv', header=0, names=['a', 'b', 'col1', 'col2'])
#   a  b  col1  col2
#0  1  2     3     4
#1  5  6     7     8
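The header=0 part matters: once names is passed, read_csv behaves as if header=None, so without it the a, b line is parsed as a data row and padded with NaN. A quick check, with the CSV inlined via io.StringIO as an assumption:

```python
import io
import pandas as pd

csv_text = "a,b\n1,2,3,4\n5,6,7,8\n"
names = ['a', 'b', 'col1', 'col2']

# header=0: the first line is treated as the header and replaced by `names`
good = pd.read_csv(io.StringIO(csv_text), header=0, names=names)

# No header=0: passing names implies header=None, so "a,b" becomes row 0
bad = pd.read_csv(io.StringIO(csv_text), names=names)

print(good.shape)  # (2, 4)
print(bad.shape)   # (3, 4)
```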

If you don't want to specify the names manually, you can read only the header and the first data row, then work out how many extra names you need to supply.

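import pandas as pd
head = pd.read_csv('temp.csv', nrows=1)  # header + first data row only
header = head.columns.tolist()
# pandas moves the unnamed leading fields into the index, so count its levels too
n_fields = head.index.nlevels + head.shape[1]
names = header + [f'col{i}' for i in range(1, n_fields - len(header) + 1)]

df = pd.read_csv('temp.csv', header=0, names=names)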
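A self-contained version of the automatic approach, handy for testing without a file on disk. infer_names is a hypothetical helper; it relies on pandas moving the unnamed leading fields into the index, so their count equals the number of index levels, unless pandas fell back to a default RangeIndex (meaning there were no extra fields).

```python
import io
import pandas as pd


def infer_names(text, prefix="col"):
    """Hypothetical helper: build a full list of column names for a CSV
    whose header row is shorter than its data rows."""
    head = pd.read_csv(io.StringIO(text), nrows=1)  # header + first data row
    header = head.columns.tolist()
    # A RangeIndex means no leading fields were pushed into the index
    n_extra = 0 if isinstance(head.index, pd.RangeIndex) else head.index.nlevels
    return header + [f"{prefix}{i}" for i in range(1, n_extra + 1)]


text = "a,b\n1,2,3,4\n5,6,7,8\n"
names = infer_names(text)
df = pd.read_csv(io.StringIO(text), header=0, names=names)
print(names)  # ['a', 'b', 'col1', 'col2']
```

The RangeIndex check keeps the helper from adding a spurious extra name when a file's header already has the right number of fields.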
