
to dataframe from a CSV file with multiple delimiters

I'm trying to create a pandas DataFrame from a CSV file that uses more than one delimiter: the header row (column names) is comma-delimited, while the remaining rows are tab-delimited.

I've tried doing things like this:

df = pd.read_csv('csvfile.csv', names=['Code', 'Name'], header=None, skiprows=1, sep='\t')

It's not a big deal for me to skip the header row, since I know what the column names will be anyway, but the above isn't working for me. Is there a way to parse the header row differently from the rest of the data, or can I skip the header row and just split on tabs?

One way could be:

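import pandas as pd

with open('csvfile.csv', 'r') as f:
    header = f.readline().strip().split(",")   # comma-delimited column names
    content = f.readlines()                    # remaining tab-delimited rows

# Pair each tab-separated field with its column name, one dict per row
rows = [dict(zip(header, r.strip().split("\t"))) for r in content]
df = pd.DataFrame(rows)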
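A variation on the same idea, as a rough sketch rather than a definitive fix: read only the header line by hand, then pass the still-open file handle to pd.read_csv so pandas parses the remaining rows as tab-separated. The function name read_mixed_delimiter_csv is made up for illustration, and the sketch assumes the header is exactly one line.

```python
import pandas as pd


def read_mixed_delimiter_csv(path):
    """Read a file with a comma-delimited header row and tab-delimited data rows.

    Hypothetical helper for illustration; assumes the header is a single line.
    """
    with open(path, "r", newline="") as f:
        # Consume the header line ourselves and split it on commas.
        columns = f.readline().strip().split(",")
        # The handle is now positioned at the first data row,
        # so pandas only sees the tab-delimited part of the file.
        return pd.read_csv(f, sep="\t", names=columns, header=None)
```

Calling read_mixed_delimiter_csv('csvfile.csv') should then give a DataFrame with the column names taken from the file, while pandas handles type inference and missing values for the data rows.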
