简体   繁体   中英

pd.read_csv: delimiter = '\t' and header=None not compatible

I have this line of code:

df = pd.read_csv('some_file.txt',engine ='python', 
                  delimiter = '\t', header=None, encoding="utf-16")

I'm using those txt files quiet often in my lab, one of our machines gives them as output. If I only use the delimiter I get a nice table, but with the first element as header for everything. If I only use header = None I get rid of the header, but have a bunch of \t everywhere. If I try to use both commands, I get this error:

ParserError: Expected 1 fields in line 3, saw 23

When removing enigne = 'python' I get a similar error. (also tried seperator and a bunch of other things)

Help would be very much appreciated!

Edit: As requested that's how the file looks like:

##BLOCKS= 1
Plate:  Plate1  1.3 PlateFormat Endpoint    Absorbance  Raw FALSE   1                       1   562     1   12  96  1   8       
    Temperature(¡C) 1   2   3   4   5   6   7   8   9   10  11  12      
    26.5    0.8368  0.5211  0.321   0.2707  0.2124  0.1768  0.1694  0.1635  0.1659  0.1029  0.1032  0.104       
        0.7142  0.4866  0.2968  0.252   0.2111  0.1737  0.1633  0.162   0.1599  0.1009  0.1007  0.1025      
        0.3499  0.2119  0.2799  0.2097  0.3114  0.3393  0.2544  0.2965  0.2392  0.3063  0.3093  0.2655      
        0.305   0.2068  0.2573  0.2008  0.287   0.2765  0.2373  0.2703  0.2357  0.2865  0.2926  0.263       
        0.2922  0.3456  0.1964  0.2667  0.3022  0.2596  0.2256  0.2387  0.2498  0.2936  0.2396  0.3411      
        0.3018  0.349   0.2069  0.272   0.2926  0.2444  0.2141  0.2348  0.2486  0.2678  0.2346  0.2944      
        0.2965  0.3505  0.2427  0.3322  0.1873  0.2286  0.3758  0.208   0.3023  0.3573  0.3141  0.2658      
        0.2956  0.3155  0.2514  0.2929  0.1985  0.2379  0.1898  0.2101  0.3211  0.3558  0.3121  0.2567      

~End
Original Filename: 20220725_Benedikt_DEF; Date Last Saved: 7/25/2022 2:31:30 PM

That's how it looks like when I read it without pandas:

['##BLOCKS= 1\n', 'Plate:\tPlate1\t1.3\tPlateFormat\tEndpoint\tAbsorbance\tRaw\tFALSE\t1\t\t\t\t\t\t1\t562 \t1\t12\t96\t1\t8\t\t\n', '\tTemperature(¡C)\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t\t\n', '\t26.5\t0.8368\t0.5211\t0.321\t0.2707\t0.2124\t0.1768\t0.1694\t0.1635\t0.1659\t0.1029\t0.1032\t0.104\t\t\n', '\t\t0.7142\t0.4866\t0.2968\t0.252\t0.2111\t0.1737\t0.1633\t0.162\t0.1599\t0.1009\t0.1007\t0.1025\t\t\n', '\t\t0.3499\t0.2119\t0.2799\t0.2097\t0.3114\t0.3393\t0.2544\t0.2965\t0.2392\t0.3063\t0.3093\t0.2655\t\t\n', '\t\t0.305\t0.2068\t0.2573\t0.2008\t0.287\t0.2765\t0.2373\t0.2703\t0.2357\t0.2865\t0.2926\t0.263\t\t\n', '\t\t0.2922\t0.3456\t0.1964\t0.2667\t0.3022\t0.2596\t0.2256\t0.2387\t0.2498\t0.2936\t0.2396\t0.3411\t\t\n', '\t\t0.3018\t0.349\t0.2069\t0.272\t0.2926\t0.2444\t0.2141\t0.2348\t0.2486\t0.2678\t0.2346\t0.2944\t\t\n', '\t\t0.2965\t0.3505\t0.2427\t0.3322\t0.1873\t0.2286\t0.3758\t0.208\t0.3023\t0.3573\t0.3141\t0.2658\t\t\n', '\t\t0.2956\t0.3155\t0.2514\t0.2929\t0.1985\t0.2379 \t0.1898\t0.2101\t0.3211\t0.3558\t0.3121\t0.2567\t\t\n', '\n', '~End\n', 'Original Filename: some_file; Date Last Saved: 7/25/2022 2:31:30 PM\n']

When I use just use the pd.read_csv(file, encoding =''utf-16')

I get this:

在此处输入图像描述

It' basically a file that is stating the wavelength absorbance from a sample plate with 8 rows and 12 columns (96 samples).

Assuming all the files have the same structure and you only want the data; skip the first four rows, don't use the last three rows, whitespace delimiter, no header, python engine.

>>> df = pd.read_csv(csv,skiprows=4,skipfooter=3,header=None,delim_whitespace=True,engine='python')
>>> df
       0       1       2       3       4       5       6       7       8       9       10      11
0  0.7142  0.4866  0.2968  0.2520  0.2111  0.1737  0.1633  0.1620  0.1599  0.1009  0.1007  0.1025
1  0.3499  0.2119  0.2799  0.2097  0.3114  0.3393  0.2544  0.2965  0.2392  0.3063  0.3093  0.2655
2  0.3050  0.2068  0.2573  0.2008  0.2870  0.2765  0.2373  0.2703  0.2357  0.2865  0.2926  0.2630
3  0.2922  0.3456  0.1964  0.2667  0.3022  0.2596  0.2256  0.2387  0.2498  0.2936  0.2396  0.3411
4  0.3018  0.3490  0.2069  0.2720  0.2926  0.2444  0.2141  0.2348  0.2486  0.2678  0.2346  0.2944
5  0.2965  0.3505  0.2427  0.3322  0.1873  0.2286  0.3758  0.2080  0.3023  0.3573  0.3141  0.2658
6  0.2956  0.3155  0.2514  0.2929  0.1985  0.2379  0.1898  0.2101  0.3211  0.3558  0.3121  0.2567
>>>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM