
Read csv file in python, but the column names are shifted

I am trying to use the following code to read the data from a txt file:

import pandas as pd

headerLines=12

data = pd.read_csv('test.txt',skiprows=headerLines,sep='\t',names=['a','b','c','d','e','f','g','h','i'])

print(data.head())

However, this is what I get, which is not what I want. The column names are shifted to the right, so an extra column of NaNs is generated. What I want is for column 'a' to correspond to the column starting with 2000000, with an index column to the left of it. Can anyone tell me the reason and how to fix this? Thanks a lot.

                 a           b         c         d         e         f  \
2000000   -65.949737  167.359438 -9.773884 -0.102801 -9.768339 -0.102985   
31990000  -44.882304  149.629367 -9.776339 -1.058768 -9.772569 -1.056513   
61980000  -43.898586 -155.579474 -9.777945 -1.976854 -9.775798 -1.969913   
91970000  -55.187924 -100.870064 -9.781525 -2.895683 -9.778132 -2.877063   
121960000 -46.330680  126.798745 -9.783116 -3.803569 -9.779577 -3.782513   
                   g           h   i  
2000000   -68.031965  -40.420658 NaN  
31990000  -58.193022   93.591063 NaN  
61980000  -53.468840  132.634058 NaN  
91970000  -53.542601  171.131622 NaN  
121960000 -53.124162 -142.028566 NaN 

I was able to reproduce the behaviour you described by separating the first column with spaces instead of tabs. You may want to check whether your input has a similar issue. This can be done easily with

print(data["a"])

If this prints two columns (in reality a single column of type string), then the problem is very likely caused by a wrong delimiter. pandas interprets an input such as "1234 1234" as one text string if the numbers are not separated by the given delimiter (a tab in your case).

You can resolve such problems by using the argument delim_whitespace=True instead of sep='\t'. This makes pandas treat any run of whitespace as the delimiter. (See also the pandas docs.)
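As a minimal sketch of the delimiter problem described above, the snippet below reads a small made-up sample in which the first column is followed by a space rather than a tab. The sample data and the three column names are assumptions for illustration, not the asker's file. It uses sep=r"\s+", which is the regex spelling of "any run of whitespace"; recent pandas versions deprecate delim_whitespace in favour of it.

```python
import io

import pandas as pd

# Hypothetical sample: a space after the first value, tabs elsewhere.
sample = "2000000 -65.9\t167.3\n31990000 -44.8\t149.6\n"
names = ["a", "b", "c"]

# Tab-only parsing sees two fields per row, so "a" holds the text
# "2000000 -65.9" and the last named column is filled with NaN.
tab_only = pd.read_csv(io.StringIO(sample), sep="\t", names=names)
print(tab_only)

# Splitting on any run of whitespace yields three numeric columns.
any_ws = pd.read_csv(io.StringIO(sample), sep=r"\s+", names=names)
print(any_ws)
```

If print(tab_only["a"]) shows what looks like two columns, the input very likely mixes spaces and tabs.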

Edit

I now realize that, in your example, the output after the line break starts again with the values of the first column. This indicates that the first column is somehow being interpreted as the index. I therefore do not believe my answer will help you. I am keeping it here in case someone with the issue I described reads your question.
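One plausible explanation for the first column becoming the index, offered as an assumption since the input file is not shown, is that every data row ends with a trailing tab. Each row then has one more field than there are names, and pandas uses the first field as the index while the empty trailing field becomes the NaN column. The sketch below reproduces this with a made-up three-column sample and shows index_col=False, which tells pandas not to use the first column as the index.

```python
import io

import pandas as pd

# Hypothetical sample: three values per row, each row ending in a trailing tab.
sample = "2000000\t-65.9\t167.3\t\n31990000\t-44.8\t149.6\t\n"
names = ["a", "b", "c"]

# Four fields per row but only three names: the first field becomes the
# index, the names shift right, and the empty last field shows up as NaN.
shifted = pd.read_csv(io.StringIO(sample), sep="\t", names=names)
print(shifted)

# index_col=False keeps the first field as column "a"; the empty trailing
# field is dropped.
fixed = pd.read_csv(io.StringIO(sample), sep="\t", names=names, index_col=False)
print(fixed)
```

Whether this applies to test.txt can be checked by looking for a tab at the end of each data line, for example in an editor that displays whitespace.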
