I'm fairly new to Pandas and I'm having trouble loading a table from a text file.
Here's an example of what the data looks like:
# Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text
I am reading it in as follows:
import pandas as pd
with open(filename, 'rt') as f:
    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
Then when I print(df.dtypes) I get the following:
# id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
This is fine, except for the # in the name of the first column. So I tried specifying the names:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
                   names=["id", "col1", "col2", "col3", "col4"])
but then print(df.dtypes) gives:
id object
col1 object
col2 object
col3 object
col4 object
dtype: object
So I tried specifying both names and dtype:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
                   names=["id", "col1", "col2", "col3", "col4"],
                   dtype={"id": int, "col1": str, "col2": int, "col3": int, "col4": float})
but this gives an error:
ValueError: Unable to convert column id to type <class 'int'>
What's wrong? How can I load the table with the column names I want and the appropriate dtypes?
I have found a workaround, but I am open to better solutions if there are any.
I loaded the table without specifying names or dtype and then renamed the problematic column:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then print(df.dtypes) gives the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). read_table splits on tabs by default, so unless the file is actually tab-separated you'd usually need an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first; you can pass the filename straight to the pandas function: pd.read_table(filename, ...).
But to answer your question:
If you specify the column names explicitly with names=[...], pandas assumes the file has no header row (the same as passing header=None). You therefore have to skip an additional row (skiprows=3); otherwise pandas treats the "# id col1 ..." line as part of the table data, and because that line contains text, every column ends up with the data type object (i.e. strings).
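As a rough sketch of the fix described above, the snippet below rebuilds a small table in memory and loads it with explicit names. The sample string is made up for illustration and assumes the real file is tab-separated (which would explain why the question's call worked without sep); it is not the asker's actual data. The second call shows an alternative that pandas documents: keep skiprows=2 and pass header=0, so the existing header line is consumed and replaced by names.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample that mirrors the layout in the question.
sample = (
    "# Header text\n"
    "# Header text\n"
    "# id\tcol1\tcol2\tcol3\tcol4\n"
    "0\t0.44:66\t0\t1600\t45.6e-3\n"
    "1\t0.25:7f\t0\t1600\t52.1e-3\n"
    "2\t0.31:5e\t0\t1600\t33.7e-3\n"
    "# END\n"
    "# Footer text\n"
)
names = ["id", "col1", "col2", "col3", "col4"]

# Option 1: skip the two comment lines AND the "# id ..." header line.
df = pd.read_table(
    io.StringIO(sample),
    sep="\t",
    skiprows=3,
    skipfooter=2,
    engine="python",  # skipfooter is only supported by the python engine
    names=names,
)

# Option 2: skip only the comment lines and let header=0 consume the
# "# id ..." line, which is then replaced by the supplied names.
df_alt = pd.read_table(
    io.StringIO(sample),
    sep="\t",
    skiprows=2,
    skipfooter=2,
    engine="python",
    header=0,
    names=names,
)

print(df.dtypes)
```

Either way the numeric columns should be inferred as integers and floats again, with no need for a separate rename step.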
Use astype:
df['id'] = df['id'].astype(int)
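A minimal illustration of that conversion, using a made-up frame rather than the asker's data. It assumes every value in the column is a numeric string; if the stray "# id" header line had been read in as a data row, astype(int) would raise a ValueError, so that row has to be skipped or dropped first.

```python
import pandas as pd

# Hypothetical frame whose "id" column was read in as strings.
df = pd.DataFrame({"id": ["0", "1", "2"], "col3": ["1600", "1600", "1400"]})

# Convert the string columns to integers after loading.
df["id"] = df["id"].astype(int)
df["col3"] = df["col3"].astype(int)

print(df.dtypes)
```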