How to read a data frame in txt.file that doesn't have separator or fixed width with pandas
Read in txt file with fixed width columns
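I am trying to open the dat.txt file from the following website: http://jse.amstat.org/datasets/04cars.dat.txt
I am not sure which delimiter to use to read it into Python, since it is separated by whitespace.
I tried pd.read_csv('http://jse.amstat.org/datasets/04cars.dat.txt', delimiter = 'sp')
and a few other things, but nothing seems to work, as well as:
np.genfromtxt("http://jse.amstat.org/datasets/04cars.dat.txt", delimiter= 'sp')
Note that each of the zeros and ones represents a separate column.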
Use read_fwf instead of read_csv.
[read_fwf reads] a table of fixed-width formatted lines into DataFrame.
https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html
import pandas as pd
colspecs = (
(0, 44),
(46, 47),
(48, 49),
(50, 51),
(52, 53),
(54, 55),
(56, 57),
(58, 59),
(60, 66),
(67, 73),
(74, 77),
(78, 80),
(81, 84),
(85, 87),
(88, 90),
(91, 95),
(96, 99),
(100, 103),
(104, 106),
)
data_url = "http://jse.amstat.org/datasets/04cars.dat.txt"
# Note: read_fwf treats the first line as a header by default (header="infer");
# pass header=None if the file has no header row, so the first record is kept.
df = pd.read_fwf(data_url, colspecs=colspecs)
df.columns = (
"Vehicle Name",
"Is Sports Car",
"Is SUV",
"Is Wagon",
"Is Minivan",
"Is Pickup",
"Is All-Wheel Drive",
"Is Rear-Wheel Drive",
"Suggested Retail Price",
"Dealer Cost",
"Engine Size (litres)",
"Number of Cylinders",
"Horsepower",
"City Miles Per Gallon",
"Highway Miles Per Gallon",
"Weight (pounds)",
"Wheel Base (inches)",
"Length (inches)",
"Width (inches)",
)
print(df)
The output will be:
Vehicle Name ... Width (inches)
0 Chevrolet Aveo LS 4dr hatch ... 66
1 Chevrolet Cavalier 2dr ... 69
2 Chevrolet Cavalier 4dr ... 68
3 Chevrolet Cavalier LS 2dr ... 69
4 Dodge Neon SE 4dr ... 67
.. ... ... ...
422 Nissan Titan King Cab XE ... *
423 Subaru Baja ... *
424 Toyota Tacoma ... *
425 Toyota Tundra Regular Cab V6 ... *
426 Toyota Tundra Access Cab V6 SR5 ... *
[427 rows x 19 columns]
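The * entries in the Width column mark missing values, and read_fwf can turn them into NaN while parsing. Here is a minimal sketch on a made-up three-row sample (the rows, column names and positions below are hypothetical, not taken from 04cars.dat.txt). It also passes header=None together with names, so that no data row is used as a header:

```python
import io

import pandas as pd

# Hypothetical sample in a fixed-width layout; NOT the real 04cars data.
# Each name is padded to 11 characters, followed by two 0/1 flags and a width.
sample = "\n".join(
    [
        f"{'Car A 4dr':<11} 0 1  66",
        f"{'Car B 2dr':<11} 1 0  69",
        f"{'Truck C':<11} 0 0   *",
    ]
)

# (start, end) character positions, end exclusive, matching the layout above.
colspecs = [(0, 11), (12, 13), (14, 15), (16, 19)]
names = ["name", "is_sports", "is_suv", "width"]  # illustrative names

df = pd.read_fwf(
    io.StringIO(sample),
    colspecs=colspecs,
    header=None,    # the sample has no header row
    names=names,    # assign column names at read time
    na_values="*",  # treat "*" as a missing value
)
print(df)
print(df.dtypes)
```

With na_values set, the width column is parsed as numeric with NaN for the missing entry, instead of staying a column of strings.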
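The column names and specifications were retrieved from here:
Note: do not forget to specify where each column starts and ends. Without colspecs, pandas infers the column boundaries from the first rows of the data (the first 100 by default), which leads to data errors here. Below is an excerpt of the unified diff between the generated csv files (with and without the specs):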
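The effect of leaving out colspecs can also be reproduced on a tiny sample. The sketch below uses two made-up rows (hypothetical, not from the real file) in which the name field contains a space at the same position in every row, so the inferred layout splits the name into two columns:

```python
import io

import pandas as pd

# Hypothetical rows: a 12-character name field followed by two 0/1 flags.
sample = "\n".join(
    [
        f"{'Audi A4':<12} 0 1",
        f"{'Ford F1':<12} 1 0",
    ]
)

# Without colspecs, pandas guesses the boundaries from blank character columns.
inferred = pd.read_fwf(io.StringIO(sample), header=None)

# With explicit colspecs, the whole name stays in a single column.
explicit = pd.read_fwf(
    io.StringIO(sample),
    colspecs=[(0, 12), (13, 14), (15, 16)],
    header=None,
)

print(inferred.shape)  # the name is split, giving one extra column
print(explicit.shape)
```

The space inside the vehicle name is blank in both rows, so the inference treats it as a column boundary. The explicit positions avoid that guess.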