Unable to convert text format into a proper DataFrame using Pandas

Question

I am reading a text source from URL = 'https://www.census.gov/construction/bps/txt/tb2u201901.txt'

Here I use Pandas to convert it into a DataFrame:

df = pd.read_csv(URL, sep = '\t')

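After exporting df, I see that all the columns are merged into a single column, even though I specified the separator as '\t'. How can I solve this?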

Answer 1

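Because your file is not a CSV file, you should use the read_fwf() function, since your columns have fixed widths. You also need to skip the first 12 lines, which are not part of the data, and use dropna() to remove the empty rows.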

df = pd.read_fwf(URL, skiprows=12)
df.dropna(inplace=True)
df.head()

   United States   94439   58086   1600   1457   33296   1263
1      Northeast  9099.0  3330.0  272.0  242.0  5255.0  242.0
2    New England  1932.0  1079.0   90.0   72.0   691.0   46.0
3    Connecticut   278.0   202.0    8.0    3.0    65.0    8.0
4          Maine   357.0   222.0    6.0    0.0   129.0    5.0
5  Massachusetts   819.0   429.0   38.0   54.0   298.0   23.0
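To try the same approach without hitting the network, here is a minimal sketch that runs read_fwf() on a small in-memory sample. The sample text, the row names (Alpha, Beta, Gamma), the numbers, and the column names are all made up for illustration; only read_fwf(), skiprows, and dropna() come from the answer above.

```python
import io

import pandas as pd

# Hypothetical data rows: (name, total, one_unit). Values are invented.
rows = [("Alpha", 100, 60), ("Beta", 90, 30), ("Gamma", 19, 10)]

# Build fixed-width lines: name padded to 12 chars, numbers right-aligned in 8.
data_lines = [
    name.ljust(12) + str(total).rjust(8) + str(one_unit).rjust(8)
    for name, total, one_unit in rows
]

# Two title lines that are not data, then a blank line between data rows,
# imitating the kind of layout described in the answer.
sample = "\n".join(
    ["Example report title", "Values are illustrative only", ""]
    + [data_lines[0], ""]
    + data_lines[1:]
) + "\n"

df = pd.read_fwf(
    io.StringIO(sample),
    skiprows=2,                            # skip the two title lines
    header=None,                           # no header row in the sample
    names=["area", "total", "one_unit"],   # assumed column names
)

# Drop rows that came from blank lines (if any) and renumber the index.
df = df.dropna().reset_index(drop=True)
print(df)
```

Passing header=None is a choice made for this sketch: by default read_fwf() uses the first line it reads as the column header, which appears to be why the United States row shows up without an index in the output above.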

Answer 2

Your output is correct. If you open the URL, you will see that it contains sentences that are not tab-separated, so Pandas cannot split them into columns.
From line 9 onward, the results are correct.

[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/2K61J.png
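
One way to check whether a text really is tab-separated is to count the tab characters and see what read_csv() does with it. The sketch below uses a made-up, space-aligned sample rather than the real file, so the names and numbers are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical space-aligned text with no tab characters in it.
sample = (
    "Alpha        100      60\n"
    "Beta          90      30\n"
)

# Count the tab characters in the text.
tab_count = sample.count("\t")

# With sep='\t' and no tabs present, each line is read as one single field.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

print(tab_count)  # 0
print(df.shape)   # (2, 1)
```

If the tab count is zero and the frame has a single column, the separator is the problem rather than the data, and a fixed-width reader is probably a better fit.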
