Column specs are off, so pd.read_fwf reads the wrong values with colspecs

Question

I am reading a text file with pd.read_fwf, as follows:

import pandas as pd

specs_test =[(19, 20),(20, 21),(21, 23),(23,26)]
names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS",
"County_Occurrence_FIPS"]

test_l = pd.read_fwf('test.txt', header=None, names = names_test, colspecs= specs_test)

test.txt looks like this:

After reading the file, test_l looks like this:

    Record_Type Resident_Status State_Occurrence_FIPS   County_Occurrence_FIPS
0   1   S   C0  59
1   1   S   C0  51
2   1   S   C0  19
3   1   S   C0  33
4   1   S   C0  7
5   1   S   C0  41
6   2   S   C0  79
7   1   S   C0  43
8   1   S   C0  45
9   2   S   C0  79

However, according to my colspecs it should contain the following (I have only written out the first row as I expect it):

1   1  SC  059

What am I missing here? Many thanks for your help!

Answer 1

I got the output below after pasting your data into a test file and fixing the tuples.

import pandas as pd

specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]
pd.read_fwf('test.txt', header=None, names=names_test, colspecs=specs_test)

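The leading zeros in the fourth column are dropped, so you may need to pass a dtype through the keyword arguments or fix that column after the import.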

   Record_Type  Resident_Status State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                    SC                      59
1            1                1                    SC                      51
2            1                1                    SC                      19
3            1                1                    SC                      33
4            1                1                    SC                       7
5            1                1                    SC                      41
6            2                2                    SC                      79
7            1                1                    SC                      43
8            1                1                    SC                      45
9            2                2                    SC                      79
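One way to fix the column after import is to convert it to strings and pad it back to its fixed width. The following is only a minimal sketch: the small DataFrame is a hypothetical stand-in for the result above, and the width of 3 is taken from the (22, 25) column spec.

```python
import pandas as pd

# Hypothetical frame mimicking the output above, where the county code
# was parsed as an integer and lost its leading zeros.
df = pd.DataFrame({
    "State_Occurrence_FIPS": ["SC", "SC"],
    "County_Occurrence_FIPS": [59, 7],
})

# Convert to string, then left-pad with zeros to the field width of 3.
df["County_Occurrence_FIPS"] = (
    df["County_Occurrence_FIPS"].astype(str).str.zfill(3)
)

print(df)
```

This only works when the original field width is known and the values were purely numeric; reading the column as a string in the first place avoids the round trip.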

Answer 2

First, your column specs are off by one index. Try:

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)]
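To see why the shift matters, it can help to treat each colspecs pair as a zero-based, half-open interval, which behaves like a Python slice. The sketch below uses a hypothetical record (18 filler characters followed by the fields 1, 1, SC, 059), because the actual contents of test.txt are not shown in the question.

```python
import io
import pandas as pd

# Hypothetical record: 18 filler characters, then "1", "1", "SC", "059".
line = "x" * 18 + "11SC059"

wrong = [(19, 20), (20, 21), (21, 23), (23, 26)]  # specs from the question
right = [(18, 19), (19, 20), (20, 22), (22, 25)]  # shifted left by one

# Each (start, end) pair selects the same characters as line[start:end].
wrong_fields = [line[a:b] for a, b in wrong]
right_fields = [line[a:b] for a, b in right]
print(wrong_fields)
print(right_fields)

# read_fwf applies the same intervals; dtype=str keeps the fields as text.
df = pd.read_fwf(io.StringIO(line + "\n"), header=None,
                 colspecs=right, dtype=str)
print(df)
```

With this sample line, the question's specs produce the same kind of shifted fields reported in the question (S, C0, 59), which suggests the original specs were written as one-based positions.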

Also, leading zeros are dropped from values that are parsed as numbers. To keep them, you can have the columns read as strings by adding:

converters = {h:str for h in names_test}

The final code could be:

import pandas as pd

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)] ## Here you were off by one index.

names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS", "County_Occurrence_FIPS"]

test_l = pd.read_fwf('test.txt', 
                 header=None, 
                 names = names_test, 
                 colspecs= specs_test, 
                 converters = {h:str for h in names_test}) ## If you want to keep the leading 
                                                           ## zeros you can convert to string.

Result:

Record_Type Resident_Status State_Occurrence_FIPS   County_Occurrence_FIPS
0   1   1   SC  059
1   1   1   SC  051
2   1   1   SC  019
3   1   1   SC  033
4   1   1   SC  007
5   1   1   SC  041
6   2   2   SC  079
7   1   1   SC  043
8   1   1   SC  045
9   2   2   SC  079
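If only the FIPS code needs to stay as text, a possible alternative to converters is the dtype keyword, which read_fwf forwards to the underlying parser. This is a sketch rather than a drop-in replacement: the two-line sample is hypothetical and stands in for test.txt.

```python
import io
import pandas as pd

# Hypothetical two-record sample laid out like the data discussed here.
sample = "x" * 18 + "11SC059\n" + "x" * 18 + "22SC007\n"

specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]
names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]

# Keep only the county code as a string; other columns are inferred as usual.
test_l = pd.read_fwf(io.StringIO(sample),
                     header=None,
                     names=names_test,
                     colspecs=specs_test,
                     dtype={"County_Occurrence_FIPS": str})
print(test_l)
```

Here Record_Type and Resident_Status remain numeric, while County_Occurrence_FIPS keeps its three-character form.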
