
Not able to read txt file without comma separator in pandas python

Code

import pandas 
df = pandas.read_csv('biharpopulation.txt', delim_whitespace=True)
df.columns = ['SlNo','District','Total','Male','Female','Total','Male','Female','SC','ST','SC','ST'] 

Data

SlNo  District           Total      Male    Female     Total   Male    Female       SC         ST        SC       ST



 1   Patna                 729988    386991   342997      9236     5352     3884      15.5       0.2     38.6     68.7
 2   Nalanda               473786    248246   225540       970      524      446      20.2       0.0     29.4     29.8
 3   Bhojpur               343598    181372   162226      8337     4457     3880      15.3       0.4     39.1     46.7
 4  Buxar                  198014    104761    93253      8428     4573     3855      14.1       0.6     37.9     44.6
 5  Rohtas                 444333    233512   210821     25663    13479    12184      18.1       1.0     41.3     30.0
 6   Kaimur                286291    151031   135260     35662    18639    17023      22.2       2.8     40.5     38.6

 7   Gaya                 1029675    529230   500445      2945     1526     1419      29.6       0.1     26.3     49.1
 8   Jehanabad             174738     90485    84253      1019      530      489      18.9      0.07     32.6     32.4
 9   Arawal                 11479     57677    53802       294      179      115      18.8      0.04
10   Nawada                435975    223929   212046      2158     1123     1035      24.1       0.1     22.4     20.5
11   Aurangabad            472766    244761   228005      1640      865      775      23.5       0.1     35.7     49.7
                                                         Saran
12   Saran                 389933    199772   190161      6667     3384     3283        12       0.2     33.6     48.5
13   Siwan                 309013    153558   155455     13822     6856     6966      11.4       0.5     35.6     44.0
14   Gopalganj             267250    134796   132454      6157     2984     3173      12.4       0.3     32.1     37.8

15   Muzaffarpur           594577    308894   285683      3472     1789     1683      15.9       0.1     28.9     50.4
16   E. Champaran          514119    270968   243151      4812     2518     2294      13.0       0.1     20.6     34.3
17   W. Champaran          434714    228057   206657     44912    23135    21777      14.3       1.5     22.3     24.1
18   Sitamarhi             315646    166607   149039      1786      952      834      11.8       0.1     22.1     31.4
19  Sheohar                 74391     39405    34986        64       35       29      14.4       0.0     16.9     38.8
20   Vaishali              562123    292711   269412      3068     1595     1473      20.7       0.1     29.4     29.9

21   Darbhanga             511125    266236   244889       841      467      374      15.5       0.0     24.7     49.5
22   Madhubani             481922    248774   233148      1260      647      613      13.5       0.0     22.2     35.8
23   Samastipur            628838    325101   303737      3362     2724      638      18.5       0.1     25.1     22.0

24   Munger                150947     80031    70916     18060     9297     8763      13.3       1.6     42.6     37.3
25   Begusarai             341173    177897   163276      1505      823      682      14.5       0.1     31.4     78.6
26   Shekhapura            103732     54327    49405       211      115       96      19.7       0.0     25.2     45.6
27   Lakhisarai            126575     65781    60794      5636     2918     2718      15.8       0.7     26.8     12.9
28   Jamui                 242710    124538   118172     67357    34689    32668      17.4       4.8     24.5     26.7

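Your problem is that the file is whitespace-delimited, but some of your district names also contain spaces. Fortunately, none of the district names contains a '\t' character, so, provided the columns in the file are actually separated by tabs, we can work around this by using the tab as the delimiter: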

df = pandas.read_csv('biharpopulation.txt', delimiter='\t')
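Whether this works depends on the file really using tabs between columns, which the data as pasted in the question does not confirm. A minimal sketch of how one might check that and see the effect, using a small hypothetical tab-separated sample (the `sample` string and the `count_tabs` helper are illustrative, not part of the original question):

```python
import io

import pandas as pd

# Hypothetical tab-separated excerpt; the real biharpopulation.txt
# may use spaces instead, in which case this approach will not help.
sample = (
    "SlNo\tDistrict\tTotal\tMale\tFemale\n"
    "15\tMuzaffarpur\t594577\t308894\t285683\n"
    "16\tE. Champaran\t514119\t270968\t243151\n"
    "17\tW. Champaran\t434714\t228057\t206657\n"
)


def count_tabs(text):
    """Return the number of tab characters on each non-empty line."""
    return [line.count("\t") for line in text.splitlines() if line.strip()]


# If every line has the same non-zero tab count, the file is
# plausibly tab-delimited and delimiter='\t' is a reasonable choice.
tab_counts = count_tabs(sample)

# With a tab delimiter, the space inside "E. Champaran" is kept as
# part of the District value instead of splitting the field.
df_tab = pd.read_csv(io.StringIO(sample), delimiter="\t")

print(tab_counts)
print(df_tab)
```

If the tab counts come back as zero or vary from line to line, the file is probably space-aligned rather than tab-delimited, and a different separator strategy is needed.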

The problem lies in these two lines:

16   E. Champaran          514119    270968   243151      4812     2518     2294      13.0       0.1     20.6     34.3
17   W. Champaran          434714    228057   206657     44912    23135    21777      14.3       1.5     22.3     24.1
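To see why these two rows break whitespace-based parsing, it can help to count the whitespace-separated fields on each line and compare with the header. The following is only an illustrative sketch on three rows copied from the question's data; the variable names are made up for the example:

```python
# Header and three data rows copied from the question.
header = (
    "SlNo  District           Total      Male    Female     Total   Male"
    "    Female       SC         ST        SC       ST"
)
rows = [
    "15   Muzaffarpur           594577    308894   285683      3472"
    "     1789     1683      15.9       0.1     28.9     50.4",
    "16   E. Champaran          514119    270968   243151      4812"
    "     2518     2294      13.0       0.1     20.6     34.3",
    "17   W. Champaran          434714    228057   206657     44912"
    "    23135    21777      14.3       1.5     22.3     24.1",
]

# str.split() with no argument splits on any run of whitespace,
# which mirrors what a whitespace separator does in read_csv.
expected = len(header.split())
field_counts = [len(row.split()) for row in rows]

# Rows whose field count differs from the header are the ones
# that a whitespace separator cannot align with the columns.
bad_rows = [row.split()[0] for row, n in zip(rows, field_counts) if n != expected]

print(expected, field_counts, bad_rows)
```

The rows for E. Champaran and W. Champaran each produce one field more than the header, because the space inside the district name is treated as a column boundary.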

If you can somehow remove the space inside E. Champaran and W. Champaran (so that they read E.Champaran and W.Champaran), then you can do this:

import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+', skip_blank_lines=True, skipinitialspace=True)
print(df)

    SlNo     District    Total    Male  Female  Total.1  Male.1  Female.1    SC    ST  SC.1  ST.1
0      1        Patna   729988  386991  342997     9236    5352      3884  15.5  0.20  38.6  68.7
1      2      Nalanda   473786  248246  225540      970     524       446  20.2  0.00  29.4  29.8
2      3      Bhojpur   343598  181372  162226     8337    4457      3880  15.3  0.40  39.1  46.7
3      4        Buxar   198014  104761   93253     8428    4573      3855  14.1  0.60  37.9  44.6
4      5       Rohtas   444333  233512  210821    25663   13479     12184  18.1  1.00  41.3  30.0
5      6       Kaimur   286291  151031  135260    35662   18639     17023  22.2  2.80  40.5  38.6
6      7         Gaya  1029675  529230  500445     2945    1526      1419  29.6  0.10  26.3  49.1
7      8    Jehanabad   174738   90485   84253     1019     530       489  18.9  0.07  32.6  32.4
8      9       Arawal    11479   57677   53802      294     179       115  18.8  0.04   NaN   NaN
9     10       Nawada   435975  223929  212046     2158    1123      1035  24.1  0.10  22.4  20.5
10    11   Aurangabad   472766  244761  228005     1640     865       775  23.5  0.10  35.7  49.7
11    12        Saran   389933  199772  190161     6667    3384      3283  12.0  0.20  33.6  48.5
12    13        Siwan   309013  153558  155455    13822    6856      6966  11.4  0.50  35.6  44.0
13    14    Gopalganj   267250  134796  132454     6157    2984      3173  12.4  0.30  32.1  37.8
14    15  Muzaffarpur   594577  308894  285683     3472    1789      1683  15.9  0.10  28.9  50.4
15    16  E.Champaran   514119  270968  243151     4812    2518      2294  13.0  0.10  20.6  34.3
16    17  W.Champaran   434714  228057  206657    44912   23135     21777  14.3  1.50  22.3  24.1
17    18    Sitamarhi   315646  166607  149039     1786     952       834  11.8  0.10  22.1  31.4
18    19      Sheohar    74391   39405   34986       64      35        29  14.4  0.00  16.9  38.8
19    20     Vaishali   562123  292711  269412     3068    1595      1473  20.7  0.10  29.4  29.9
20    21    Darbhanga   511125  266236  244889      841     467       374  15.5  0.00  24.7  49.5
21    22    Madhubani   481922  248774  233148     1260     647       613  13.5  0.00  22.2  35.8
22    23   Samastipur   628838  325101  303737     3362    2724       638  18.5  0.10  25.1  22.0
23    24       Munger   150947   80031   70916    18060    9297      8763  13.3  1.60  42.6  37.3
24    25    Begusarai   341173  177897  163276     1505     823       682  14.5  0.10  31.4  78.6
25    26   Shekhapura   103732   54327   49405      211     115        96  19.7  0.00  25.2  45.6
26    27   Lakhisarai   126575   65781   60794     5636    2918      2718  15.8  0.70  26.8  12.9
27    28        Jamui   242710  124538  118172    67357   34689     32668  17.4  4.80  24.5  26.7
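One possible way to remove that space without editing the file by hand is to preprocess the text with a regular expression before passing it to pandas. This is a sketch under the assumption that the only multi-word names are of the form "initial, dot, space, name"; the `raw` sample and the pattern are illustrative and would need adjusting for other name shapes:

```python
import io
import re

import pandas as pd

# Small excerpt in the same space-aligned layout as the question's data.
raw = """SlNo  District           Total      Male    Female
15   Muzaffarpur           594577    308894   285683
16   E. Champaran          514119    270968   243151
17   W. Champaran          434714    228057   206657
"""

# Drop the spaces between a single capital initial followed by a dot
# and the next letter, turning "E. Champaran" into "E.Champaran".
# Numbers such as "13.0" are untouched because the pattern requires
# a capital letter immediately before the dot.
cleaned = re.sub(r"\b([A-Z])\.[ \t]+(?=[A-Za-z])", r"\1.", raw)

# After the cleanup every row has one token per column, so a
# whitespace separator lines up with the header.
df_clean = pd.read_csv(io.StringIO(cleaned), sep=r"\s+")

print(df_clean)
```

For the real file, the same substitution could be applied to the result of reading the file as text before handing it to `read_csv`. Lines that contain only a stray district name would still need to be dropped separately.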

