
Pandas read_csv with uneven length of rows as header

I have a txt file whose rows are of uneven length:

LS-DYNA user input                                                      
                         ls-dyna mpp.78769 s              date 01/02/2013

 constraint #      axial        shear         time  failure                                       length  rslt moment      torsion
       1720  8.39282E-01  6.55466E-01  1.20000E+03      0.0    spotweld beam  ID     938970  4.47325E+01  2.24041E+00
       1721  3.30134E-01  5.08016E-01  1.20000E+03      0.0    spotweld beam  ID     938971  4.47310E+01  1.70857E+00
       1722  9.52039E-01  2.24977E+00  1.20000E+03      0.0    spotweld beam  ID     938972  3.50040E+00  1.14531E+01
       1723  1.37947E+00  3.75614E+00  1.20000E+03      0.0    spotweld beam  ID     938973  2.99986E+00  3.72429E+01
       1724 -1.29900E+00  8.59783E-01  1.20000E+03      0.0    spotweld beam  ID     938974  3.50112E+00  1.11357E+01
       1725 -1.39978E+00  5.05035E+00  1.20000E+03      0.0    spotweld beam  ID     938975  2.99934E+00  1.69379E+01
       1726 -8.28811E-01  2.36767E+00  1.20000E+03      0.0    spotweld beam  ID     938976  3.50022E+00  1.01569E+01
       1727 -8.02390E-01  2.83158E+00  1.20000E+03      0.0    spotweld beam  ID     938977  2.99945E+00  5.26153E+01
       1728  2.45994E+01  2.55278E+02  1.20000E+03      0.0    spotweld beam  ID     938978  3.51565E+00  1.03888E+01
       1729  3.79365E+01  1.91420E+01  1.20000E+03      0.0    spotweld beam  ID     938978  2.99987E+00  8.96939E+00

I would rather not resort to skiprows, because the number of rows without data changes from case to case, so I am trying to read the file with

pd.read_csv(File, header=None, delim_whitespace=True)

This throws the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 2, saw 5

Then I changed the read_csv parameters to

my_cols = ['A', 'B', 'C', 'D', 'E','F','G']
elout= pd.read_csv(File, names=my_cols, header=None, delim_whitespace=True)

With that, there is no error. Apart from this crude workaround, are there any other settings I could use to solve this issue?

Thank you!

If you don't want to use skiprows, an alternative is to open the file yourself, e.g. f = open(File). Then call f.readline() and manually parse the leading lines that are not of interest to you. Once you have extracted the useful parts of the header through f and the file pointer has reached the beginning of the table, simply pass f to read_csv as the first argument, and pandas will start processing the data from that point.
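A minimal sketch of this approach, under a few assumptions that are not in the question: the helper name read_after_marker is hypothetical, the table is assumed to start right after the line beginning with "constraint", and the sample text is a shortened copy of the file wrapped in io.StringIO so the snippet runs on its own (with a real file you would pass the handle returned by open(File) instead). It uses sep=r"\s+" in place of delim_whitespace=True, since newer pandas releases deprecate the latter.

```python
import io

import pandas as pd


def read_after_marker(handle, marker="constraint"):
    """Skip lines on an open text handle until the column-header line,
    then let pandas parse whatever follows. `marker` is an assumed prefix."""
    while True:
        line = handle.readline()
        if not line:
            # End of file reached without finding the header line.
            raise ValueError("no line starting with %r found" % marker)
        if line.lstrip().startswith(marker):
            break
    # The handle now points at the first data row; pandas reads from here.
    return pd.read_csv(handle, header=None, sep=r"\s+")


# Shortened stand-in for the real file (two data rows only).
sample = io.StringIO(
    "LS-DYNA user input\n"
    "                         ls-dyna mpp.78769 s              date 01/02/2013\n"
    "\n"
    " constraint #      axial        shear         time  failure\n"
    "       1720  8.39282E-01  6.55466E-01  1.20000E+03      0.0    spotweld beam  ID     938970  4.47325E+01  2.24041E+00\n"
    "       1721  3.30134E-01  5.08016E-01  1.20000E+03      0.0    spotweld beam  ID     938971  4.47310E+01  1.70857E+00\n"
)

df = read_after_marker(sample)
print(df)
```

Because the header line is located by its content rather than by its position, the number of preamble lines can vary between files. Column names are left as integers here; they could be supplied through the names argument once the layout is known.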
