
How to read a file delimited by spaces and ":"

My data is of the form:

1 440:0.033906222568727 730:0.0424739279722748 1523:0.0773048148348295 1893:0.0433930684646909

1 271:0.0646290650479301 405:0.0653366028581683 584:0.0744087075001463 770:0.0717824200677465

1 577:0.0679078686536282 761:0.0506946081073312

-1 440:0.0437614564467411 798:0.0370070258333617 831:0.0549176430011721 1681:0.0715035548706038 1963:0.102891965918849 2667:0.0461603813033019 2899:0.0672807783934756

I want output in the form of a table:

1 440 0.033906222568727 ......
1 271 0.0646290650479301 ...... 
1 271 0.0646290650479301 ......
1 577 0.0679078686536282 .........

I have tried using:

    import pandas as pd

    # Raw string so that \s reaches the regex engine unchanged.
    x = pd.read_csv('rcv1_train.binary', sep=r"\s+|:", engine='python')

and got an error:

pandas.errors.ParserError: Expected 413 fields in line 134, saw 419. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

You probably have bad data in line 134: according to the error, that line splits into 419 fields while the parser expected 413.
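To check whether line 134 is really an outlier, it can help to count how many fields each line produces under the same separator. The following is only a sketch: the helper name `field_counts` and the sample lines are made up for illustration, and the real input would be the lines of 'rcv1_train.binary'.

```python
import re
from collections import Counter

def field_counts(lines, pattern=r"\s+|:"):
    """Return the number of fields each non-empty line yields when split on `pattern`."""
    counts = []
    for line in lines:
        stripped = line.strip()
        if stripped:  # skip blank lines
            counts.append(len(re.split(pattern, stripped)))
    return counts

# Illustrative sample in the same "label index:value ..." layout as the question.
sample = [
    "1 440:0.033906222568727 730:0.0424739279722748",
    "1 577:0.0679078686536282",
    "-1 440:0.0437614564467411 798:0.0370070258333617 831:0.0549176430011721",
]
counts = field_counts(sample)
print(Counter(counts))  # how many lines have each field count
```

If the counts differ from line to line throughout the file, the rows simply have different lengths, and line 134 is just the first one longer than what the parser inferred.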

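Try using error_bad_lines=False, which skips lines with an unexpected number of fields instead of raising. Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions, pass on_bad_lines='skip' instead.

    x = pd.read_csv('rcv1_train.binary', sep=r"\s+|:", engine='python', error_bad_lines=False)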
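Skipping lines discards data, so if the rows legitimately have different lengths, an alternative is to parse the file yourself into one row per (label, index, value) triple. This is a minimal sketch rather than a tested solution for the full file: the function name `parse_sparse_lines` is hypothetical, and it assumes every line has the shape "label index:value index:value ..." with an integer label and integer indices.

```python
def parse_sparse_lines(lines):
    """Turn 'label idx:val idx:val ...' lines into (label, index, value) tuples."""
    rows = []
    for line in lines:
        parts = line.split()  # split on any run of whitespace
        if not parts:
            continue  # ignore blank lines
        label = int(parts[0])
        for pair in parts[1:]:
            index, value = pair.split(":")
            rows.append((label, int(index), float(value)))
    return rows

# Two lines taken from the question's sample data.
sample = [
    "1 440:0.033906222568727 730:0.0424739279722748 "
    "1523:0.0773048148348295 1893:0.0433930684646909",
    "1 577:0.0679078686536282 761:0.0506946081073312",
]
rows = parse_sparse_lines(sample)
for row in rows:
    print(*row)
```

For the real file, pass an open file handle (for example `with open('rcv1_train.binary') as f: rows = parse_sparse_lines(f)`), then build the table with `pd.DataFrame(rows, columns=['label', 'index', 'value'])`. The column names here are illustrative, not taken from the question.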

