
Read a row that has been split into multiple lines using Python pandas

I am reading data that has discrepancies in it: a few rows have been split across multiple lines (starting at row 6). Below are the data and the code that I have. Can you help me with that?

Data:

MP|3560039|||L000011396|BTA171|30071: PHILLIPS, MT|4253|I|103278|||D|1
MP|3561042|||WQTI544|BEA148|16023: BUTTE, ID|2891|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16077: POWER, ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM, ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK, ID|82839|I|103306|||D|1
MP|3561250|||WQTI576
|BEA135|48301: LOVING, TX|82|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48443: TERRELL, TX|984|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48173: GLASSCOCK, TX|1226|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48243: JEFF DAVIS, TX|2342|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48461: UPTON, TX|3355|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48383: REAGAN, TX|3367|I|103308|||D|1

Code:

df4_mk = pd.read_csv(zf1.open('MP.dat'),header=None,delimiter='|', index_col=0, names=['record_type', 'unique_system_identifier', 'uls_file_number','ebf_number','call_sign',
                           'market_partition_code','defined_partition_area','defined_area_population','include_exclude_ind','partition_sequence_area_id',
                           'action_performed','census_figures','def_undef_ind','partition_sequence_number'],low_memory=False,
                           dtype={'record_type':str,'unique_system_identifier':int,'uls_file_number':str,'ebf_number':str,'call_sign': str,
                                  'market_partition_code':str,'defined_partition_area':str,'defined_area_population':int,'include_exclude_ind':str,
                                  'partition_sequence_area_id':int,'action_performed': str,'census_figures': int,'def_undef_ind': str,'partition_sequence_number':int })

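I'd replace "\n|" with "|" using string manipulation (str.replace). In the data, each broken row continues on a line that starts with "|", so removing the newline in front of that "|" rejoins the two halves: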

In [11]: s = open('MP.dat').read()

In [12]: print(s.replace("\n|", "|"))
MP|3560039|||L000011396|BTA171|30071: PHILLIPS, MT|4253|I|103278|||D|1
MP|3561042|||WQTI544|BEA148|16023: BUTTE, ID|2891|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16077: POWER, ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM, ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK, ID|82839|I|103306|||D|1
MP|3561250|||WQTI576|BEA135|48301: LOVING, TX|82|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48443: TERRELL, TX|984|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48173: GLASSCOCK, TX|1226|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48243: JEFF DAVIS, TX|2342|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48461: UPTON, TX|3355|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48383: REAGAN, TX|3367|I|103308|||D|1

In [13]: from io import StringIO
    ...: pd.read_csv(StringIO(s.replace("\n|", "|")), delimiter='|', header=None) # plus other args
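For reference, here is a minimal self-contained sketch of the same idea with the "other args" filled in. The column names are copied from the question; the helper name read_mp, the INT_COLUMNS list, and the choice of the nullable "Int64" dtype are assumptions made for illustration. The sample rows leave census_figures empty, and a plain int dtype cannot hold missing values, which is why the sketch swaps in "Int64". It also assumes that no genuine record ever starts with "|", which holds for the sample shown.

```python
from io import StringIO

import pandas as pd

# Column names as given in the question's read_csv call.
COLUMNS = [
    'record_type', 'unique_system_identifier', 'uls_file_number', 'ebf_number',
    'call_sign', 'market_partition_code', 'defined_partition_area',
    'defined_area_population', 'include_exclude_ind',
    'partition_sequence_area_id', 'action_performed', 'census_figures',
    'def_undef_ind', 'partition_sequence_number',
]

# Assumption: the columns the question typed as int. "Int64" (nullable)
# is used here so that empty fields become <NA> instead of raising.
INT_COLUMNS = {
    'unique_system_identifier', 'defined_area_population',
    'partition_sequence_area_id', 'census_figures',
    'partition_sequence_number',
}


def read_mp(text):
    """Rejoin rows whose continuation line starts with '|', then parse."""
    # Normalize Windows line endings first so "\n|" also matches "\r\n|".
    fixed = text.replace("\r\n", "\n").replace("\n|", "|")
    dtypes = {name: ("Int64" if name in INT_COLUMNS else str) for name in COLUMNS}
    return pd.read_csv(StringIO(fixed), sep='|', header=None,
                       names=COLUMNS, dtype=dtypes)


# Three logical records taken from the question; the last two are split.
sample = (
    "MP|3560039|||L000011396|BTA171|30071: PHILLIPS, MT|4253|I|103278|||D|1\n"
    "MP|3561250|||WQTI576\n"
    "|BEA135|48301: LOVING, TX|82|I|103308|||D|1\n"
    "MP|3561250|||WQTI576\n"
    "|BEA135|48443: TERRELL, TX|984|I|103308|||D|1\n"
)

df = read_mp(sample)
print(df[['call_sign', 'market_partition_code', 'defined_area_population']])
```

With the real file, the text could come from zf1.open('MP.dat').read().decode(...) instead of the sample string; the encoding to pass to decode depends on the file and is not stated in the question.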
