
Python parsing file line by line with special characters

When reading a file line by line using the classic for line in filename: approach, how do you concatenate the lines into one string (or one string per list) based on a specific character symbol (e.g., $)? For example:

My input:

$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs

My desired output:

'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF'
'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs'

OR

['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF']
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']

Notice that every line beginning with the $ symbol is removed and serves as the break point between the concatenated strings.

You can use a regular expression. re.finditer returns an iterator of match objects, one per $-headed block; group 1 of each match holds the lines that follow the $ header, up to the next $. A list comprehension with str.replace then removes the newlines from each captured block. Note that this pattern assumes $ occurs only at the start of header lines:

>>> s="""$asdfasdfasdfasdfasdfasdf
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
... $aWEOUUEWOEUowuerotueworutowueortuo
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs
... """
>>> 
>>> import re
>>> 
>>> li=re.finditer(r'\$[^\n]*([^$]+)',s)
>>> [i.group(1).replace('\n','') for i in li]
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF',
 'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']
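
The pattern above treats every $ as a delimiter, so a $ in the middle of a data line would be taken as a new header. A minimal sketch of one way around that, assuming header lines are exactly the lines that start with $: split on whole header lines with re.split and the re.MULTILINE flag. The function name split_sections and the sample text are hypothetical, not part of the original answer.

```python
import re

def split_sections(text):
    """Return one concatenated string per '$'-headed section of text."""
    # With re.MULTILINE, ^ and $ anchor at line boundaries, so only whole
    # lines that begin with '$' act as separators.
    chunks = re.split(r'^\$.*$', text, flags=re.MULTILINE)
    # Join the lines of each chunk and drop chunks that end up empty.
    joined = ("".join(chunk.splitlines()) for chunk in chunks)
    return [s for s in joined if s]

sample = "$header one\nAB\nCD\n$header two\nEF$G\nH\n"
print(split_sections(sample))  # ['ABCD', 'EF$GH']
```

Unlike str.strip, str.splitlines only removes the line breaks, so leading or trailing spaces inside a data line would be kept.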
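
Alternatively, without a regular expression, iterate over the lines and accumulate them until the next $ line (io.StringIO stands in for an open file here):

import io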

data = io.StringIO('''$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs''')


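strings = []  # one concatenated string per '$' section
strg = ''     # text accumulated for the current section
for line in data:
    if line.startswith('$'):
        # Header line: close the current section (if any) and skip the header.
        if strg:
            strings.append(strg)
            strg = ''
    else:
        strg += line.strip()
# Flush the last section, which is not followed by a '$' line.
if strg:
    strings.append(strg)

print(strings)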
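
The same loop can be wrapped in a generator so that it works on any iterable of lines, including a real open file. This is only a sketch: the name iter_sections and the marker parameter are hypothetical, and it collects the pieces in a list and joins them once per section instead of growing a string with +=.

```python
import io

def iter_sections(lines, marker="$"):
    """Yield one concatenated string per section delimited by marker lines."""
    parts = []
    for line in lines:
        if line.startswith(marker):
            # Header line: emit the section collected so far, if any.
            if parts:
                yield "".join(parts)
                parts = []
        else:
            stripped = line.strip()
            if stripped:
                parts.append(stripped)
    # Emit the final section, which has no header line after it.
    if parts:
        yield "".join(parts)

demo = io.StringIO("$first\nAB\nCD\n$second\nEF\n")
print(list(iter_sections(demo)))  # ['ABCD', 'EF']
```

With a file on disk, the call would look like list(iter_sections(fh)) inside a with open(path) as fh: block, where path is whatever file you are reading.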

