Python parsing file line by line with special characters
When reading a file line by line with the classic for line in filename: approach, how do you concatenate the lines into one string (or one string per list), using lines that start with a specific character (e.g., $) as the break points? For example:
My input:
$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs
My desired output:
'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF'
'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs'
OR
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF']
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']
Notice that every line beginning with the $ symbol is removed and used as the break point for the line-by-line string concatenation.
You can use a regex. re.finditer returns an iterator of match objects, one per $ header line together with the block that follows it; a list comprehension with the str.replace method then replaces the newlines in each captured block with an empty string:
>>> s="""$asdfasdfasdfasdfasdfasdf
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
... $aWEOUUEWOEUowuerotueworutowueortuo
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs
... """
>>>
>>> import re
>>>
>>> li=re.finditer(r'\$[^\n]*([^$]+)',s)
>>> [i.group(1).replace('\n','') for i in li]
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF',
'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']
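One caveat with the pattern above: [^$]+ stops at any $ character, so a $ in the middle of a data line would cut that block short. A minimal sketch of a variant that only treats $ at the start of a line as a header is shown here. The function name join_records and the sample text are made up for illustration; they are not part of the original answer.

```python
import re

def join_records(text, marker='$'):
    """Return one joined string per block that follows a marker line."""
    # A header is the marker at the start of a line, up to the end of that line.
    header = re.compile(r'^' + re.escape(marker) + r'.*\n?', flags=re.MULTILINE)
    # Element 0 is whatever precedes the first header, so it is skipped.
    chunks = header.split(text)[1:]
    # Strip each line and glue the lines of a block together.
    records = (''.join(line.strip() for line in chunk.splitlines())
               for chunk in chunks)
    # Drop empty blocks (for example, two header lines in a row).
    return [record for record in records if record]

# Hypothetical sample with a '$' in the middle of a data line.
sample = "$header one\nABC\nDEF\n$header two\nGH$I\nJKL\n"
print(join_records(sample))  # ['ABCDEF', 'GH$IJKL']
```

Like the original one-liner, this needs the whole file contents in memory as a single string.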
# Alternative without regex: iterate line by line and accumulate each block.
import io
data = io.StringIO('''$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs''')
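strings = []   # finished blocks
strg = ''      # block currently being built
for line in data:
    if line.startswith('$'):
        # A header line closes the current block, if there is one.
        if strg:
            strings.append(strg)
            strg = ''
        continue
    else:
        # strip() drops the trailing newline before concatenating.
        strg += line.strip()
# Store the last block, which has no header line after it.
if strg:
    strings.append(strg)
print(strings)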
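The same loop can be wrapped in a generator so that it works directly on an open file and keeps only one block in memory at a time. This is a sketch under assumptions: the name iter_records, the marker parameter, and the sample lines are illustrative and not taken from the original answer.

```python
def iter_records(lines, marker='$'):
    """Yield one joined string per block of lines between marker lines."""
    parts = []
    for line in lines:
        if line.startswith(marker):
            # A header line ends the current block, if any.
            if parts:
                yield ''.join(parts)
                parts = []
        else:
            stripped = line.strip()
            if stripped:  # ignore blank lines
                parts.append(stripped)
    # Emit the final block, which has no header after it.
    if parts:
        yield ''.join(parts)

# Any iterable of lines works; an open file handle behaves the same way.
lines = ["$first\n", "ABC\n", "DEF\n", "$second\n", "GHI\n", "JKL\n"]
print(list(iter_records(lines)))  # ['ABCDEF', 'GHIJKL']
```

Collecting the pieces in a list and joining once per block avoids building the string with repeated += on long inputs.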