Filter lines of a txt file that start with a specific string (Python)

Question

I have a txt file that looks like this:

...
|J150|DRE.16.2|T|2|DRE.16|PROVISAO P  CSLL|6779,24|D|D||
|J150|DRE.16.2.001|D|3|DRE.16.2|CSLL|6779,24|D|D||
|J150|DRE.17|T|1||LUCRO DO EXERCICIO|55797,1|C|R||
|J005|01012018|31122018|1||
|J100|BP.01|T|1||A|ATIVO|5540527,48|D|8656252,32|D||
|J100|BP.01.1|T|2|BP.01|A|ATIVO CIRCULANTE|5030370,68|D|7881200,94|D||
|J100|BP.01.1.1|T|3|BP.01.1|A|DISPONIBILIDADES|380741,7|D|777224,63|D||
|J100|BP.01.1.1.01|T|4|BP.01.1.1|A|CAIXA|96786,62|D|69935,41|D||
|J100|BP.01.1.1.01.001|D|5|BP.01.1.1.01|A|Caixa|96786,62|D|69935,41|D||
...

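It is very long. I want to copy only the lines that start with "|J100|" into a new file. I tried some of the answers on this site, but they did not work in my case. My attempts are below: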

path="file.txt"
open('newfile','w').writelines([ line for line in open(path) if '|J100|' in line])

This did not work; I got UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 255: invalid start byte

Then I tried this:

with open(path,'rb') as f,open('new.txt','wb') as g:
    g.writelines(filter(lambda line: '|J100|' in line, f))

and got this in response: TypeError: a bytes-like object is required, not 'str'

Any ideas?

Answer 1

If

path="file.txt"
open('newfile','w').writelines([ line for line in open(path) if '|J100|' in line])

raises UnicodeDecodeError, then the contents of file.txt are not encoded as UTF-8.
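If you would rather stay in text mode, one option is to pass the file's real encoding to open. The sketch below is only an illustration under assumptions: the function name filter_lines_text and the default encoding "latin-1" are mine, not from the question. Byte 0xc1 would decode as "Á" under Latin-1, which would fit Portuguese text, but that is a guess; also note that Latin-1 accepts every byte value, so it never raises even when it is the wrong choice.

```python
def filter_lines_text(src, dst, prefix="|J100|", encoding="latin-1"):
    """Copy lines of src that start with prefix into dst.

    The encoding default is an assumption; replace it with the
    file's actual encoding if you know it.
    Returns the number of lines written.
    """
    count = 0
    # newline="" disables newline translation, so line endings
    # are written back exactly as they were read.
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            if line.startswith(prefix):
                fout.write(line)
                count += 1
    return count
```

Calling filter_lines_text("file.txt", "new.txt") would then write only the |J100| lines to new.txt, re-encoded with the same encoding they were read with.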

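This code

with open(path,'rb') as f,open('new.txt','wb') as g:
    g.writelines(filter(lambda line: '|J100|' in line, f))

raises TypeError because you are reading the file in binary mode, so each line comes out as bytes, but the lambda checks whether a str value ('|J100|') is contained in those bytes. The best approach is to compare bytes with bytes (b'|J100|'). Also, if you only want lines that begin with a particular value, use bytes.startswith; an 'in' check would also keep lines that contain |J100| somewhere after the start: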

with open(path,'rb') as f,open('new.txt','wb') as g:
    g.writelines(filter(lambda line: line.startswith(b'|J100|'), f))
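To see the filter in action without touching a real file, here is a small sketch that runs the same startswith test over an in-memory stream. The helper name keep_prefixed is hypothetical, and the sample lines are copied from the question.

```python
import io

def keep_prefixed(lines, prefix=b"|J100|"):
    """Return an iterator over the byte lines that begin with prefix."""
    return (line for line in lines if line.startswith(prefix))

# io.BytesIO stands in for a file opened with mode 'rb'.
sample = io.BytesIO(
    b"|J150|DRE.17|T|1||LUCRO DO EXERCICIO|55797,1|C|R||\n"
    b"|J005|01012018|31122018|1||\n"
    b"|J100|BP.01|T|1||A|ATIVO|5540527,48|D|8656252,32|D||\n"
    b"|J100|BP.01.1.1.01|T|4|BP.01.1.1|A|CAIXA|96786,62|D|69935,41|D||\n"
)

kept = list(keep_prefixed(sample))
for line in kept:
    print(line.decode("ascii"), end="")  # prints only the two |J100| lines
```

Because the lines are never decoded while filtering, this works whatever the file's encoding is, as long as the prefix itself is plain ASCII.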
