Regex not finding specific pair of hexadecimal characters
Python 3.7.4
I have a *.csv that contains many instances of the string "High School", as well as many instances of the hexadecimal pair C3 82, which I want to remove.
import re

def findem(fn, patt):
    p = re.compile(patt)
    # The file is opened in text mode, so each line is a decoded str
    with open(fn, newline='\n') as fp:
        for line in fp:
            m = p.search(line)
            if m:
                print('found {0}'.format(line))

fn_inn = "Contacts_prod.csv"
patt_hs = "High School"
patt_C382 = r'\\xC3\\x82'

print('trying patt_hs')
findem(fn_inn, patt_hs)    # <------- finds all rows containing High School, great
print('trying patt_C382')
findem(fn_inn, patt_C382)  # <------- doesn't find anything and should
As written, it should print out which rows contain the pattern. With patt = "High School" everything works as expected. With patt = r'\xc3\x82' nothing is found.
Any ideas?
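The trick is 1) to give up on the idea of finding and displaying every occurrence, and remember that the goal is to remove all of them, and 2) to think in binary. Then it becomes simple, although there are a few subtleties: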
import re

def findem(patt):
    p = re.compile(patt)
    with open(fn_out, 'wb') as fp_out:      # binary output
        with open(fn_inn, 'rb') as fp_inn:  # binary input
            slurp_i = fp_inn.read()          # slurp_i is of type bytes
            slurp_o = p.sub(b'', slurp_i)    # notice the b'', very subtle
            fp_out.write(slurp_o)

fn_inn = "Contacts_prod.csv"
fn_out = "Contacts_prod.fixed.dat"
patt = re.compile(b'\xC3\x82')  # notice the b'' instead of r'', very subtle
findem(patt)
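For anyone wondering why the text-mode search comes up empty, it can be checked in memory. The bytes C3 82 are the UTF-8 encoding of a single character, U+00C2, so once the file has been decoded to str there is no \xc3 followed by \x82 left to match. Here is a minimal sketch, assuming the file is UTF-8. The sample row and the helper name strip_pair are made up for illustration and are not from the real file:

```python
import re

# Hypothetical sample row; the real file's contents are not shown in the post.
raw = b"Lincoln High School\xc3\x82,Springfield\n"

# Text mode decodes the bytes first. Assuming UTF-8, C3 82 becomes the single
# character U+00C2, so a str pattern for \xc3\x82 has nothing to match.
text = raw.decode("utf-8")
str_match = re.search("\xc3\x82", text)

# A bytes pattern applied to the undecoded bytes does match.
bytes_match = re.search(b"\xc3\x82", raw)


def strip_pair(data: bytes) -> bytes:
    """Remove every C3 82 byte pair from data (hypothetical helper)."""
    return re.sub(b"\xc3\x82", b"", data)


cleaned = strip_pair(raw)
print(str_match, bytes_match is not None, cleaned)
```

Note that a bytes pattern can only be used on bytes and a str pattern only on str. Mixing the two raises a TypeError, which is why both the pattern and the replacement need the b'' prefix.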
Thanks for all the replies. Hooray!
Still-learning Steve