
Regex not finding specific pair of hexadecimal characters

Python 3.7.4

I have a *.csv file that contains many instances of the string

High School

as well as many instances of the hexadecimal pair

C3 82

which I want to remove.

import re

def findem(fn, patt):
  # Print every line of the text file fn that matches the regex patt.
  p = re.compile(patt)
  with open(fn, newline='\n') as fp:
    for line in fp:
      if p.search(line):
        print('found {0}'.format(line))

fn_inn = "Contacts_prod.csv"

patt_hs   = "High School"
patt_C382 = r'\\xC3\\x82'

print('trying patt_hs')
findem( fn_inn, patt_hs)    # <------- finds all rows containing High School, great

print('trying patt_C382')
findem( fn_inn, patt_C382)  # <------- doesn't find anything and should

As written, it should print the lines that contain the pattern. With patt = "High School" everything works as expected. With patt = r'\xc3\x82' it finds nothing.

Any ideas?

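The trick is 1) to give up on finding and displaying each occurrence, remembering that the goal is to remove all of them, and 2) to think in binary. Then it becomes simple, though there are a few subtleties: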

import re

def findem(patt):
  # Remove every match of the bytes pattern patt from fn_inn and write the result to fn_out.
  p = re.compile(patt)
  with open(fn_out, 'wb') as fp_out:     # binary output
    with open(fn_inn, 'rb') as fp_inn:   # binary input
      slurp_i = fp_inn.read()            # slurp_i is of type bytes
      slurp_o = p.sub(b'', slurp_i)      # notice the b'' , very subtle
      fp_out.write(slurp_o)

fn_inn = "Contacts_prod.csv"
fn_out = "Contacts_prod.fixed.dat"

patt = b'\xC3\x82'                       # notice the b'' instead of r'', very subtle
findem(patt)
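
For anyone wondering why the text-mode search came up empty, here is a minimal sketch. It assumes the file is decoded as UTF-8, which the question does not confirm, and the sample bytes and the find_lines helper are made up for illustration. Under that assumption the byte pair C3 82 is decoded into the single character U+00C2 before the regex ever sees it, so a str pattern looking for U+00C3 followed by U+0082 cannot match. A bytes pattern applied to undecoded data can, and it also makes it possible to list the affected lines after all:

```python
import re

# Hypothetical sample data, not taken from Contacts_prod.csv.
raw = b"Lincoln High School\xc3\x82,Springfield\nNo stray bytes here\n"

# Text mode decodes first; as UTF-8, the pair C3 82 becomes one character, U+00C2.
text = raw.decode("utf-8")

# A str pattern looks for the characters U+00C3 then U+0082, which are not in text.
text_hit = re.search(r'\xC3\x82', text)

# A bytes pattern looks for the two raw bytes in the undecoded data.
bytes_hit = re.search(b'\xC3\x82', raw)

def find_lines(data, patt):
    """Return the lines of a bytes object that contain the bytes pattern patt."""
    p = re.compile(patt)
    return [line for line in data.splitlines() if p.search(line)]

matches = find_lines(raw, b'\xC3\x82')   # lines that contain the stray pair
cleaned = re.sub(b'\xC3\x82', b'', raw)  # same removal as in findem above

print(text_hit is None, bytes_hit is not None)
print(matches)
print(cleaned)
```

If the file is in some other encoding, the decoded characters will differ, but matching on bytes avoids the question entirely.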

Thanks for all the replies. Hooray!

Still-learning Steve
