Regex not finding specific pair of hexadecimal characters

Python 3.7.4

I have a *.csv file that contains numerous instances of the character string

High School

and numerous instances of the hexadecimal pair

C3 82

which I'd like to remove.

import re

def findem( fn, patt):
  p = re.compile(patt)
  with open( fn, newline = '\n') as fp:   # text mode: lines are decoded to str
    for line in fp:
      m = p.search( line)
      if m:
        print('found {0}'.format(line))

fn_inn = "Contacts_prod.csv"

patt_hs   = "High School"
patt_C382 = r'\\xC3\\x82'

print('trying patt_hs')
findem( fn_inn, patt_hs)    # <------- finds all rows containing High School, great

print('trying patt_C382')
findem( fn_inn, patt_C382)  # <------- doesn't find anything, but should

As written, it should print out which rows contain the pattern. With patt = "High School" everything works as expected. With patt = r'\xc3\x82' nothing gets found.

Any ideas?

The trick was to 1) stop thinking in terms of finding and displaying each occurrence and remember that the goal is to remove all occurrences, and 2) think in terms of binary. Then it became simple, but with some subtleties:

import re

def findem( patt):
  p = re.compile(patt)                   # also accepts an already-compiled pattern
  with open( fn_out, 'wb') as fp_out:    # binary output
    with open( fn_inn, 'rb') as fp_inn:  # binary input
      slurp_i = fp_inn.read()            # slurp_i is of type bytes
      slurp_o = p.sub( b'', slurp_i)     # notice the b'' , very subtle
      fp_out.write( slurp_o)

fn_inn = "Contacts_prod.csv"
fn_out = "Contacts_prod.fixed.dat"

patt = re.compile(b'\xC3\x82')         # notice the b'' instead of r'', very subtle
findem( patt)
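
For anyone wondering why the text-mode attempts found nothing, here is a minimal sketch. It assumes the file is decoded as UTF-8 (the default encoding of open() depends on the platform, so other systems may behave differently), and the sample row is made up for illustration. Under UTF-8 the byte pair C3 82 decodes to the single character U+00C2, so a str pattern looking for the two characters U+00C3 U+0082 never matches, and a pattern with doubled backslashes looks for the literal text \xC3\x82 instead.

```python
import re

# Hypothetical sample row containing the stray byte pair C3 82.
raw = b"Lincoln High School\xc3\x82,Springfield\n"

# What a text-mode read would hand to the regex, assuming UTF-8 decoding.
text = raw.decode("utf-8")

# Doubled backslashes: searches for the literal characters \ x C 3 \ x 8 2.
literal_hit = re.search(r'\\xC3\\x82', text)

# str pattern: searches for the characters U+00C3 then U+0082.
str_hit = re.search(r'\xC3\x82', text)

# The decoded text actually contains the single character U+00C2.
decoded_hit = re.search('\u00c2', text)

# bytes pattern on bytes data: matches the raw pair and removes it.
cleaned = re.sub(b'\xC3\x82', b'', raw)

# For a fixed byte sequence, bytes.replace does the same without a regex.
cleaned_no_regex = raw.replace(b'\xc3\x82', b'')

print(literal_hit, str_hit, decoded_hit is not None, cleaned)
```

Working on bytes sidesteps the encoding question entirely, which is why the 'rb'/'wb' version above behaves the same regardless of the platform default.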

Thanks to all who responded. All Hail SO!

Still-learning Steve
