
python search replace using wildcards

I'm somewhat confused, but I'm trying to do a search/replace using wildcards.

If I have something like:

 <blah.... ssf  ff>
 <bl.... ssf     dfggg   ff>
 <b.... ssf      ghhjj fhf>

and I want to replace all of the above strings with, say,

 <hh  >t

Any thoughts/comments on how this can be accomplished?

Thanks

Update (thanks for the comments!)

I'm missing something...

My initial sample text is:

Soo Choi</span>LONGEDITBOX">Apryl Berney 
Soo Choi</span>LONGEDITBOX">Joel Franks 
Joel Franks</span>GEDITBOX">Alexander Yamato 

and I'm trying to get:

Soo Choi foo Apryl Berney 
Soo Choi foo Joel Franks 
Joel Franks foo Alexander Yamato 

I've tried variations of

name=re.sub("</s[^>]*\">"," foo ",name) 

but I'm missing something...

Thoughts? Thanks.
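The attempted pattern `</s[^>]*">` never matches: after `</s`, the class `[^>]*` can only consume `pan`, because it is not allowed to cross the `>` that closes `</span>`. The pattern then needs `"` but finds `>`, so nothing is replaced and each line comes back unchanged. A minimal sketch of one pattern that does work, assuming every line has the shape shown above (`</span>`, then some text without `>`, then `">`):

```python
import re

# Sample lines from the update (trailing spaces dropped)
names = [
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
]

# The attempted pattern: [^>]* stops at the '>' of '</span>',
# so the required '">' never follows and there is no match.
broken = re.compile(r'</s[^>]*">')

# Match '</span>' literally, then any run of non-'>' characters, then '">'.
fixed = re.compile(r'</span>[^>]*">')

unchanged = [broken.sub(" foo ", n) for n in names]
results = [fixed.sub(" foo ", n) for n in names]

for line in results:
    print(line)
```

A non-greedy `</span>.*?">` would also work on these samples; which one is safer depends on what the real HTML looks like.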

How about something like this, with a regex:

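import re

YOURTEXT = "blah  <blah.... ssf  ff> blah"  # sample input
# [^>]* matches any run of characters other than '>', i.e. the rest of the tag
YOURTEXT = re.sub(r"<b[^>]*>", "<hh >t", YOURTEXT)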

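See the rather useful Python regular expression manual here, or section 5.2, "Search and Replace", of the Regular Expression HOWTO for a more hands-on approach.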
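A quick check of `<b[^>]*>` against the question's three sample strings. Every sample tag happens to start with `<b`, which is what the `b` in the pattern relies on. The second pattern, `<[^>]*>`, is a variation (not part of the answer) that drops that assumption; the fourth sample is a hypothetical tag added only to show the difference.

```python
import re

samples = [
    "<blah.... ssf  ff>",
    "<bl.... ssf     dfggg   ff>",
    "<b.... ssf      ghhjj fhf>",
    "<x.... other tag>",  # hypothetical tag that does not start with 'b'
]

# Only tags that begin with '<b' are replaced
b_only = [re.sub(r"<b[^>]*>", "<hh >t", s) for s in samples]

# Any '<...>' tag is replaced
any_tag = [re.sub(r"<[^>]*>", "<hh >t", s) for s in samples]

print(b_only)
print(any_tag)
```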

You don't have to use a regex:

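with open("file") as f:
    for line in f:
        if "<" in line and ">" in line:
            # every piece except the last ended just before a '>'
            s = line.rstrip().split(">")
            for n, i in enumerate(s):
                if "<" in i:
                    # keep the text before '<' and drop the old tag body
                    ind = i.find("<")
                    s[n] = i[:ind] + "<hh "
            # putting the '>' back, followed by 't'
            print(">t".join(s))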

Output:

$ cat file
blah  <blah.... ssf  ff> blah
blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>
blah <b.... ssf      ghhjj fhf>

$ ./python.py
blah  <hh >t blah
blah <hh >t  blah <hh >t
blah <hh >t
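The same split-on-`>` idea wrapped in a function, so it can be tried on in-memory strings instead of a file. The name `replace_tags` and the `new_body` parameter are hypothetical, and the sketch assumes tags are well formed (every `>` closes a `<`). Unlike the loop above, lines without a tag are returned unchanged rather than skipped.

```python
def replace_tags(line, new_body="<hh "):
    """Replace each '<...>' tag in line with new_body + '>t'.

    Assumes well-formed tags: every '>' closes a preceding '<'.
    """
    if "<" not in line or ">" not in line:
        return line
    pieces = line.split(">")
    for n, piece in enumerate(pieces):
        ind = piece.find("<")
        if ind != -1:
            # keep the text before '<', drop the old tag body
            pieces[n] = piece[:ind] + new_body
    # rejoining with '>t' restores each '>' and appends the 't'
    return ">t".join(pieces)


for text in ["blah  <blah.... ssf  ff> blah",
             "blah <b.... ssf      ghhjj fhf>",
             "no tags here"]:
    print(replace_tags(text))
```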

Sounds like a job for the "re" module; a simple re.sub should do the trick. Here's a little sample function for you, although you could just use the single re.sub() line:

import re

def subit(msg):
    # Use the line below instead if a tag may span several lines
    # (re.DOTALL lets '.' also match newlines)
    # subbed = re.compile(r"(<.*?>)", re.DOTALL).sub("<hh  >t", msg)
    # '.*?' is non-greedy, so each <...> tag is replaced separately
    subbed = re.sub(r"(<.*?>)", "<hh  >t", msg)
    return subbed

# Your messages bundled into a list
msgs = ["blah  <blah.... ssf  ff> blah",
        "blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>",
        "blah <b.... ssf      ghhjj fhf>"]

# Iterate the messages and print the substitution results
for msg in msgs:
    print(subit(msg))
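Why the `?` in `<.*?>` matters: without it, `.*` is greedy and runs to the *last* `>` on the line, so the second message above (two tags) would collapse into a single replacement. The sketch also tries the `re.DOTALL` variant from the commented-out line; the two-line string `multiline` is an illustrative input, not from the question.

```python
import re

line = "blah <bl.... ssf     dfggg   ff>  blah <bl.... ssf     dfggg   ff>"

# Greedy: one match from the first '<' all the way to the last '>'
greedy = re.sub(r"<.*>", "<hh  >t", line)
# Non-greedy: each tag is matched on its own
lazy = re.sub(r"<.*?>", "<hh  >t", line)

multiline = "blah <b....\nssf> blah"
# Without DOTALL, '.' does not match '\n', so this tag is left alone
no_dotall = re.sub(r"<.*?>", "<hh  >t", multiline)
# With DOTALL, the tag that spans two lines is replaced
with_dotall = re.compile(r"<.*?>", re.DOTALL).sub("<hh  >t", multiline)

print(greedy)
print(lazy)
print(repr(no_dotall))
print(repr(with_dotall))
```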

I would suggest taking a look at the docs for the "re" module; it's well documented and might help you achieve more accurate text manipulation/replacement.

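Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.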

 
© 2020-2024 STACKOOM.COM