Search and replace from a selected text using regex in Python

I would like to select text from a file in Python and replace only the part from the selected phrase up to a certain text.

import re

with open('searchfile.txt', 'r') as f:
    content = f.read()
    content_new = re.sub(r'^\S*', r'(.*?\/)', content, flags=re.M)
with open('searchfile.txt', 'w') as f:
    f.write(content_new)

searchfile.txt contains the following text:

abc/def/efg 212 234 asjakj
hij/klm/mno 213 121 ashasj

My aim is to select everything from the start of each line up to the first space, and then replace it with the text up to the first occurrence of a slash (/).

Example:

^\S* selects everything up to the first space in my file, which is "abc/def/efg".

I would like to replace this text with only "abc" and "hij" on their respective lines.

My regexp (.*?\/) does not work for me here.

You can split content on whitespace, take the first item, split that on / and take the first item again:

content_new = content.split()[0].split('/')[0]

See the Python demo.
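Note that content.split()[0] only looks at the first whitespace-separated token of the whole file, so it yields a single value. A minimal sketch of applying the same split idea to every line, assuming content holds the sample text from the question (the function name first_segments and the variable sample are hypothetical):

```python
def first_segments(content):
    """Return the part before the first '/' of the first field of each non-blank line."""
    return [line.split()[0].split('/')[0]
            for line in content.splitlines()
            if line.strip()]  # skip blank lines, where split()[0] would fail

sample = "abc/def/efg 212 234 asjakj\nhij/klm/mno 213 121 ashasj\n"
print(first_segments(sample))  # ['abc', 'hij']
```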

If you plan to use a regex, you may use

match = re.search(r'^[^\s/]+', content, flags = re.M)
if match:
    content_new = match.group()

See the Python demo. Details:

  • ^ - start of a line (due to re.M)
  • [^\s/]+ - one or more chars other than whitespace and /.
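The snippet above extracts only the first match. If the goal is to rewrite every line, as the question describes, one possible sketch is to capture the same [^\s/]+ prefix in a group and let re.sub drop the rest of the first field. The helper name shorten_first_field is hypothetical, and the sample text is the one from the question:

```python
import re

def shorten_first_field(content):
    # Group 1 captures the chars before the first '/' or whitespace on each line;
    # \S* then consumes the rest of the first field, which is discarded.
    return re.sub(r'^([^\s/]+)\S*', r'\1', content, flags=re.M)

sample = "abc/def/efg 212 234 asjakj\nhij/klm/mno 213 121 ashasj\n"
print(shorten_first_field(sample))
# abc 212 234 asjakj
# hij 213 121 ashasj
```

The question's attempt failed because the second argument of re.sub is a replacement string, not a pattern, so (.*?\/) was inserted as literal text.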

Try this:

>>> s = 'abc/def/efg 212 234 asjakj'
>>> p = s.split(' ', maxsplit=1)
>>> p
['abc/def/efg', '212 234 asjakj']
>>> p[0] = p[0].split('/', maxsplit=1)[0]
>>> p
['abc', '212 234 asjakj']
>>> s = ' '.join(p)
>>> s
'abc 212 234 asjakj'

One-liner solution:

>>> s.replace(s[:s.index(' ')], s[:s.index('/')], 1)
'abc 212 234 asjakj'
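The one-liner relies on str.index, which raises ValueError when the line has no space or no slash, and it can misbehave when the first slash appears after the first space (for example 'abc 1/2'). A hedged alternative using str.partition, which never raises, might look like this (the function name trim_first_field is hypothetical):

```python
def trim_first_field(s):
    # Split off the first space-delimited field; sep is '' if there is no space.
    head, sep, tail = s.partition(' ')
    # Keep only the part of that field before its first '/', if any.
    return head.partition('/')[0] + sep + tail

print(trim_first_field('abc/def/efg 212 234 asjakj'))  # abc 212 234 asjakj
print(trim_first_field('abc 1/2'))                     # abc 1/2
```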

Maybe this can help:

import re

s = "abc/def/efg 212 234 asjakj"
pattern = r"^(.*?\/)"
replace = "xyz/"
op = re.sub(pattern, replace, s)
print(op)

Rephrased expected behavior

  1. Given a string that has this pattern: <path><space>.
  2. If the first part of the given string (<path>) has at least one slash / surrounded by words,
  3. then return the string before the slash.
  4. Else return an empty string.

Where path is words delimited by slashes, for example abc/de, but not one of these:

  • abc
  • /de
  • abc/file.txt
  • abc/

Solution

Matching lines

You could also match the pattern and then extract only the first path element before the slash.

import re

line = "abc/def/efg 212 234 asjakj"

extracted = ''  # default
if re.match(r'^(\w+/\w+)+ ', line):
    extracted = line.split('/')[0]  # even simpler than Wiktor's split

print(extracted)
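As a quick check, the same pattern can be run against the valid example and the four counter-examples listed earlier. This is only a sketch: the helper name extract is hypothetical, and the trailing " 1" is appended to each sample as an assumed stand-in for the <space> and the rest of the line.

```python
import re

# Same pattern as in the snippet above, compiled once.
PATTERN = re.compile(r'^(\w+/\w+)+ ')

def extract(line):
    # Return the first path element if the line starts with a valid path, else ''.
    return line.split('/')[0] if PATTERN.match(line) else ''

samples = ["abc/de 1", "abc 1", "/de 1", "abc/file.txt 1", "abc/ 1"]
for line in samples:
    print(repr(line), '->', repr(extract(line)))
# Only 'abc/de 1' yields 'abc'; the other four yield ''.
```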

Extraction

The extraction can be done in two ways:

(1) Just the first path element, as Wiktor answered.

first_path_element = "abc/def/efg 212 234 asjakj".split('/')[0]
print(first_path_element)

(2) Some may find a regex shorter and more expressive:

import re

first_path_element = re.findall(r'^(\w+)/', "abc/def/efg 212 234 asjakj")[0]
print(first_path_element)
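Indexing the result of re.findall with [0] raises IndexError when the line contains no match. A small sketch of a more defensive variant, using re.match and a fallback value (the function name and the default parameter are hypothetical):

```python
import re

def first_path_element(line, default=''):
    # re.match anchors at the start of the string, so no explicit ^ is needed.
    m = re.match(r'(\w+)/', line)
    return m.group(1) if m else default

print(first_path_element("abc/def/efg 212 234 asjakj"))  # abc
print(repr(first_path_element("abc 212 234 asjakj")))    # ''
```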

Here is a working solution that reads from the file, searches for a pattern, replaces it with a new one and writes the result back to the same file.

file_name = "/home/searchfile.txt"
with open(file_name) as file:
    lines = file.readlines()
result_data = []
for line in lines:
    line = line.strip()
    space_split = line.split(" ")
    # keep only the part of the first field before the first slash
    prefix = space_split[0].split("/")[0]
    result = prefix + " " + " ".join(space_split[1:])
    result_data.append(result)
with open(file_name, "w") as file:
    file.write("\n".join(result_data))
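For comparison, the same read, replace and write cycle can be sketched with pathlib and a single re.sub call. The function name shorten_file is hypothetical, and the demo runs on a temporary file (rather than /home/searchfile.txt) so that it is safe to execute anywhere:

```python
import re
import tempfile
from pathlib import Path

def shorten_file(path):
    """Rewrite the file in place, trimming each line's first field at its first '/'."""
    path = Path(path)
    content = path.read_text()
    path.write_text(re.sub(r'^([^\s/]+)\S*', r'\1', content, flags=re.M))

# Demo on a throwaway file holding the sample text from the question.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "searchfile.txt"
    demo.write_text("abc/def/efg 212 234 asjakj\nhij/klm/mno 213 121 ashasj\n")
    shorten_file(demo)
    result = demo.read_text()

print(result)
```

Unlike the loop version, the regex leaves the rest of each line untouched, including its original spacing and the trailing newline.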
