Python: Replace multiple regex expressions

In the following input, I am trying to replace the numbers with '' and \n with ' '.

THE SONNETS\n\n                    1\n\nFrom fairest creatures we desire increase,\nThat thereby beauty’s rose might never die,\nBut as the riper should by time decease,\nHis

she hies,             1189\nAnd yokes her silver doves; by whose swift aid\nTheir mistress mounted through the empty skies,\nIn her light chariot quickly is convey’d;           1192\n  Holding their course to Paphos, where their queen\n  Means to immure herself and not be seen.\n'

The input_var is read from a file that has the above content.

file_name = 'sample.txt'
file = open(folder+file_name, mode='r', encoding='utf8')
input_var = file.read()
file.close()

A screenshot of the file is attached.

The data in the file is:

THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His

she hies,             1189
And yokes her silver doves; by whose swift aid
Their mistress mounted through the empty skies,
In her light chariot quickly is convey’d;           1192
  Holding their course to Paphos, where their queen
  Means to immure herself and not be seen.

To identify the numbers, I have used the regex [\s]{3,}\d{1,}\\n (there have to be at least 3 spaces before the number; see this link for testing of the regex).

I am using the following code, which I got from a few answers on Stack Overflow, to replace both the regular expression and \n.

Code 1 -

# Remove the numbers in sonnets and at the end of lines
pattern = {r'[\s]{3,}\d{1,}\\n' : '',
           r'\\n' : ' '
          }

regex = re.compile('|'.join(map(re.escape, pattern.keys(  ))))
output_var = regex.sub(lambda match: pattern[match.group(0)], input_var)

Code 2 -

rep = dict((re.escape(k), v) for k, v in pattern.items())
pattern_test = re.compile("|".join(rep.keys()))
output_var = pattern_test.sub(lambda m: rep[re.escape(m.group(0))], input_var)

Code 3 -

for i, j in pattern.items():
        output_var = input_var.replace(i, j)

where input_var has the above-mentioned text. None of the three replaces anything.

I have also tried

pattern = {r'[\s]{3,}\d{1,}\n' : '',
           r'\n' : ' '
          }

but it does not replace anything.

pattern = {'[\s]{3,}\d{1,}\n' : '',
           '\n' : ' '
          }

replaces only \n, and the output looks like

THE SONNETS                      1  From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His

The regular expression is not recognized in the dictionary; I think it is being taken as a literal string rather than a regular expression. How can I specify a regular expression in the dictionary? The answers I have found on Stack Overflow use strings rather than regular expressions, like the answer provided for this question.

The expected outcome is

THE SONNETS                       From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His

    she hies,And yokes her silver doves; by whose swift aid  Their mistress mounted through the empty skies, In her light chariot quickly is convey’d;  Holding their course to Paphos, where their queen   Means to immure herself and not be seen. ' 

Here's a working example that you could run (if you have bs4 etc.). I see you're getting help on the numbering and regex, but this may help with understanding the line returns etc. (I'm not exactly sure what the goal is). I couldn't find a source on the web with numbering similar to your source, so unfortunately it's not like-for-like. Maybe food for thought, if nothing else.

from bs4 import BeautifulSoup
import re
import requests


url = 'http://www.gutenberg.org/cache/epub/1041/pg1041.txt'

page = requests.get(url)
# print(page.status_code)
soup = BeautifulSoup(page.text)

sonnet = page.text

print(sonnet[780:1500])
print()
print('------')
print()
sonnet = re.sub('\r','',sonnet)
sonnet = re.sub('\n','',sonnet)
print(sonnet[698:1500])

url2 = 'http://shakespeare.mit.edu/Poetry/VenusAndAdonis.html'

page = requests.get(url2)
# print(page.status_code)
soup = BeautifulSoup(page.text)
print()
print('------')
print('------')
print()
VenusAndAdonis = soup.text
print(type(VenusAndAdonis))
print(VenusAndAdonis[800:1500])
print()
print('------')
print()
VenusAndAdonis = re.sub('\r','',VenusAndAdonis)
VenusAndAdonis = re.sub('\n',' ',VenusAndAdonis)
print(VenusAndAdonis[800:1500])

Outputs:

I

  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou, contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the grave and thee.

  II

  When forty winters shall besiege thy brow,

------

I  From fairest creatures we desire increase,  That thereby beauty's rose might never die,  But as the riper should by time decease,  His tender heir might bear his memory:  But thou, contracted to thine own bright eyes,  Feed'st thy light's flame with self-substantial fuel,  Making a famine where abundance lies,  Thy self thy foe, to thy sweet self too cruel:  Thou that art now the world's fresh ornament,  And only herald to the gaudy spring,  Within thine own bud buriest thy content,  And tender churl mak'st waste in niggarding:    Pity the world, or else this glutton be,    To eat the world's due, by the grave and thee.  II  When forty winters shall besiege thy brow,  And dig deep trenches in thy beauty's field,  Thy youth's proud livery so gazed on now,  Will be a tatter'd weed of small 

------
------

<class 'str'>
 honour to your heart's content; which I
wish may always answer your own wish and the world's hopeful
expectation.
Your honour's in all duty,
WILLIAM SHAKESPEARE.

EVEN as the sun with purple-colour'd face
Had ta'en his last leave of the weeping morn,
Rose-cheek'd Adonis hied him to the chase;
Hunting he loved, but love he laugh'd to scorn;
Sick-thoughted Venus makes amain unto him,
And like a bold-faced suitor 'gins to woo him.


'Thrice-fairer than myself,' thus she began,
'The field's chief flower, sweet above compare,
Stain to all nymphs, more lovely than a man,
More white and red than doves or roses are;
Nature that made thee, with herself at strife,
Saith that the world hath ending wit

------

 honour to your heart's content; which I wish may always answer your own wish and the world's hopeful expectation. Your honour's in all duty, WILLIAM SHAKESPEARE.  EVEN as the sun with purple-colour'd face Had ta'en his last leave of the weeping morn, Rose-cheek'd Adonis hied him to the chase; Hunting he loved, but love he laugh'd to scorn; Sick-thoughted Venus makes amain unto him, And like a bold-faced suitor 'gins to woo him.   'Thrice-fairer than myself,' thus she began, 'The field's chief flower, sweet above compare, Stain to all nymphs, more lovely than a man, More white and red than doves or roses are; Nature that made thee, with herself at strife, Saith that the world hath ending wit
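The separate re.sub calls for '\r' and '\n' above could probably be folded into one pattern. The following is only a sketch: flatten_lines and its joiner parameter are hypothetical names, not part of the code above, and a lone '\r' with no '\n' after it is deliberately left untouched.

```python
import re

def flatten_lines(text, joiner=' '):
    """Join lines into one string (hypothetical helper, for illustration only)."""
    # \r?\n matches both Windows (\r\n) and Unix (\n) line endings in one go.
    return re.sub(r'\r?\n', joiner, text)

print(flatten_lines("Had ta'en his last leave\r\nof the weeping morn,\nRose-cheek'd Adonis"))
```

Passing joiner='' would mimic the first snippet (lines glued together), while the default ' ' mimics the second.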

You need to run re.sub in a loop, but make sure output_var is initialized to the input_var value:

output_var = input_var
for reg, repl in pattern.items():
  output_var = re.sub(reg, repl, output_var)
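For context, the attempts in the question most likely fail for separate reasons. re.escape turns each dictionary key into a pattern that matches only the key's literal characters, and pattern[match.group(0)] looks up the matched text, which is never one of the keys. str.replace does not interpret regex syntax at all, and Code 3 also restarts from input_var on every iteration. A small check of the re.escape point, using a made-up sample string rather than the full file:

```python
import re

regex_source = r'\s{3,}\d+\n'
# Illustrative sample only, not the full file content.
sample = "she hies,             1189\nAnd yokes"

# re.escape() backslash-escapes the metacharacters, so the escaped pattern
# matches only the literal text of the pattern string itself.
escaped = re.escape(regex_source)

matches_sample_escaped = re.search(escaped, sample) is not None
matches_itself_escaped = re.search(escaped, regex_source) is not None
matches_sample_plain = re.search(regex_source, sample) is not None

print(matches_sample_escaped, matches_itself_escaped, matches_sample_plain)
# False True True
```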

See the Python demo online:

import re

input_var = """THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His

she hies,             1189
And yokes her silver doves; by whose swift aid
Their mistress mounted through the empty skies,
In her light chariot quickly is convey’d;           1192
  Holding their course to Paphos, where their queen
  Means to immure herself and not be seen."""

pattern = {r'\s{3,}\d+\n' : '',
           r'\n' : ' '}
output_var = input_var
for reg, repl in pattern.items():
  output_var = re.sub(reg, repl, output_var)

print(output_var)

Output:

THE SONNETS From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His  she hies,And yokes her silver doves; by whose swift aid Their mistress mounted through the empty skies, In her light chariot quickly is convey’d;  Holding their course to Paphos, where their queen   Means to immure herself and not be seen.
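Note that the loop depends on the order of the dictionary: the number pattern must run before the bare \n replacement, otherwise no newline is left for it to match. Dictionaries keep insertion order in Python 3.7+, but a list of pairs makes that dependency explicit. If a single pass is preferred, something along these lines might work. It is a sketch only: rules, combined, replace_match and clean are hypothetical names, and it assumes the individual patterns contain no capturing groups of their own.

```python
import re

# Ordered (pattern, replacement) pairs. The more specific pattern comes first
# so that it is tried before the bare newline in the alternation.
rules = [
    (r'\s{3,}\d+\n', ''),
    (r'\n', ' '),
]

# Wrap each pattern in its own capturing group: (p1)|(p2)
combined = re.compile('|'.join('({})'.format(p) for p, _ in rules))

def replace_match(match):
    # match.lastindex is the 1-based index of the group that matched,
    # which maps back to the corresponding entry in rules.
    return rules[match.lastindex - 1][1]

def clean(text):
    return combined.sub(replace_match, text)

print(clean("she hies,             1189\nAnd yokes her silver doves;\nTheir mistress"))
# she hies,And yokes her silver doves; Their mistress
```

The callback picks the replacement by which group matched rather than by the matched text, which is what the dictionary lookup in the question could not do.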
