Capitalize each first word of a sentence in a paragraph

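I want to capitalize the first letter of the first word of every sentence (str) in a whole paragraph (str). The problem is that all the characters are lowercase.

I tried something like this: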

text = "here a long. paragraph full of sentences. what in this case does not work. i am lost" 
re.sub(r'(\b\. )([a-zA-z])', r'\1' (r'\2').upper(), text) 

I expect something like this:

"Here a long. Paragraph full of sentences. What in this case does not work. I am lost."

You can use re.sub with a lambda:

import re
text = "here a long. paragraph full of sentences. what in this case does not work. i am lost" 
# Raw string so that \w, \. and \s reach the regex engine unchanged
result = re.sub(r'(?<=^)\w|(?<=\.\s)\w', lambda x: x.group().upper(), text)

Output:

'Here a long. Paragraph full of sentences. What in this case does not work. I am lost'

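Regex explanation:

(?<=^)\w : matches a word character that is preceded by the start of the string (or of a line, if re.MULTILINE is set).

(?<=\.\s)\w : matches a word character that is preceded by a period and one whitespace character.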
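
To make the two alternatives concrete, here is a small sketch that lists what each one matches on its own. The variable names and the shortened sample text are illustrative and not part of the original answer. It also shows a limitation that follows from the lookbehind accepting exactly one whitespace character.

```python
import re

text = "here a long. paragraph full of sentences. i am lost"

# Alternative 1: a word character at the very start of the string.
start_matches = re.findall(r'(?<=^)\w', text)

# Alternative 2: a word character preceded by a period and one whitespace character.
after_period_matches = re.findall(r'(?<=\.\s)\w', text)

print(start_matches)         # ['h']
print(after_period_matches)  # ['p', 'i']

# Lookbehinds are zero-width, so only the letter itself is replaced.
# With two spaces after the period, the second alternative no longer matches,
# and the second sentence stays lowercase.
double_spaced = "one.  two"
unchanged = re.sub(r'(?<=^)\w|(?<=\.\s)\w', lambda m: m.group().upper(), double_spaced)
print(unchanged)  # One.  two
```

If the input may contain several spaces after a period, a pattern that captures the whitespace explicitly is likely the safer choice.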

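You can use the regex ((?:^|\.\s)\s*)([a-z]), which does not depend on lookarounds. Lookarounds are sometimes unavailable in the regex dialect you are using, so this pattern is simpler and more widely supported; for example, JavaScript gained lookbehind only in ECMAScript 2018, and it is not yet widely supported. Group 1 captures either zero or more whitespace characters at the start of the string, or a literal dot . followed by one or more whitespace characters. Group 2 captures a lowercase letter with ([a-z]). A lambda expression then replaces the match with the text of group 1 followed by the letter from group 2 converted to uppercase. The Python code below uses the equivalent pattern (^\s*|\.\s+)([a-z]):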

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    print(re.sub(r'(^\s*|\.\s+)([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

Output:

Here a long.   Paragraph full of sentences. What in this case does not work. I am lost
   This para contains more than one space after period and also has unneeded space at the start of string.   Here a long.   Paragraph full of sentences.  What in this case does not work. I am lost

If you also want to get rid of the extra whitespace and reduce it to a single space, move \s* out of group 1 and use the regex ((?:^|\.\s))\s*([a-z]) with this updated Python code:

import re

arr = ['here a long.   paragraph full of sentences. what in this case does not work. i am lost',
       '   this para contains more than one space after period and also has unneeded space at the start of string.   here a long.   paragraph full of sentences.  what in this case does not work. i am lost']

for s in arr:
    print(re.sub(r'((?:^|\.\s))\s*([a-z])', lambda m: m.group(1) + m.group(2).upper(), s))

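This gives the following output, where the extra whitespace is reduced to a single space (and leading whitespace is removed), which is often what you want: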

Here a long. Paragraph full of sentences. What in this case does not work. I am lost
This para contains more than one space after period and also has unneeded space at the start of string. Here a long. Paragraph full of sentences. What in this case does not work. I am lost
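
The two variants can be folded into one helper. The following is only a sketch: the function name capitalize_sentences and the collapse_spaces flag are hypothetical names introduced here, and like the patterns above it handles only sentences ending in a period and starting with a lowercase ASCII letter.

```python
import re

def capitalize_sentences(text, collapse_spaces=False):
    """Uppercase the first lowercase ASCII letter of each sentence.

    collapse_spaces=True also drops leading whitespace and reduces the
    whitespace after each period to a single character.
    """
    if collapse_spaces:
        # Extra whitespace sits outside group 1, so it is not put back.
        pattern = r'((?:^|\.\s))\s*([a-z])'
    else:
        # All whitespace sits inside group 1, so it is preserved.
        pattern = r'(^\s*|\.\s+)([a-z])'
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)

sample = '  here a long.   paragraph. i am lost'
kept = capitalize_sentences(sample)
collapsed = capitalize_sentences(sample, collapse_spaces=True)
print(repr(kept))       # '  Here a long.   Paragraph. I am lost'
print(repr(collapsed))  # 'Here a long. Paragraph. I am lost'
```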

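Also, if you were doing this with a PCRE-based regex tool whose replacement syntax supports \U, you would not need a lambda function at all: you could simply use \1\U\2 as the replacement string.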
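
For comparison, Python's re module does not offer a \U case-conversion escape in replacement strings. The sketch below assumes Python 3.7 or later, where an unknown letter escape in the replacement template raises re.error; the variable names are illustrative. It shows that the callable replacement fills the role that \1\U\2 plays in those other tools.

```python
import re

text = "here a long. paragraph full of sentences."
pattern = r'(^\s*|\.\s+)([a-z])'

# Assumption: Python 3.7+, where '\U' in a replacement template is rejected.
try:
    re.sub(pattern, r'\1\U\2', text)
    template_supported = True
except re.error:
    template_supported = False

# The callable replacement does the uppercasing explicitly instead.
result = re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)

print(template_supported)  # False
print(result)              # Here a long. Paragraph full of sentences.
```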
