[英]Remove feminine ending in german using regex python
In german language feminine endings are ['/innen','/in','/Innen','/In','Innen','In','innen']
.在德语中,女性词尾是['/innen','/in','/Innen','/In','Innen','In','innen']
。 I want to remove them from the strings, that are in list.我想将它们从列表中的字符串中删除。
I have come up with the following:我想出了以下几点:
rm_gender = ['/innen','/in','/Innen','/In','Innen','In','innen']
test_list = ['Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Softwareentwickler',
'Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Softwareentwickler',
'Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Hard-Softwareentwickler',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Hard-Softwareentwickler',
'Hard-Softwareentwickler',
'Hard-Softwareentwickler']
result = [vac if any([substring in vac for substring in ['-In',' In']]) else re.sub('|'.join(rm_gender),'',vac) if vac[:2] not in 'In' else 'In' + re.sub('|'.join(rm_gender),'',vac) for vac in test_list]
But it doesn't work, because there is a space in front of words like 'SoftwareentwicklerInnen'.但这不起作用,因为在“SoftwareentwicklerInnen”之类的词前面有一个空格。 How can i correctly do it with regex?我怎样才能用正则表达式正确地做到这一点?
Important is: i want to keep format of the string as it is.重要的是:我想保持字符串的格式不变。 Just need to remove feminine ending( or I want to return corrected list of strings)只需要删除女性结尾(或者我想返回更正的字符串列表)
Try this one:试试这个:
import re
test_list = test_list[0].split(";")
test_list.append("Informatikerin") # adding one ending with in - I don't know if this is a correct word!
pattern = re.compile("in(?:nen)?$", re.IGNORECASE)
[re.sub(pattern, "", x) for x in test_list]
OUTPUT OUTPUT
['Data Scientists', ' DWH-BI Consultants', ' Softwareentwickler', ' Informatiker', ' Statistiker', 'Informatiker']
FOLLOW UP跟进
If you want to rebuild the string as it was, jusr rejoin by ";":如果您想按原样重建字符串,请 jusr 通过“;”重新加入:
";".join([re.sub(pattern, "", x) for x in test_list])
OUTPUT OUTPUT
'Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker;Informatiker'
If the idea is to match all the words in each line:如果想法是匹配每行中的所有单词:
pattern = re.compile("(in(?:nen)?)(?=;|\.|,|;| |:|$)", re.IGNORECASE)
re.sub(pattern, "", "You are a Softwareentwicklerinnen: that is as nice as Informatikerin")
re.sub(pattern, "", "You are a Softwareentwicklerinnen; that is as nice as Informatikerin")
OUTPUT OUTPUT
'You are a Softwareentwickler: that is as nice as Informatiker'
'You are a Softwareentwickler; that is as nice as Informatiker'
You could convert matches of the following regular expression to empty strings:您可以将以下正则表达式的匹配项转换为空字符串:
\/?[Ii](?:nnen|n)\b
This regex can be broken down as follows.这个正则表达式可以分解如下。
\/? # optionally match '/'
[Ii] # match 'I' or 'i'
(?:nnen|n) # match 'nnen' or 'n' (in that order)
\b # match a word boundary
The word boundary is to prevent matches of strings such as `innenantenne'单词边界是为了防止字符串匹配,例如 `innenantenne'
You can use您可以使用
rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]
See the regex demo .请参阅正则表达式演示。 Details :详情:
(?:\b/|\B)
- either a /
that is preceded with a word char or a position that is preceded with a word char (?:\b/|\B)
- 前面有一个单词 char 的/
或前面有一个单词 char 的 positioni
- i
i
- i
(?:nne)?
- an optional nne
substring - 一个可选的nne
substringn
- a n
char n
- 一个n
字符\b
- a word boundary. \b
- 单词边界。See the Python demo :请参阅Python 演示:
import re
test_list = ['Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler']
rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]
for x in result:
print(x)
Output: Output:
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Hard-Softwareentwickler
Hard-Softwareentwickler
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.