Remove feminine endings in German using regex in Python

In German, the feminine endings are ['/innen','/in','/Innen','/In','Innen','In','innen']. I want to remove them from the strings in a list.

I have come up with the following:

rm_gender = ['/innen','/in','/Innen','/In','Innen','In','innen']
test_list = ['Softwareentwickler',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Softwareentwickler',
 'Softwareentwickler',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Softwareentwickler',
 'Softwareentwickler',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Hard-Softwareentwickler',
 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
 'Hard-Softwareentwickler',
 'Hard-Softwareentwickler',
 'Hard-Softwareentwickler']

result = [vac if any([substring in vac for substring in ['-In',' In']]) else re.sub('|'.join(rm_gender),'',vac) if vac[:2] not in 'In' else 'In' + re.sub('|'.join(rm_gender),'',vac) for vac in test_list]

But it doesn't work, because there is a space in front of words like 'SoftwareentwicklerInnen'. How can I do this correctly with regex?

Important: I want to keep the format of each string as it is and only remove the feminine ending (that is, I want to get back the corrected list of strings).

Try this one:

import re

test_list = test_list[1].split(";")  # element 1 holds the ';'-separated job titles
test_list.append("Informatikerin") # adding one ending with in - I don't know if this is a correct word!

pattern = re.compile("in(?:nen)?$", re.IGNORECASE)

[re.sub(pattern, "", x) for x in test_list]

OUTPUT

['Data Scientists', ' DWH-BI Consultants', ' Softwareentwickler', ' Informatiker', ' Statistiker', 'Informatiker']

FOLLOW UP

If you want to rebuild the string as it was, just rejoin the pieces with ";":

";".join([re.sub(pattern, "", x) for x in test_list])

OUTPUT

'Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker;Informatiker'
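The split, substitute, and rejoin steps above can be wrapped in a small helper and applied to every string in the list. This is a minimal sketch: the name `strip_feminine_endings` and its `sep` parameter are illustrative and not part of the original answer, and it assumes that each ending sits at the very end of a ";"-separated piece.

```python
import re

# Same pattern as above: a trailing "in" or "innen", case-insensitive.
pattern = re.compile(r"in(?:nen)?$", re.IGNORECASE)

def strip_feminine_endings(text, sep=";"):
    """Split on sep, strip the ending from each piece, rejoin with sep."""
    # Hypothetical helper name; leading spaces of each piece are untouched.
    return sep.join(pattern.sub("", piece) for piece in text.split(sep))

samples = [
    "Softwareentwickler",
    "Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker",
]
cleaned = [strip_feminine_endings(s) for s in samples]
for line in cleaned:
    print(line)
```

Because the pattern is anchored only at the end of each piece, any piece that genuinely ends in "in" (a place name such as "Berlin", for example) would be shortened too, so this approach is best suited to lists that contain only job titles.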

If the idea is to match every such word within a line:

pattern = re.compile(r"in(?:nen)?(?=[;.,: ]|$)", re.IGNORECASE)

re.sub(pattern, "", "You are a Softwareentwicklerinnen: that is as nice as Informatikerin")
re.sub(pattern, "", "You are a Softwareentwicklerinnen; that is as nice as Informatikerin") 

OUTPUT

'You are a Softwareentwickler: that is as nice as Informatiker'
'You are a Softwareentwickler; that is as nice as Informatiker'

You could replace matches of the following regular expression with empty strings:

\/?[Ii](?:nnen|n)\b

This regex can be broken down as follows.

\/?         # optionally match '/'
[Ii]        # match 'I' or 'i'
(?:nnen|n)  # match 'nnen' or 'n' (in that order)
\b          # match a word boundary

The word boundary prevents matches inside longer words such as 'innenantenne'.

You can use
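To see how this expression behaves, the sketch below applies it with `re.sub` to a few strings. The sample strings and the names `ending` and `results` are made up for illustration; the last sample shows a limitation that follows from the pattern itself, namely that any word ending in "in" is treated as if it carried a feminine ending.

```python
import re

# The expression from above, compiled once.
ending = re.compile(r"/?[Ii](?:nnen|n)\b")

examples = [
    "SoftwareentwicklerInnen; InformatikerInnen; Statistiker",
    "Entwickler/in",
    "Entwickler/innen",
    "Innenantenne",  # no word boundary after "Innen" or "In": left alone
    "Berlin",        # ends in "in": shortened to "Berl" (a false positive)
]

results = {text: ending.sub("", text) for text in examples}
for before, after in results.items():
    print(repr(before), "->", repr(after))
```

The leading "In" of "InformatikerInnen" survives because it is followed by another word character, so `\b` cannot match there.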

rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]

See the regex demo. Details:

  • (?:\b/|\B) - either a / that is preceded by a word char, or a position that is preceded by a word char
  • i - the letter i (upper or lower case, because of re.I)
  • (?:nne)? - an optional nne substring
  • n - an n char
  • \b - a word boundary.

See the Python demo:
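The effect of the `(?:\b/|\B)` prefix is easiest to see on a few edge cases. The following is a small sketch with made-up sample strings (they are not from the question): because the ending must be attached to a preceding word character, a standalone "in" and a leading "In" are left alone, while a word that merely ends in "in" is still shortened.

```python
import re

rm_gender_regex = re.compile(r'(?:\b/|\B)i(?:nne)?n\b', re.I)

# Illustrative inputs only; none of these appear in the original list.
cases = [
    "Ingenieur/in",        # "/in" is removed, the leading "In" stays
    "Informatiker/innen",  # "/innen" is removed
    "Consultant in Wien",  # standalone "in" is not attached to a word
    "Innenantenne",        # "Innen" at the start of a word is kept
    "Berlin",              # still shortened: the pattern cannot tell
]

checked = {text: rm_gender_regex.sub('', text) for text in cases}
for before, after in checked.items():
    print(f"{before!r} -> {after!r}")
```

If inputs may contain ordinary words ending in "in", one option is to run the substitution only on fields known to hold job titles.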

import re
test_list = ['Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',  'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler']
rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]
for x in result:
    print(x)

Output:

Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Hard-Softwareentwickler
Hard-Softwareentwickler
