
Pandas Regex: Separate name from string that starts with word or start of string, and ends in certain words

I have a pandas series that contains rows of share names amongst other details:

Netflix DIVIDEND
Apple Inc (All Sessions) COMM
Intel Corporation CONS
Correction Netflix Section 31 Fee

I'm trying to use a regex to retrieve the stock name, which I did with this lookahead:

transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(r"(^.*?(?=DIVIDEND|\(All|CONS|COMM|Section))")

The only thing I'm having trouble with is the row Correction Netflix Section 31 Fee, where my regex is getting the share name as Correction Netflix. I don't want the word "Correction".

I need my regular expression to check for either the start of the string, OR the word "Correction ".

I tried a few things, such as an OR | with the start of string character ^. I also tried a lookbehind to check for ^ or Correction, but the error says lookbehinds need to be constant length.

r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))"

gives an error: ValueError: Wrong number of items passed 2, placement implies 1. I'm new to regex so I don't really know what this means.

You could use an optional part, and instead of lookarounds use a capture group with a match:

^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)
  • ^ Start of string
  • (?:Correction\s*)? Optionally match Correction followed by 0+ whitespace chars
  • (\S.*?)\s* Capture in group 1, matching a non-whitespace char and then as few chars as possible, then match (not capture) 0+ whitespace chars
  • (?: Non-capture group for the alternation
    • \([^()]*\) Match from ( till )
    • | Or
    • DIVIDEND|All|CONS|COMM|Section Match any of the words
  • ) Close group
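
The breakdown above can be checked without pandas using Python's standard `re` module. This is only a minimal sketch: the helper name `share_name` is hypothetical and not part of the original answer.

```python
import re

# Pattern from the answer: optional "Correction" prefix, lazy capture of the
# name in group 1, then one of the terminating tokens.
PATTERN = re.compile(
    r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"
)

def share_name(market_name):
    """Return the captured share name, or None when the pattern does not match."""
    match = PATTERN.search(market_name)
    return match.group(1) if match else None

rows = [
    "Netflix DIVIDEND",
    "Apple Inc (All Sessions) COMM",
    "Intel Corporation CONS",
    "Correction Netflix Section 31 Fee",
]
names = [share_name(row) for row in rows]
print(names)
```

Because the prefix sits in a non-capturing group, only the name ends up in group 1, which is what `str.extract` turns into a single column.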

Regex demo

import pandas as pd

data = ["Netflix DIVIDEND", "Apple Inc (All Sessions) COMM", "Intel Corporation CONS", "Correction Netflix Section 31 Fee"]
pattern = r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"
transactions_df = pd.DataFrame(data, columns=['MarketName'])
# str.extract returns one column per capture group; this pattern has exactly one
transactions_df["Share Name"] = transactions_df["MarketName"].str.extract(pattern)
print(transactions_df)

Output

0                   Netflix DIVIDEND            Netflix
1      Apple Inc (All Sessions) COMM          Apple Inc
2             Intel Corporation CONS  Intel Corporation
3  Correction Netflix Section 31 Fee            Netflix
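
As for the `ValueError` in the question's second attempt: `str.extract` returns one column per capture group. The pattern with `(^|Correction )` nested inside the outer parentheses has two capture groups, so it yields two columns, which cannot be assigned to the single column "Share Name". The sketch below only counts the groups with the standard `re` module; the variable names are illustrative.

```python
import re

# Question's second attempt: the outer (...) and the inner (^|Correction ) both capture.
# The lookahead (?=...) does not count as a capture group.
two_groups = re.compile(r"((^|Correction ).*?(?=DIVIDEND|\(All|CONS|COMM|Section))")

# Answer's pattern: the (?:...) groups are non-capturing, so only the name is captured.
one_group = re.compile(
    r"^(?:Correction\s*)?(\S.*?)\s*(?:\([^()]*\)|DIVIDEND|All|CONS|COMM|Section)"
)

group_counts = (two_groups.groups, one_group.groups)
print(group_counts)
```

Writing the prefix as `(?:...)` instead of `(...)` keeps the grouping behaviour while leaving the group count at one.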
