
Looking to regex-strip a predictable chunk of text from a data frame

I have a data frame of inspection results and violations that looks like this:

Results             Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                36. THERMOMETERS PROVIDED & ACCURATE Comment...

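What I need is for Python to iterate over this pandas DataFrame, specifically the "Violations" column, and identify every entry that "starts with a number and ends with Comments:".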
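For the identification step alone, a vectorized check may be simpler than iterating row by row. This is a minimal sketch, assuming the column is named Violations and that the full entries contain the literal text "Comments:". The DataFrame below is hypothetical sample data, since the real values are truncated above.

```python
import pandas as pd

# Hypothetical sample data; the real entries in the question are truncated.
df = pd.DataFrame({
    "Results": ["Pass w/ Conditions", "Pass", "Fail"],
    "Violations": [
        "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE - Comments: see notes",
        "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4 units checked",
        "No violations recorded",
    ],
})

# True where the text starts with a number followed by "." and later contains "Comments:".
# str.contains uses re.search, so "^" anchors the pattern at the start of each string.
mask = df["Violations"].str.contains(r"^\d+\..*Comments:", regex=True)
print(mask.tolist())  # [True, True, False]

# The boolean mask can then select just the matching rows.
matching = df[mask]
```

Using a mask avoids a Python-level loop and keeps the matching rows available for the stripping step that follows.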

I was able to strip the leading numbers with this line of code:

df_new['Violations'] = df_new['Violations'].map(lambda x: 
    x.lstrip('0123456789.- ').rstrip('[^a-zA-Z]Comments[^a-zA-Z]'))

As you can see, I tried to cut off the trailing Comments part with the rstrip "regex", but it doesn't seem to do anything. The output looks like this:

    Results             Violations
0   Pass w/ Conditions  MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL...
1   Pass                THERMOMETERS PROVIDED & ACCURATE - Comments: 4...
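The reason the rstrip call appears to do nothing is that str.lstrip and str.rstrip do not take regular expressions; they take a set of characters and remove any of those characters from that end of the string. A short sketch using a hypothetical full value (the output above is cut off by pandas' display):

```python
# Hypothetical full value; pandas truncates the real one in the printed output.
s = "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4..."

# lstrip removes any leading characters that appear in "0123456789.- ",
# so "36. " is removed and stripping stops at "T".
left = s.lstrip("0123456789.- ")
print(left)  # THERMOMETERS PROVIDED & ACCURATE - Comments: 4...

# rstrip reads its argument as the individual characters
# [ ^ a - z A Z ] C o m e n t s, not as a pattern.
# This string ends in ".", which is not in that set, so nothing is removed.
right = s.rstrip("[^a-zA-Z]Comments[^a-zA-Z]")
print(right == s)  # True
```

The lstrip call only worked because the leading number happens to consist of characters in the set. Removing the trailing Comments part needs a pattern-based tool such as re.sub or Series.str.replace(..., regex=True).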

Basically, the regex should find the number and remove everything between the number and Comments:

Is there an easy way to do this?
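One regex-based way to do what the lstrip/rstrip attempt was aiming for is Series.str.replace with regex=True, which applies the substitution to each cell. This sketch rests on an assumption: that the goal is to drop the leading "N." and everything from an optional " - " plus "Comments:" onward, keeping the violation description. The values are hypothetical full-length versions of the truncated rows.

```python
import pandas as pd

# Hypothetical full values; the question's output is truncated by pandas' display.
df_new = pd.DataFrame({
    "Violations": [
        "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE - Comments: see notes",
        "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4 units checked",
    ]
})

# One regex with two alternatives, both replaced by "":
#   ^\d+\.\s*             -> the leading "N. " number
#   \s*-?\s*Comments:.*$  -> an optional " - ", then "Comments:" and everything after it
pattern = r"^\d+\.\s*|\s*-?\s*Comments:.*$"
df_new["Violations"] = df_new["Violations"].str.replace(pattern, "", regex=True)
print(df_new["Violations"].tolist())
# ['MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE', 'THERMOMETERS PROVIDED & ACCURATE']
```

Passing regex=True explicitly matters: in recent pandas versions str.replace treats the pattern as a literal string by default.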

foo = '''\
Results                 Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                    36. THERMOMETERS PROVIDED & ACCURATE Comment...'''


>>> print(foo)
Results                 Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                    36. THERMOMETERS PROVIDED & ACCURATE Comment...
>>>


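import re

# Keep the number (group 1) and drop the rest of the line when that line contains "Comment".
# Raw strings avoid the invalid "\d" escape warning in newer Python versions.
bar = re.sub(r'(\d+\.).*(Comment.*)', r'\1', foo)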


>>> print(bar)
Results                 Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                    36.
>>>
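The answer above runs re.sub on a block of text. To apply the same pattern to the DataFrame column itself, Series.str.replace with regex=True performs the substitution cell by cell. A minimal sketch, with hypothetical data copied from the question's sample rows:

```python
import pandas as pd

# Hypothetical values standing in for the question's truncated data.
df = pd.DataFrame({
    "Results": ["Pass w/ Conditions", "Pass"],
    "Violations": [
        "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E",
        "36. THERMOMETERS PROVIDED & ACCURATE Comment...",
    ],
})

# Same pattern as the answer, applied per cell: keep "N." (group 1) and drop
# everything after it when the cell also contains "Comment". Cells without
# "Comment" do not match and are left unchanged.
df["Violations"] = df["Violations"].str.replace(
    r"(\d+\.).*(Comment.*)", r"\1", regex=True
)
print(df["Violations"].tolist())
# ['3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E', '36.']
```

Note that this pattern keeps the number and discards the description; if the description is what should survive, the pattern has to be adjusted accordingly.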



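Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.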

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM