Looking to Regex Strip a predictable chunk of text from data frame
I have a data frame of inspection results and violations, shown below:
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
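What I need is for Python to go through this pandas data frame, specifically the "Violations" column, and identify every case that "starts with a number and ends with Comments:".
I was able to strip the numbers with this line of code: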
df_new['Violations'] = df_new['Violations'].map(lambda x:
x.lstrip('0123456789.- ').rstrip('[^a-zA-Z]Comments[^a-zA-Z]'))
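As you can see, I tried to handle the trailing Comments part with an rstrip regex command, but it does not seem to do anything. The output looks like this: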
Results Violations
0 Pass w/ Conditions MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL...
1 Pass THERMOMETERS PROVIDED & ACCURATE - Comments: 4...
The regex command should basically mean: find the number, and remove everything between the number and Comments:
Is there a simple way to do this?
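A note on why the rstrip call appears to do nothing: str.lstrip and str.rstrip do not take a regex. They treat their argument as a set of individual characters and remove any run of those characters from the end of the string. The short sketch below illustrates this; the sample string is made up to resemble a row of the Violations column and is not taken from the real data.

```python
# str.lstrip / str.rstrip take a set of characters, not a regex pattern.
# Hypothetical row, modelled on the Violations column.
sample = "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4 found."

# Digits, dot and space are all in the set, so the leading "36. " is removed.
no_number = sample.lstrip("0123456789.- ")

# The string ends with ".", which is not one of the characters in the
# argument, so rstrip stops immediately and removes nothing.
unchanged = no_number.rstrip("[^a-zA-Z]Comments[^a-zA-Z]")

# When the tail does consist of characters from the set, rstrip eats them
# one at a time, regardless of their order.
eaten = "statements".rstrip("Comments")

print(no_number)
print(unchanged == no_number)
print(eaten)
```

So the lstrip call works only because the leading number happens to be built from the listed characters, while the rstrip call never sees a trailing character it is allowed to remove.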
foo = '''\
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...'''
>>> print(foo)
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
>>>
import re
bar = re.sub(r'(\d+\.).*(Comment.*)', r'\1', foo)
>>> print(bar)
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36.
>>>
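If the goal is instead to keep the violation description and drop both the leading number and the trailing comment, one possible variation is sketched below. It is only a sketch under assumptions: the sample rows are invented, the comment part is assumed to start with "Comment:" or "Comments:" (optionally preceded by " - "), and the helper name clean_violation is hypothetical.

```python
import re

# Hypothetical sample rows modelled on the Violations column.
violations = [
    "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE",
    "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4 thermometers missing",
]

# Leading item number, its dot, and any whitespace around it.
LEADING_NUMBER = re.compile(r"^\s*\d+\.\s*")
# Optional " - " separator, then "Comment:" or "Comments:", then the rest.
TRAILING_COMMENT = re.compile(r"\s*-?\s*Comments?:.*$", re.DOTALL)

def clean_violation(text):
    """Drop the leading item number and any trailing 'Comments: ...' part."""
    text = LEADING_NUMBER.sub("", text)
    return TRAILING_COMMENT.sub("", text)

cleaned = [clean_violation(v) for v in violations]
for row in cleaned:
    print(row)
```

With the data frame from the question, the same function could be applied with df_new['Violations'].map(clean_violation), or the two patterns could be passed to Series.str.replace with regex=True.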