[英]Deleting everything before and after regex match in Pandas
我有一個 dataframe,其中一列中有大量文本。 它看起來像這樣:
[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}
我希望能夠刪除不在與第 59 分鍾匹配的行中的所有內容,並將該行保留在它所在的位置,這樣該列將只顯示:
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '16200976821350'}}
我嘗試使用正則表達式,但似乎無法正常工作。
/({'occupancy_state': '[1],[0-9]+,[0-9]+,[0-9]+', 'rtc_utc_time': '[2][0][0-9][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:*59:[0-9][0-9]', 'sdcard_site': '[a-zA-Z]+', 'sdcard_chain': '[a-zA-Z]+', 'sdcard_line': '[0-9]+,[a-zA-Z]+', 'utc': {'.numberLong': '[0-9]+'}})/g
任何幫助表示贊賞
使用以下代碼,您可以在 59 分鍾內完成 select 部詞典:
[i for i in myList if i['rtc_utc_time'].partition(":")[2][:2] == '59']
注意:您提供的輸入在59
分鍾內不包含任何dictionary
。
使用列表理解和拆分來查找 rtc_utc_time 上的分鍾
data=[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}]
[print (row) for row in data if row["rtc_utc_time"].split(':')[1]!= '59']
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.