Deleting everything before and after regex match in Pandas
I have a dataframe in which one column contains a large amount of text. It looks like this:
[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}}, {'occupancy_state': 
'1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}
I want to delete everything that is not part of a row matching minute 59 and keep that row where it is, so that the column only shows:
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '16200976821350'}}
I tried using a regular expression, but I can't seem to get it to work.
/({'occupancy_state': '[1],[0-9]+,[0-9]+,[0-9]+', 'rtc_utc_time': '[2][0][0-9][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:*59:[0-9][0-9]', 'sdcard_site': '[a-zA-Z]+', 'sdcard_chain': '[a-zA-Z]+', 'sdcard_line': '[0-9]+,[a-zA-Z]+', 'utc': {'.numberLong': '[0-9]+'}})/g
Any help is appreciated.
With the following code you can select the dictionaries whose timestamp falls in minute 59:
[i for i in myList if i['rtc_utc_time'].partition(":")[2][:2] == '59']
Note: the input you provided does not contain any dictionary in minute 59.
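The filter above assumes `myList` is already a Python list of dictionaries. If the dataframe cell holds that list as text, as the question suggests, it has to be parsed first. The following is a minimal sketch under that assumption; the helper name `keep_minute` and the shortened sample string are hypothetical and not part of the original post.

```python
import ast

def keep_minute(cell, minute="59"):
    """Parse one cell (if it is text) and keep only records at the given minute."""
    # ast.literal_eval safely turns the text of a Python literal into objects.
    records = ast.literal_eval(cell) if isinstance(cell, str) else cell
    # Same test as in the answer: the two characters after the first ':' are the minute.
    return [r for r in records if r["rtc_utc_time"].partition(":")[2][:2] == minute]

# Made-up, shortened sample cell for illustration only.
sample_cell = (
    "[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:58:27'}, "
    "{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58'}]"
)
kept = keep_minute(sample_cell)
print(kept)
```

On a dataframe, the same helper could be applied to each cell with `Series.apply`, for example `df["payload"] = df["payload"].apply(keep_minute)`, where `payload` is an assumed column name.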
Use a list comprehension and split to find the minute in rtc_utc_time
data=[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}},
{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}]
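# Keep only the rows whose minute field is '59', then print one row per line
rows_59 = [row for row in data if row["rtc_utc_time"].split(':')[1] == '59']
print(*rows_59, sep='\n')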
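Since the question asked for a regular expression, here is one possible sketch that works directly on the raw text instead of on parsed dictionaries. It assumes every record starts with `occupancy_state`, is followed immediately by `rtc_utc_time`, and ends with the nested `utc` dictionary, as in the sample data. The names `PATTERN` and `records_at_minute_59` and the sample string are made up for illustration. Note that Python's `re` module does not use the `/.../g` delimiters from the attempt in the question; `findall` already returns every match.

```python
import re

# One record whose timestamp has minute 59.
# Assumes the key order from the question and a single nested dict at the end.
PATTERN = re.compile(
    r"\{'occupancy_state': '[^']*', "
    r"'rtc_utc_time': '\d{4}-\d{2}-\d{2} \d{2}:59:\d{2}'"
    r"[^{}]*\{[^{}]*\}\}"
)

def records_at_minute_59(text):
    """Return the raw text of every record whose timestamp minute is 59."""
    return PATTERN.findall(text)

# Made-up two-record sample for illustration only.
sample = (
    "[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:58:27', "
    "'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', "
    "'utc': {'$numberLong': '1620097712000'}}, "
    "{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', "
    "'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', "
    "'utc': {'$numberLong': '1620097742000'}}]"
)
matches = records_at_minute_59(sample)
print(matches)
```

Parsing the text and filtering on the field, as in the answers above, is usually less brittle than a pattern tied to the exact key order; the regex is shown only because the question started from one.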