
Deleting everything before and after regex match in Pandas


I have a dataframe where one column contains a large amount of text. It looks like this:


[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}},
 {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}]

I'd like to delete everything except the entry whose time falls in minute 59, keeping that entry where it is, so that the column would show only:


{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '16200976821350'}}

I tried using a regex, but I can't get it to work:

/({'occupancy_state': '[1],[0-9]+,[0-9]+,[0-9]+', 'rtc_utc_time': '[2][0][0-9][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:*59:[0-9][0-9]', 'sdcard_site': '[a-zA-Z]+', 'sdcard_chain': '[a-zA-Z]+', 'sdcard_line': '[0-9]+,[a-zA-Z]+', 'utc': {'.numberLong': '[0-9]+'}})/g

Any help is appreciated.
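A minimal sketch of the regex route, assuming each cell holds text shaped exactly like the sample above. The `/.../g` delimiters in the attempt are JavaScript regex-literal syntax; pandas string methods use Python's `re`, which takes the bare pattern. `Series.str.extract` with a single capture group keeps only the matched dict, dropping everything before and after it. The column name `text` and the two sample cells are hypothetical (the minute-59 dict is the one from the desired output).

```python
import pandas as pd

# One dict whose rtc_utc_time falls in minute 59. [^{}]* cannot cross a
# brace, so a match never spills into a neighbouring dict; the trailing
# \{[^{}]*\}\} consumes the nested 'utc' dict plus the closing brace.
PATTERN = (
    r"(\{[^{}]*'rtc_utc_time': '\d{4}-\d{2}-\d{2} \d{2}:59:\d{2}'"
    r"[^{}]*\{[^{}]*\}\})"
)

# Hypothetical sample cells built from the question's data
OTHER = ("{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', "
         "'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', "
         "'utc': {'$numberLong': '1620097683000'}}")
MINUTE_59 = ("{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', "
             "'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', "
             "'utc': {'$numberLong': '16200976821350'}}")

df = pd.DataFrame({"text": ["[" + OTHER + ", " + MINUTE_59 + "]",  # has a match
                            "[" + OTHER + "]"]})                   # no match

# Keep only the captured dict; everything before and after it is dropped,
# and a cell with no minute-59 entry becomes NaN
df["text"] = df["text"].str.extract(PATTERN, expand=False)
print(df)
```

`str.extract` returns only the first match per cell; if a cell can contain several minute-59 entries, `df["text"].str.findall(PATTERN)` returns all of them as a list instead.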

With the following code, you can select the dictionaries whose minute is 59:

# myList is the list of dicts from the question
[i for i in myList if i['rtc_utc_time'].partition(":")[2][:2] == '59']

Note: the input you provided does not contain any dictionary from minute 59.
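If the column actually stores this as text rather than as Python objects, one way to apply the same filter to every row is sketched below. It assumes each cell is a string in the question's format; the helper `keep_minute_59`, the column name `text`, and the trimmed sample dicts are hypothetical.

```python
import ast
import pandas as pd

def keep_minute_59(cell):
    """Hypothetical helper: parse one cell and keep only minute-59 dicts."""
    # Single-quoted keys make the text invalid JSON, but it is a valid
    # Python literal, which ast.literal_eval parses without executing code
    myList = ast.literal_eval(cell)
    # partition(":") splits at the first colon:
    # '2021-05-04 03:59:58' -> ('2021-05-04 03', ':', '59:58'),
    # so the first two characters of the tail are the minute
    return [i for i in myList if i['rtc_utc_time'].partition(":")[2][:2] == '59']

# Hypothetical column; the dicts are trimmed to two fields for brevity
df = pd.DataFrame({"text": [
    "[{'rtc_utc_time': '2021-05-04 03:07:58', 'utc': {'$numberLong': '1620097683000'}}, "
    "{'rtc_utc_time': '2021-05-04 03:59:58', 'utc': {'$numberLong': '16200976821350'}}]"
]})

df["text"] = df["text"].apply(keep_minute_59)
print(df.loc[0, "text"])
```

Each cell now holds a list of dicts rather than text; wrap the result in `str()` if the column must stay a string. `ast.literal_eval` raises `ValueError` or `SyntaxError` on malformed cells.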

Use a list comprehension and split() to find the minute in rtc_utc_time:

data=[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}},  
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}]

   
# Keep only the rows whose minute (the field between the two colons) is 59
minute_59 = [row for row in data if row["rtc_utc_time"].split(':')[1] == '59']
print(minute_59)
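Since the question is about pandas, the same filter can also be written without string slicing by parsing the timestamps once and comparing the minute component. This is a sketch using three records from the `data` list above, trimmed to two fields for brevity; `minutes` and `minute_59` are illustrative names.

```python
import pandas as pd

# Three records from the answer's data, trimmed to two fields
data = [
    {'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE'},
    {'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE'},
    {'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE'},
]

df = pd.DataFrame(data)
# Parse the timestamps once, then compare the minute component
minutes = pd.to_datetime(df['rtc_utc_time']).dt.minute
minute_59 = df[minutes == 59]
print(minute_59)
```

Boolean indexing keeps the original index labels, so the surviving row still carries its position (here label 1) from the unfiltered frame.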


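Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.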

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM