
Fetching dates in different formats from a string in a pandas DataFrame

Below is the text from which I want to fetch dates in different formats.

"Sales Assistant @ DFS Duration - June 2021 - 2023 Currently working in XYZ Within the role I am expected to achieve sales targets which I currently have no problems reaching. Job Role/Establishment - Plasterer @ XX Plasterer's Duration - September 2016 - Nov 2016 A Job Role/Establishment - Customer Advisor @ AA Duration - (2015 – 2016) Job Role/Establishment - Warehouse Operative @ xyz Duration - 03/2014 to 08/2015 In the xyz warehouse Job Role/Establishment - Airport Terminal Assistant @ port Duration - 01/2012 - 06/2013 Working at the airport. Job Role/Establishment - Apprentice Floorer @ YY Floors Duration - DEC 2010 to APRIL 2012 a" (12/03/2020)-(2/11/2021) Fetch dates with different formats @TEST Duration - (December- March2022) and thsi is test @BLA Duration - (July-December 2019) - This is test trying to fetch dates with diff formats - 05/22 - 2023 . @ Plasterer's Duration - 10/21 - 05/22 16-17 other starts from 31 september 2022 to 01 january 2023 towards ends it starts from july 2022 - january 2023 .

This is the regex I have so far. It matches the majority of the date formats but still misses dates in the format 31 september 2022 to 01 january 2023. It also fetches 16-17, which is not required.

\(?(?:\b[A-Za-z]{3,9}\s*)?(?:\d\d?\/){0,2}[12]\d(?:\d{2})?\)?\s*(?:–|-|[Tt][Oo])\s*\(?(?:[A-Za-z]{3,9}\s*)?(?:\d\d?\/){0,2}[12]\d(?:\d{2})?\)?|\(\s*[A-Za-z]{3,9}\s*[--]\s*[A-Za-z]{3,9}\s*[12]\d{3}\s*\)
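A minimal reproduction of both problems, as a sketch: it runs the pattern above, copied verbatim, against two short fragments of the sample text using Python's re module. The variable names are only illustrative.

```python
import re

# The pattern from the question, copied verbatim.
QUESTION_PATTERN = (
    r"\(?(?:\b[A-Za-z]{3,9}\s*)?(?:\d\d?\/){0,2}[12]\d(?:\d{2})?\)?"
    r"\s*(?:–|-|[Tt][Oo])\s*"
    r"\(?(?:[A-Za-z]{3,9}\s*)?(?:\d\d?\/){0,2}[12]\d(?:\d{2})?\)?"
    r"|\(\s*[A-Za-z]{3,9}\s*[--]\s*[A-Za-z]{3,9}\s*[12]\d{3}\s*\)"
)

# "16-17" is accepted because [12]\d alone counts as a year on both sides.
unwanted = re.findall(QUESTION_PATTERN, "16-17")

# Nothing is found here: after "to" the pattern expects [12]\d,
# but the day "01" starts with 0, and a leading day is not allowed at all.
missed = re.findall(QUESTION_PATTERN, "31 september 2022 to 01 january 2023")

print(unwanted)  # ['16-17']
print(missed)    # []
```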

What changes need to be made? Any leads, or any other efficient way to fetch the same?

You can make the alternatives more specific and use a case-insensitive match:

\(\d\d?/\d\d?/\d{4}\)\s*[-–]\s*\(\d\d?/\d\d?/\d{4}\)|\((?:[A-Za-z]{3,9}|\d{4})\s*[-–]\s*(?:[A-Za-z]{3,9})?\s*\d{4}\)|\b\d\d?\s+[A-Za-z]{3,9}\s*[-–]?\s*\d{4}\s+to\s+\d\d?\s+[A-Za-z]{3,9}\s+\d{4}\b|\b[A-Za-z]{3,9}\s+\d{4}\s*(?:[-–]|to)(?:\s*[A-Za-z]{3,9})?\s+\d{4}|\b\d\d?/(?:\d{4}|\d\d?)\s+(?:to|[-–])\s+(?:\d\d?/)?(?:\d{4}|\d\d?)\b

It is a long pattern, but these are the 5 alternatives, each with a description.

  • \(\d\d?/\d\d?/\d{4}\)\s*[-–]\s*\(\d\d?/\d\d?/\d{4}\) Match (...)-(...) where each parenthesized part is digits with / as the separator
  • | Or
  • \((?:[A-Za-z]{3,9}|\d{4})\s*[-–]\s*(?:[A-Za-z]{3,9})?\s*\d{4}\) Match a single (...) that starts with letters a-z or 4 digits, then - followed by optional letters a-z and then 4 digits
  • | Or
  • \b\d\d?\s+[A-Za-z]{3,9}\s*[-–]?\s*\d{4}\s+to\s+\d\d?\s+[A-Za-z]{3,9}\s+\d{4}\b Match 1 or 2 digits, letters a-z, an optional - and 4 digits, then to followed by 1 or 2 digits, letters a-z and 4 digits. The - is optional so that 31 september 2022 to 01 january 2023 matches
  • | Or
  • \b[A-Za-z]{3,9}\s+\d{4}\s*(?:[-–]|to)(?:\s*[A-Za-z]{3,9})?\s+\d{4} Match letters a-z with 4 digits, then to or -, optional letters a-z, and 4 digits
  • | Or
  • \b\d\d?/(?:\d{4}|\d\d?)\s+(?:to|[-–])\s+(?:\d\d?/)?(?:\d{4}|\d\d?)\b Match 1 or 2 digits followed by / and 1, 2 or 4 digits. Then match to or -, an optional 1 or 2 digits with /, and again 1, 2 or 4 digits

See a regex demo.
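As a rough sketch of how the five alternatives could be applied in Python, the block below joins them with | and compiles them with re.IGNORECASE. The names ALTERNATIVES, DATE_RANGE and find_date_ranges are illustrative, not from the original answer. One assumption is made explicit: in the third alternative the dash between the month name and the year is treated as optional ([-–]?), because a mandatory dash would not match 31 september 2022 to 01 january 2023.

```python
import re

# One entry per alternative, in the same order as the description above.
ALTERNATIVES = [
    # (12/03/2020)-(2/11/2021)
    r"\(\d\d?/\d\d?/\d{4}\)\s*[-–]\s*\(\d\d?/\d\d?/\d{4}\)",
    # (2015 – 2016), (July-December 2019), (December- March2022)
    r"\((?:[A-Za-z]{3,9}|\d{4})\s*[-–]\s*(?:[A-Za-z]{3,9})?\s*\d{4}\)",
    # 31 september 2022 to 01 january 2023
    # Assumption: the dash before the first year is optional here.
    r"\b\d\d?\s+[A-Za-z]{3,9}\s*[-–]?\s*\d{4}\s+to\s+\d\d?\s+[A-Za-z]{3,9}\s+\d{4}\b",
    # June 2021 - 2023, DEC 2010 to APRIL 2012
    r"\b[A-Za-z]{3,9}\s+\d{4}\s*(?:[-–]|to)(?:\s*[A-Za-z]{3,9})?\s+\d{4}",
    # 03/2014 to 08/2015, 10/21 - 05/22, 05/22 - 2023
    r"\b\d\d?/(?:\d{4}|\d\d?)\s+(?:to|[-–])\s+(?:\d\d?/)?(?:\d{4}|\d\d?)\b",
]

DATE_RANGE = re.compile("|".join(ALTERNATIVES), re.IGNORECASE)


def find_date_ranges(text):
    """Return every date range found in text, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return DATE_RANGE.findall(text)


sample = (
    "Duration - 10/21 - 05/22 16-17 other starts from "
    "31 september 2022 to 01 january 2023 towards ends it starts "
    "from july 2022 - january 2023 ."
)
matches = find_date_ranges(sample)
print(matches)
# ['10/21 - 05/22', '31 september 2022 to 01 january 2023',
#  'july 2022 - january 2023']
```

Note that 16-17 is no longer returned, since every alternative needs either parentheses, a month name with a 4-digit year, or a / in the first part. For a pandas column (say a hypothetical df["text"]), the same joined pattern string can be passed to Series.str.findall together with flags=re.IGNORECASE.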
