How to write the regex for the datetime exported from my WhatsApp messages

I am trying to analyse my WhatsApp data, but am running into problems. Almost every example I looked at defines a variable for the datetime pattern that looks like one of these:

  • pattern = '^([0-9]+)(/)([0-9]+)(/)([0-9][0-9]), ([0-9]+):([0-9][0-9]) (AM|PM) -'
  • pattern = '^\d{1,2}/\d{1,2}/\d{1,2}, \d{1,2}:\d{1,2}\S [AaPp][Mm] -'
  • pattern = '^([0-9]+)(/)([0-9]+)(/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
  • pattern = '\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s-\s'

These patterns are all based on the authors' own WhatsApp data, which, as you can see, often includes AM and PM. I happen to live in the Netherlands, where we don't use AM or PM, so I am lost on how to write the pattern for the datetime in square brackets at the start of this message:

(random WhatsApp message below)

[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen
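A quick check makes the mismatch concrete. The sketch below (an illustration, not part of the original post) runs two of the patterns quoted above against the sample line using Python's standard `re` module. Neither matches, and not only because of AM/PM: they expect slashes, a comma after the date and a ` - ` separator, whereas this export wraps a dash-separated date and a time with seconds in square brackets.

```python
import re

# Two of the patterns quoted above, copied as written
ampm_pattern = r'^([0-9]+)(/)([0-9]+)(/)([0-9][0-9]), ([0-9]+):([0-9][0-9]) (AM|PM) -'
no_ampm_pattern = r'\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s-\s'

# The Dutch export line from the question
line = "[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen"

# Both patterns expect "d/m/yy, h:mm" with slashes, a comma and " - ",
# but this export uses "[dd-mm-yyyy hh:mm:ss]" with dashes and brackets.
print(re.match(ampm_pattern, line))      # None
print(re.search(no_ampm_pattern, line))  # None
```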

I have tried every line of code I could find on the internet, but none of them worked. If someone could tell me how to write the pattern for this datetime, they would genuinely be my hero.

Below I have added screenshots of the notebook file I am working in. Hopefully they give some insight into my problems:

[Screenshots: my notebook file; error 1; error 2]

Something like this should work:

(\[[0-3][0-9]-[0-1][0-9]-\d{4}\s[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\])
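In Python, a sketch of using this pattern with the standard `re` module might look like the following, assuming each line of the export is handled as a string. It drops the redundant `{1}` quantifiers and any stray `[` before the hour: `\s[[0-2]` would make that character class accept a literal `[` as well, and recent Python versions emit a FutureWarning about a possible nested set there.

```python
import re

# The answer's pattern, without redundant {1} and without a stray "[" before the hour
DATETIME_PATTERN = re.compile(
    r'(\[[0-3][0-9]-[0-1][0-9]-\d{4}\s[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\])'
)

line = "[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen"

# match() only looks at the start of the string, where the timestamp sits
match = DATETIME_PATTERN.match(line)
if match:
    print(match.group(1))  # [31-08-2020 17:41:12]
```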

I've put it into regex101.com and it extracts the timestamp fine; the site also explains what each part of the pattern is doing.

It would probably be worth running a few additional cases in the example string to make sure you don't get any false matches.
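For example, a small hypothetical set of test lines could include a genuine message, a continuation line (a message containing line breaks continues on lines without a timestamp), a US-style AM/PM line and a deliberately bogus date. With the pattern above, only the first should match; the sample strings here are made up for illustration.

```python
import re

# The answer's pattern, with no stray "[" before the hour
DATETIME_PATTERN = re.compile(
    r'\[[0-3][0-9]-[0-1][0-9]-\d{4}\s[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\]'
)

# Hypothetical test lines: a real message, a continuation line,
# a US-style AM/PM line and a deliberately bogus date
samples = [
    "[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen",
    "en dit is de tweede regel van hetzelfde bericht",
    "8/31/20, 5:41 PM - Eva Zandbergen: hello",
    "[45-23-2020 17:41:12] Someone: rogue date",
]

results = []
for sample in samples:
    found = DATETIME_PATTERN.match(sample)  # anchored at the start of the line
    results.append(found.group(0) if found else None)
    print(f"{sample[:30]!r} -> {results[-1]}")
```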

It's quite simple, and I wrote it quickly, so it only handles basic dates. It only bounds the first digit of each field: the day must start with 0-3, the month with 0 or 1 and the hour with 0-2. That rejects obvious rogues such as day 45 or month 23, but it still accepts values like day 35, month 19 or hour 29, and it won't filter out days that only some months have (e.g. 31-02).
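If you need real calendar validation, one common approach (swapped in here, not part of the answer above) is to keep the regex for locating the timestamp and let `datetime.strptime` from the standard library reject impossible values. This is a sketch assuming the `dd-mm-yyyy hh:mm:ss` layout from the question; `parse_timestamp` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Locate the timestamp with the (loose) regex from the answer...
TIMESTAMP = re.compile(
    r'\[([0-3][0-9]-[0-1][0-9]-\d{4}\s[0-2][0-9]:[0-5][0-9]:[0-5][0-9])\]'
)

def parse_timestamp(line):
    """Return a datetime for a leading '[dd-mm-yyyy hh:mm:ss]', or None."""
    found = TIMESTAMP.match(line)
    if not found:
        return None
    try:
        # ...then let strptime apply the real calendar rules: month 1-12,
        # hour 0-23, and a day that exists in that month (leap years included)
        return datetime.strptime(found.group(1), '%d-%m-%Y %H:%M:%S')
    except ValueError:
        return None

print(parse_timestamp("[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen"))  # 2020-08-31 17:41:12
print(parse_timestamp("[35-19-2020 17:41:12] rogue"))  # passes the regex, rejected by strptime
print(parse_timestamp("[31-02-2021 10:00:00] rogue"))  # there is no 31 February
```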


Update: I've transcribed your notebook into my own and used a very small random dataset to do what your notebook is doing, and it seems to extract the timestamps fine. I've hosted this on GitHub here: https://github.com/danstreeter/stackoverflow-70643023
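The notebook itself isn't reproduced here, so the following is only a hypothetical sketch of a typical next step: splitting each line into timestamp, author and text with named groups, and appending lines that have no timestamp to the previous message. It assumes every new message starts with `[dd-mm-yyyy hh:mm:ss] Author: `, as in the sample line; `parse_chat` and the dictionary keys are illustrative names. If you are working in pandas, the returned list of dicts can be passed straight to `pandas.DataFrame`.

```python
import re
from datetime import datetime

# Assumed layout of a new message: "[dd-mm-yyyy hh:mm:ss] Author: text"
MESSAGE = re.compile(
    r'^\[(?P<stamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] (?P<author>[^:]+): (?P<text>.*)$'
)

def parse_chat(lines):
    """Turn export lines into a list of {'datetime', 'author', 'text'} dicts."""
    messages = []
    for raw in lines:
        line = raw.rstrip('\n')
        m = MESSAGE.match(line)
        if m:
            messages.append({
                # strptime raises ValueError if the date itself is impossible
                'datetime': datetime.strptime(m['stamp'], '%d-%m-%Y %H:%M:%S'),
                'author': m['author'],
                'text': m['text'],
            })
        elif messages:
            # No timestamp: treat the line as a continuation of the previous message
            messages[-1]['text'] += '\n' + line
    return messages

chat = [
    "[31-08-2020 17:41:12] Eva Zandbergen: Dit krijgt meneer niet binnen\n",
    "tweede regel\n",
]
for msg in parse_chat(chat):
    print(msg)
```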


Another thought: if you know the inbound data is well formatted, and always will be (which you may be able to expect from something like a WhatsApp message export), then rather than searching for dates with a regex you could simply slice each line and take its first 21 characters:

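# WhatsApp exports can contain emoji, so open the file as UTF-8 explicitly
with open('gezin.txt', 'r', encoding='utf-8') as opened_file:
    for line in opened_file:
        # "[dd-mm-yyyy hh:mm:ss]" is exactly 21 characters long
        print(line[0:21])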

Obviously this depends heavily on the data being well formed, with the date ALWAYS occupying the first 21 characters of each line, but it is an option if those things hold.
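One way to soften that dependency, sketched here under the same assumption about the layout, is to keep the slice but let `datetime.strptime` confirm that the 21 characters really are a `[dd-mm-yyyy hh:mm:ss]` timestamp, skipping any line where they aren't (such as continuation lines of multi-line messages). `leading_timestamps` is a hypothetical helper name; it accepts any iterable of lines, including an open file.

```python
from datetime import datetime

def leading_timestamps(lines):
    """Yield a datetime for every line that starts with '[dd-mm-yyyy hh:mm:ss]'."""
    for line in lines:
        try:
            # The brackets are literal characters in the format string
            yield datetime.strptime(line[0:21], '[%d-%m-%Y %H:%M:%S]')
        except ValueError:
            # Not a well-formed timestamp prefix (e.g. a continuation line): skip it
            continue

# Usage with the export file from the question:
# with open('gezin.txt', 'r', encoding='utf-8') as opened_file:
#     for stamp in leading_timestamps(opened_file):
#         print(stamp)
```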

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.