I would like to delete all the dates, usernames, and emoticons from a WhatsApp chat.txt file. The file looks like this:
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
Is it possible to write a Python script that recognizes the usernames and dates and deletes them, leaving only the chat text? I imagine I should use a regular expression and maybe read all the text into a single string.
Please help
A similar question about regex and WhatsApp logs in Python: Regex to match whatsapp chat log
Code from the first answer there:
^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)
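The quoted pattern is written across several lines, so it appears to rely on re.VERBOSE, and its lookahead uses ^ to find the start of the next message, which needs re.MULTILINE. It also expects two-digit days and months and a four-digit year, so \d{2}/\d{2}/\d{4} does not match a date like 10/4/19 as written. A minimal sketch with both flags and a loosened date part (\d{1,2}/\d{1,2}/\d{2,4} is an assumption, not the original answer's pattern); CHAT_RE is a hypothetical name:

```python
import re

# The quoted pattern, with the date part loosened (assumption) so that
# one-digit days/months and two-digit years such as 10/4/19 also match.
CHAT_RE = re.compile(
    r"""
    ^
    (?P<datetime>\d{1,2}/\d{1,2}/\d{2,4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{1,2}/|\Z)
    """,
    re.VERBOSE | re.MULTILINE,
)

text = """10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat"""

# Keep only the message group; strip() drops the newline before the next entry.
messages = [m.group("message").strip() for m in CHAT_RE.finditer(text)]
print(messages)
```

The datetime and name groups stay available if you need them later; only message is kept here.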
A super simple way would be to iterate line by line and split on ':'. If we can assume that every line always follows the format "date, time - username: message", we can grab everything after the second ':'.
text = '''10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat'''
for message in text.split('\n'):
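    # split at most twice so colons inside the message are kept; strip() drops the leading space
    print(message.split(':', 2)[2].strip())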
Outputs
example chat
😂
example chat
example chat
😂
example chat
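The split approach assumes every line starts with a header. If a message spans several lines (an assumption about the export format), its continuation lines have no header, and split(':', 2)[2] raises IndexError on any such line with fewer than two colons. A minimal sketch, assuming headers look like "10/4/19, 7:18 PM - name: ": match the header with a regex and treat any line without one as a continuation of the previous message. HEADER and messages_from_lines are hypothetical names; a dated line without a "name:" part (such as a system notice) would also be treated as a continuation.

```python
import re

# Hypothetical header pattern: "date, time - name: " at the start of a line.
# Assumes numeric dates like 10/4/19 and an optional AM/PM marker.
HEADER = re.compile(r'^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)? - [^:]+: ')


def messages_from_lines(lines):
    """Return one string per message, with continuation lines joined to their message."""
    messages = []
    for line in lines:
        line = line.rstrip('\n')
        match = HEADER.match(line)
        if match:
            # A new message: keep only the text after the header (colons included).
            messages.append(line[match.end():])
        elif messages:
            # No header: assume the line continues the previous message.
            messages[-1] += '\n' + line
        # Lines before the first header are ignored.
    return messages


log = [
    '10/4/19, 7:18 PM - user1: example chat',
    '10/4/19, 7:18 PM - user2: note: colons survive',
    'and this line continues it',
]
print(messages_from_lines(log))
```

With a real file, passing the open file object works too, since iterating it yields lines ending in '\n' (e.g. inside with open('chat.txt', encoding='utf-8') as f:, assuming the export is UTF-8).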
Another way is to build regular expressions for this. The emoji regex is taken from here.
import re
str_in = """10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat"""
# remove the "date, time - username: " prefix from every line
dates_filtered = re.sub(r'(\d+\/\d+\/\d+, \d+:\d+ [AP]M - [ \d\w]+: )', '', str_in)
# character class of emoji ranges (not exhaustive); the u prefixes are optional in Python 3
regex_pattern = re.compile(pattern="["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+", flags=re.UNICODE)
emoji_filtered = regex_pattern.sub(r'', dates_filtered)
# collapse the empty lines left by emoji-only messages
blank_lines_filtered = re.sub(r'(\n\s*\n)', '\n', emoji_filtered)
print(str_in)
print('---------')
print(dates_filtered)
print('---------')
print(emoji_filtered)
print('---------')
print(blank_lines_filtered)
prints
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
---------
example chat
😂
example chat
example chat
😂
example chat
---------
example chat

example chat
example chat

example chat
---------
example chat
example chat
example chat
example chat
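The answer's three steps (strip the prefixes, strip the emoji, drop the empty lines) can be folded into one function. A sketch under a few assumptions: the header pattern allows an optional AM/PM so 24-hour exports also match, and the emoji class adds some Unicode blocks beyond the four in the answer (still not a complete emoji list). HEADER_RE, EMOJI_RE and clean_chat are hypothetical names.

```python
import re

# "date, time - name: " at the start of each line; the optional AM/PM part is
# an assumption so that 24-hour exports also match.
HEADER_RE = re.compile(
    r'^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)? - [^:\n]+: ', re.MULTILINE)

# The answer's ranges plus a few more Unicode blocks (assumption; not exhaustive).
EMOJI_RE = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\U0001FA70-\U0001FAFF"  # symbols & pictographs extended-A
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)


def clean_chat(text):
    """Drop headers and emoji, then remove lines that end up empty."""
    text = HEADER_RE.sub('', text)
    text = EMOJI_RE.sub('', text)
    # strip() also removes leading/trailing spaces left where an emoji was deleted.
    return '\n'.join(line.strip() for line in text.splitlines() if line.strip())


sample = """10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat"""
print(clean_chat(sample))
```

Joining the stripped, non-empty lines replaces the blank-line regex; unlike r'(\n\s*\n)', it also drops an empty first or last line left by an emoji-only message.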
You can also use a list comprehension:
print([message.split(':', 2)[2].strip() for message in text.split('\n')])
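Here is another approach that works on the whole log as one string:
import re
sentence = '10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 😂 10/4/19, 7:18 PM - user3: example chat 10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 😂 10/4/19, 7:18 PM - user3: example chat'
# capture what follows "- userN: " up to the next space + digit (start of a date) or the end of the string
chat = re.findall(r'-\suser\d:\s(.*?)(?= \d|$)', sentence)
print(chat)
output:
['example chat', '😂', 'example chat', 'example chat', '😂', 'example chat']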
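This pattern only matches usernames of the form user<digit>, and ending a message at " \d" cuts it short at the first space followed by a digit (a message like "see you at 5 pm" would come back as "see you at"). A possible generalisation, sketched under the assumption that every header looks like "10/4/19, 7:18 PM - name: ": accept any name and end a message only where a full date or the end of the string follows. DATE and MESSAGE_RE are hypothetical names, and the sample names are made up.

```python
import re

# Hypothetical generalisation: any name instead of "user<digit>", and a message
# ends only where a full "10/4/19, "-style date (or the end of the string) follows.
DATE = r'\d{1,2}/\d{1,2}/\d{2,4}, '
MESSAGE_RE = re.compile(DATE + r'\d{1,2}:\d{2}(?:\s?[AP]M)? - [^:]+: (.*?)(?= ' + DATE + r'|$)')

sentence = ('10/4/19, 7:18 PM - Mario Rossi: see you at 5 pm '
            '10/4/19, 7:19 PM - Anna: 😂 '
            '10/4/19, 7:20 PM - Mario Rossi: example chat')
# With one capture group, findall returns just the message texts.
print(MESSAGE_RE.findall(sentence))
```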