Python deleting from text specific type of words

Question

I would like to delete all the dates, usernames and emoticons from a WhatsApp chat.txt file. The file looks like this:

10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat

Is it possible to write a Python script that recognizes the usernames and dates and deletes them, leaving only the chat text? I imagine I should use a regular expression and maybe convert all the text to a string.

Please help

Answer 1

There is a similar question about regex and WhatsApp logs in Python: "Regex to match whatsapp chat log".

The code from its first answer:


^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)
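
The pattern above is written in verbose style, so in Python it would need the re.VERBOSE and re.MULTILINE flags. It also expects two-digit days and months and a four-digit year, which the sample in the question (10/4/19) does not have. A minimal sketch of how it might be applied, with the date part loosened (that change is my assumption, not part of the linked answer; CHAT_PATTERN and extract_messages are hypothetical names):

```python
import re

# Assumption: 1-2 digit day/month and 2-4 digit year, to fit "10/4/19".
CHAT_PATTERN = re.compile(
    r"""
    ^
    (?P<datetime>\d{1,2}/\d{1,2}/\d{2,4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{1,2}/|\Z)
    """,
    re.VERBOSE | re.MULTILINE,
)

def extract_messages(text):
    """Return only the message part of every chat entry."""
    return [m.group("message").strip() for m in CHAT_PATTERN.finditer(text)]

sample = (
    "10/4/19, 7:18 PM - user1: example chat\n"
    "10/4/19, 7:18 PM - user2: 😂\n"
    "10/4/19, 7:18 PM - user3: example chat\n"
)
print(extract_messages(sample))
```

Because the message group runs until the next line that starts with a date, a message that spans several lines should stay in one piece.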

Answer 2

A super simple way here would be to iterate line by line and split on ':'. If we can assume that every line follows the format "date, time - username: message", we can grab everything after the second ':'.

text = '''10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat
10/4/19, 7:18 PM - user1: example chat
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat'''

for message in text.split('\n'):
    # maxsplit=2 keeps any ':' inside the message itself intact
    print(message.split(':', 2)[2])

Outputs

 example chat
 😂
 example chat
 example chat
 😂
 example chat
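
Counting colons depends on the time always containing exactly one ':'. A slightly more defensive sketch, assuming every line looks like "date, time - user: message" and user names never contain ": " (message_text is a hypothetical helper name), splits on the separators instead:

```python
def message_text(line):
    """Return the message part of one 'date, time - user: message' line."""
    # Drop the 'date, time' prefix: everything up to the first ' - '.
    _, _, rest = line.partition(" - ")
    # Drop the user name: everything up to the first ': '.
    _, sep, message = rest.partition(": ")
    # Lines without a 'user: ' part yield an empty string.
    return message.strip() if sep else ""

lines = [
    "10/4/19, 7:18 PM - user1: example chat",
    "10/4/19, 7:18 PM - user2: see https://example.com at 7:30",
]
print([message_text(line) for line in lines])
```

This also strips the leading space that the plain split leaves in front of each message.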

Answer 3

Another way is to build a regexp for that. Emoji regexp taken from here

import re

str_in = """10/4/19, 7:18 PM - user1: example chat 
            10/4/19, 7:18 PM - user2: 😂  
            10/4/19, 7:18 PM - user3: example chat  
            10/4/19, 7:18 PM - user1: example chat  
            10/4/19, 7:18 PM - user2: 😂  
            10/4/19, 7:18 PM - user3: example chat"""

# Remove the "date, time - user: " prefix of every message
dates_filtered = re.sub(r'\d+/\d+/\d+, \d+:\d+ [AP]M - [ \w]+: ', '', str_in)

# Remove emoji (only the ranges listed here are covered)
emoji_pattern = re.compile("["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags=re.UNICODE)
emoji_filtered = emoji_pattern.sub('', dates_filtered)

# Collapse the lines left empty after the emoji removal
blank_lines_filtered = re.sub(r'\n\s*\n', '\n', emoji_filtered)

print(str_in)
print('---------')
print(dates_filtered)
print('---------')
print(emoji_filtered)
print('---------')
print(blank_lines_filtered)

prints

10/4/19, 7:18 PM - user1: example chat 
10/4/19, 7:18 PM - user2: 😂
10/4/19, 7:18 PM - user3: example chat 
10/4/19, 7:18 PM - user1: example chat 
10/4/19, 7:18 PM - user2: 😂 
10/4/19, 7:18 PM - user3: example chat
---------
example chat 
😂
example chat 
example chat 
😂 
example chat
---------
example chat
              
example chat
example chat 

example chat
--------- 
example chat
example chat
example chat 
example chat
---------
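
The three steps above could also be wrapped in one function that returns the messages as a list. This is only a sketch under the same assumptions about the line format. PREFIX, EMOJI and clean_chat are names made up for the example, and the emoji ranges are the same partial set used above:

```python
import re

# 'date, time AM/PM - user: ' at the start of each line; leading
# spaces or tabs are allowed so indented input is handled as well.
PREFIX = re.compile(r"^[ \t]*\d+/\d+/\d+, \d+:\d+ [AP]M - [^:\n]+: ?", re.MULTILINE)

# Same emoji ranges as in the answer; not a complete emoji list.
EMOJI = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags
    "]+"
)

def clean_chat(text):
    """Strip prefixes and emoji, then drop lines that end up empty."""
    text = PREFIX.sub("", text)
    text = EMOJI.sub("", text)
    return [line.strip() for line in text.splitlines() if line.strip()]

sample = """10/4/19, 7:18 PM - user1: example chat
            10/4/19, 7:18 PM - user2: 😂
            10/4/19, 7:18 PM - user3: example chat"""
print(clean_chat(sample))
```

Anchoring the prefix with ^ and re.MULTILINE means a date that appears in the middle of a message is left alone.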

Answer 4

You can also use a list comprehension (text is the string defined in Answer 2):

print([message.split(':', 2)[2] for message in text.split('\n')])

Answer 5

Here is a version using re.findall on the whole chat as a single string:

import re

sentence = '10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 😂 10/4/19, 7:18 PM - user3: example chat 10/4/19, 7:18 PM - user1: example chat 10/4/19, 7:18 PM - user2: 😂 10/4/19, 7:18 PM - user3: example chat'

chat = re.findall(r'-\suser\d:\s([a-zA-Z\d]|.*?) \d', sentence)

print(chat)

output:

['example chat', '😂', 'example chat', 'example chat', '😂']
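
The output has five items for six messages: the pattern requires a space and a digit after every message, so the last one is never captured. One possible variant, assuming a message never contains something that looks like " d/d/d," itself, uses a lookahead for the next date or the end of the string (pattern is a hypothetical name, and the user part is loosened to anything up to the first colon):

```python
import re

sentence = ("10/4/19, 7:18 PM - user1: example chat "
            "10/4/19, 7:18 PM - user2: 😂 "
            "10/4/19, 7:18 PM - user3: example chat")

# Capture lazily up to the next date or the end of the string.
# The lookahead does not consume the date, so the next match can use it.
pattern = re.compile(r"-\s[^:]+:\s(.*?)(?=\s\d+/\d+/\d+,|$)")

chat = pattern.findall(sentence)
print(chat)
```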

5 answers

solution1
0 2021-05-13 10:50:10

solution2
0 2021-05-13 10:50:53

solution3
0 ACCEPTED 2021-05-13 10:54:21

solution4
0 2021-05-13 10:58:05

solution5
0 2021-05-13 11:02:32
