
How to remove \n1, \n2, \n3, etc. from strings in a Python list?

I have a Python list, question_text_list, containing strings (texts) read from a CSV file:

['text1', 'text2', ..., 'text100000']

One of the texts in the list looks like this:

'in star trek 2013 why did they \n\nspoilers\nspoilers\nspoilers\nspoilers\n\n1make warping look quite a bit like an hyperspace jump\n2what in the world were those bright particles as soon as they jumped\n3why in the world did they make it possible for two entities to react in warp space in separate jumps\n4why did spock get emotions for this movie\n5what was the point of hiding the enterprise underwater\n6when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth\n7how did the ship enter earths atmosphere when it wasnt even in orbit\n8when scotty opened the door of the black ship how come pike and khan didnt slow down'

I ran the following command, hoping to remove \n1, \n2, ..., \n8 as well as \nspoilers:

    question_text_list = [x.replace('\n*',' ').replace('\nspoilers','') for x in question_text_list]

The output is not what I want: the \n characters are removed, but the digits that followed them ('1', '2', ...) are still there:

'in star trek 2013 why did they 1make warping look quite a bit like an hyperspace jump2what in the world were those bright particles as soon as they jumped3why in the world did they make it possible for two entities to react in warp space in separate jumps4why did spock get emotions for this movie5what was the point of hiding the enterprise underwater6when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth7how did the ship enter earths atmosphere when it wasnt even in orbit8when scotty opened the door of the black ship how come pike and khan didnt slow down'

Question: how can I remove all the newline characters together with the numbers that follow them (\n1, \n2, ...) in Python?

A simple regex will do the trick:

    import re

    text = 'in star trek 2013 why did they \n\nspoilers ...'  # shortened for brevity
    article = re.sub(r'\n[0-9]?(spoilers)?', '', text)

The regex \n[0-9]?(spoilers)? says:

\n => match a newline character

[0-9]? => match a single digit 0 through 9, but it doesn't have to be present (the ? part)

(spoilers)? => match the whole word spoilers, but it doesn't have to be present
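To apply this pattern to every item in the list from the question, something like the following should work. This is a minimal sketch: the two-item sample list and the name `pattern` are illustrative assumptions, not the asker's real data.

```python
import re

# Hypothetical sample standing in for the real list read from the CSV file.
question_text_list = [
    'why did they \n\nspoilers\nspoilers\n\n1make warping\n2what were those',
    'no newlines here',
]

# A newline, then an optional single digit, then an optional literal "spoilers".
pattern = re.compile(r'\n[0-9]?(spoilers)?')

# Replace every match with the empty string in each list item.
cleaned = [pattern.sub('', text) for text in question_text_list]
print(cleaned)
```

Because the replacement is an empty string, the words on either side of a removed \n1 end up joined ("warpingwhat"). Passing ' ' as the replacement instead keeps them separated.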

You should use regular expressions for this. Assuming your variable is called text, do the following:

    import re
    text = re.sub(r'\n\d', ' ', text).replace("\nspoilers","").replace("\n","")

This first replaces every \n followed by a digit (\n1, \n2, etc.) with a space. The second call then removes each \nspoilers, and the third removes any remaining \n. The result looks like this:

'in star trek 2013 why did they  make warping look quite a bit like an hyperspace jump what in the world were those bright particles as soon as they jumped why in the world did they make it possible for two entities to react in warp space in separate jumps why did spock get emotions for this movie what was the point of hiding the enterprise underwater when they were intercepted by the dark ship how come they reached earth when they were far away from heri dont seem to remember the scene where they warp to earth how did the ship enter earths atmosphere when it wasnt even in orbit when scotty opened the door of the black ship how come pike and khan didnt slow down'
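The three steps can also be folded into a single pattern that handles multi-digit numbers such as \n10 and tidies up the leftover double spaces. This is only a sketch under assumptions: the helper name `clean_question` and the sample string are hypothetical, and the pattern would also strip a number that legitimately starts a line.

```python
import re

def clean_question(text):
    """Hypothetical helper: drop newlines plus a following number or 'spoilers'."""
    # Replace a newline, optionally followed by digits or the word "spoilers",
    # with a single space so neighbouring words stay separated.
    text = re.sub(r'\n(?:\d+|spoilers)?', ' ', text)
    # Collapse the runs of spaces left behind and trim the ends.
    return re.sub(r' {2,}', ' ', text).strip()

# Illustrative sample, not the asker's real data.
sample = 'why did they \n\nspoilers\nspoilers\n\n1make warping\n10what were those'
result = clean_question(sample)
print(result)
```

To clean the whole list, the helper can be used in a comprehension: `[clean_question(x) for x in question_text_list]`.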

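To strip trailing newline characters from each item, you can use the following (note that rstrip only removes characters at the end of the string, so it does not touch a \n1 or \nspoilers in the middle of the text):

    li = [...]  # your original list

    li = [item.rstrip('\n') for item in li]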
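For comparison, here is a quick check of what `rstrip('\n')` does to a string that has newlines both inside and at the end. The sample string is an illustrative assumption.

```python
# Hypothetical one-item list with embedded and trailing newlines.
li = ['first\n1second\n2third\n\n']

# rstrip('\n') removes newline characters from the right-hand end only.
li = [item.rstrip('\n') for item in li]
print(li)
```

The trailing newlines are gone, but the embedded \n1 and \n2 remain, so this approach on its own does not produce the output the question asks for.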
