
Extract from a string with multiple lines in Python

I have been trying to extract a particular substring from each line of a multi-line string in Python.

The string runs to 400 lines and also contains non-English characters (for instance, Chinese).

The example is:

my_string = '''\n1. Up in the air: 悬而未决\n2. Out of the woods: 摆脱困境\n3. Not all there: 智商掉线\n'''

and it continues all the way to 400. Born to the purple: 出身显赫.

What I want to do: extract only the English part of each line and put the results in a list.

[Up in the air, Out of the woods, Not all there]

Here is my way of doing it.

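import re

# Split on each ": <translation>\n" tail, leaving the numbered English phrases.
my_list = re.split(r':.*\n', my_string[1:])

# The final element is an empty string left over after the last newline, so skip it.
for line in my_list[:-1]:
    olist = re.sub(r'\d+\.\s*', '', line)  # drop the leading "1. " numbering
    print(olist)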

Is it possible to do this in one line?

Thank you

" ".join(re.findall(r'[a-zA-Z]+', my_string))

> 'Up in the air Out of the woods Not all there'

If you want 3 elements in your list (or 400 with your full input):

re.findall(r"\d\. (.*):", my_string)

Gives:

['Up in the air', 'Out of the woods', 'Not all there']
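The pattern above is greedy, so a line whose translation itself contains a colon would capture more than the English phrase. A minimal sketch of a stricter variant follows, assuming every entry starts at the beginning of a line with a number and a period; the names PHRASE and extract_phrases are hypothetical and not part of the original answer.

```python
import re

# Sample input copied from the question.
my_string = '''\n1. Up in the air: 悬而未决\n2. Out of the woods: 摆脱困境\n3. Not all there: 智商掉线\n'''

# ^ with re.MULTILINE anchors each match to the start of a line,
# \d+ accepts multi-digit numbering such as "400.",
# and the lazy (.*?) stops at the first colon on the line.
PHRASE = re.compile(r"^\d+\.\s*(.*?)\s*:", re.MULTILINE)


def extract_phrases(text):
    """Return the English phrase from every numbered line of text."""
    return PHRASE.findall(text)


phrases = extract_phrases(my_string)
print(phrases)
```

Compiling the pattern once is a stylistic choice; passing the same pattern and flag straight to re.findall keeps it a one-liner.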

If you want the individual English words in a list, you can do something like this:

import re
line='''\n1. Up in the air: 悬而未决\n2. Out of the woods: 摆脱困境\n3. Not all there: 智商掉线\n'''
line = re.sub(r"[^A-Za-z\s]", "", line.strip())
words = line.split()
eng_list=[]
for word in words:
    eng_list.append(word)
print(eng_list)

OUTPUT:

['Up', 'in', 'the', 'air', 'Out', 'of', 'the', 'woods', 'Not', 'all', 'there']
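For this sample input, the same word list can also be produced without the substitution and the loop. This is only a sketch of an alternative; the name eng_words is hypothetical.

```python
import re

# Sample input copied from the answer above.
line = '''\n1. Up in the air: 悬而未决\n2. Out of the woods: 摆脱困境\n3. Not all there: 智商掉线\n'''

# Collect every run of ASCII letters; digits, punctuation and Chinese
# characters simply act as separators between the runs.
eng_words = re.findall(r"[A-Za-z]+", line)
print(eng_words)
```

The two approaches differ on words with internal punctuation: for "don't", the substitution approach yields "dont", while findall yields "don" and "t".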

Otherwise, if you want the English words joined into a single string inside a list, you can use this:

import re
line='''\n1. Up in the air: 悬而未决\n2. Out of the woods: 摆脱困境\n3. Not all there: 智商掉线\n'''
line = re.sub(r"[^A-Za-z\s]", "", line.strip())
words = line.split()
eng_list=[]
letter=''
for word in words:
    letter+=word+' '
    # eng_list.append(word)
eng_list.append(letter.strip())
print(eng_list)

OUTPUT:

['Up in the air Out of the woods Not all there']
