Appending lines to a list based on beginning and ending characters in Python

I have a list of sentences that begin and end with different words.

I want to achieve the following:

  1. If a line starts and ends with <p>, append it to a new list.
  2. If a line begins with <p> but doesn't end with <p>, append it to a temporary string and check the next line. If the next line doesn't end with <p>, append it to the temporary string as well, and keep going until you reach a line that ends with <p>.
  3. Reset the temporary string and repeat steps 1 and 2.

Working list:

['<p>University Press, Inc.',
'The Game of Hearts: Harriette Wilson &amp; Her Memoirs edited by Lesley Blanch. Copyright © 1955 by<p>',
'<p>7<p>',
'<p>Acknowledgments<p>',
'<p>First, I would like to thank Anna Biller for her countless contributions to',
'this book: the research, the many discussions, her invaluable help with the',
'text itself, and, last but not least, her knowledge of the art of seduction, of',
'which I have been the happy victim on numerous occasions.<p>',
'<p>To the memory of my father<p>',
'<p>8<p>',
'<p>I must thank my mother, Laurette, for supporting me so steadfastly',
'throughout this project and for being my most devoted fan.`<p>`',
'<p>I would like to thank Catherine Léouzon, who some years ago intro-',
'duced me to Les Liaisons Dangereuses and the world of Valmont.<p>']

Working code:

itext = []
tempS = ''
for i in range(len(gtext)):
    if gtext[i][:3] == '<p>' and gtext[i][-3:] == '<p>':
        itext.append(gtext[i])
    elif gtext[i][:3] == '<p>' and gtext[i][-3:] != '<p>':
        tempS += gtext[i]
        if gtext[i+1][-3:] != '<p>':
            tempS += ' ' + gtext[i+1]
            if gtext[i+1][-3:] == '<p>':
                tempS += ' ' + gtext[i+1]
                itext.append(tempS)
                tempS = ''

Expected result:

['<p>University Press, Inc. The Game of Hearts: Harriette Wilson &amp; Her Memoirs edited by Lesley Blanch. Copyright © 1955 by<p>',
'<p>7<p>',
'<p>Acknowledgments<p>',
'<p>First, I would like to thank Anna Biller for her countless contributions to this book: the research, the many discussions, her invaluable help with the text itself, and, last but not least, her knowledge of the art of seduction, of which I have been the happy victim on numerous occasions.<p>',
'<p>To the memory of my father<p>',
'<p>8<p>',
'<p>I must thank my mother, Laurette, for supporting me so steadfastly throughout this project and for being my most devoted fan.`<p>`',
'<p>I would like to thank Catherine Léouzon, who some years ago intro-duced me to Les Liaisons Dangereuses and the world of Valmont.<p>']

I know it's trivial and seems easy, but I'm short on time and I need a quick fix. Thanks

Start out with a list, and append or concatenate based on a condition. A temporary string is not needed:

workingList = gtext  # the list of strings from the question; if you have raw text, split it on newlines first
result = []
for line in workingList:
    if line.startswith('<p>'):
        result.append(line)  # a leading <p> starts a new entry
    else:
        result[-1] += ' ' + line  # otherwise continue the previous entry

for line in result:
    print(line)

Running the code produces these results:

<p>University Press, Inc. The Game of Hearts: Harriette Wilson &amp; Her Memoirs edited by Lesley Blanch. Copyright © 1955 by<p>
<p>7<p>
<p>Acknowledgments<p>
<p>First, I would like to thank Anna Biller for her countless contributions to this book: the research, the many discussions, her invaluable help with the text itself, and, last but not least, her knowledge of the art of seduction, of which I have been the happy victim on numerous occasions.<p>
<p>To the memory of my father<p>
<p>8<p>
<p>I must thank my mother, Laurette, for supporting me so steadfastly throughout this project and for being my most devoted fan.`<p>`
<p>I would like to thank Catherine Léouzon, who some years ago intro- duced me to Les Liaisons Dangereuses and the world of Valmont.<p>
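The expected result in the question keeps "intro-duced" together, whereas joining every continuation line with a space leaves a gap after the hyphen. Here is a minimal sketch of one way to handle that, assuming a trailing hyphen always marks a word split across two lines (the question does not say so). The helper name `join_lines` and the shortened sample list are made up for illustration.

```python
def join_lines(lines):
    """Merge continuation lines into the preceding line that starts with '<p>'."""
    result = []
    for line in lines:
        if line.startswith('<p>') or not result:
            # A leading <p> (or the very first line) starts a new entry.
            result.append(line)
        elif result[-1].endswith('-'):
            # Assumption: a trailing hyphen means the word continues on this
            # line, so join without a space and keep the hyphen, as the
            # expected result in the question does.
            result[-1] += line
        else:
            result[-1] += ' ' + line
    return result


# Shortened sample taken from the working list in the question.
sample = [
    '<p>I would like to thank Catherine Léouzon, who some years ago intro-',
    'duced me to Les Liaisons Dangereuses and the world of Valmont.<p>',
    '<p>8<p>',
]
merged = join_lines(sample)
for entry in merged:
    print(entry)
```

If some lines legitimately end in a hyphen or dash, this rule would glue them to the next line without a space, so check it against your data first.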

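This can also be accomplished with itertools.groupby. Note that groupby puts every run of consecutive incomplete lines into a single group, so two multi-line paragraphs that directly follow each other are merged into one entry, as the last line of the output shows: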

from itertools import groupby

output = []

for test, lines in groupby(gtext, lambda x: x.startswith('<p>') and x.endswith('<p>')):
    if not test:
        output.append(' '.join(lines))  # run of incomplete lines: join into one entry
    else:
        output.extend(lines)  # complete lines: keep each as its own entry

for line in output:
    print(line)
# <p>University Press, Inc. The Game of Hearts: Harriette Wilson &amp; Her Memoirs edited by Lesley Blanch. Copyright © 1955 by<p>
# <p>7<p>
# <p>Acknowledgments<p>
# <p>First, I would like to thank Anna Biller for her countless contributions to this book: the research, the many discussions, her invaluable help with the text itself, and, last but not least, her knowledge of the art of seduction, of which I have been the happy victim on numerous occasions.<p>
# <p>To the memory of my father<p>
# <p>8<p>
# <p>I must thank my mother, Laurette, for supporting me so steadfastly throughout this project and for being my most devoted fan.`<p>` <p>I would like to thank Catherine Léouzon, who some years ago intro- duced me to Les Liaisons Dangereuses and the world of Valmont.<p>
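To keep adjacent multi-line paragraphs as separate entries, one option is to close an entry whenever a line ends with <p>, which follows steps 2 and 3 of the question fairly literally. This is only a sketch: it assumes every paragraph's last line ends with exactly '<p>' (stray trailing characters such as backticks would need to be stripped first), and both the function name `merge_paragraphs` and the sample data are hypothetical.

```python
def merge_paragraphs(lines, tag='<p>'):
    """Collect lines into a buffer and emit it each time a line ends with `tag`."""
    paragraphs = []
    buffer = []
    for line in lines:
        buffer.append(line)
        if line.endswith(tag):
            # The closing tag was reached: emit the buffered lines as one entry.
            paragraphs.append(' '.join(buffer))
            buffer = []
    if buffer:
        # Leftover lines without a closing tag are kept rather than dropped.
        paragraphs.append(' '.join(buffer))
    return paragraphs


# Made-up sample with two multi-line paragraphs directly after each other.
sample = [
    '<p>first line of A',
    'second line of A<p>',
    '<p>first line of B',
    'second line of B<p>',
    '<p>C<p>',
]
paragraphs = merge_paragraphs(sample)
for paragraph in paragraphs:
    print(paragraph)
```

Unlike the groupby version, this does not depend on a complete line sitting between two multi-line paragraphs to separate them.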
