Remove "\n" only from certain lines in a text file

Question

I have a text file organized like this:

NAME: name\n
AGE: age\n
NOTES: random text\n
JOB: text
\n
NAME: name\n
AGE: age\n
NOTES: random text\n
JOB: text
\n

I only wrote 5 lines for each block of data, but let's say I have 7 lines or more. I also wrote only 2 blocks here, but my file may contain over 100, and my desired output would be a list of lists (preferably):

list=[[NAME: name\n, AGE: age\n, NOTES: random text\n, JOB: blabla, \n], [NAME: name\n, AGE: age\n, NOTES: random text\n, JOB: blabla, \n], [...]]

which I obtain with this code:

list_of_lists = [list[x:x+4] for x in range(0, len(list),4)]

but my problem is that sometimes the random text in NOTES: contains extra \n characters, which may cause the grouping to be wrong:

list=[[NAME: name\n, AGE: age\n, NOTES: unwanted\n, newlines\n], [that ruin\n, my plans\n, \n, NAME: name\n] etc etc]

So basically all lines are OK except the NOTES one, where people inserted line breaks that I don't want: they split the NOTES text across several lines in the file and across several items in the list. I want to delete those \n characters so that the NOTES field is grouped in one line (in the text) and in one item (in the list).

EDIT: Thanks for helping! I have tried some of your solutions, but they still didn't solve my problem, so I edited my question to explain better (edited content in bold).

Answer 1

I suggest doing things a little bit differently:

result = []
d = {}
with open("file.txt") as f:
    for line in f:
        if line.startswith("NAME:"):
            # A NAME: line starts a new record, so store the previous one first.
            if d:
                result.append(d)
            d = {}
        # List every field label that appears in the file in this tuple.
        if any(line.startswith(key) for key in ("NAME:", "AGE:", "NOTES:")):
            key, value = line.strip().split(":", 1)
            d[key] = value
        elif "NOTES" in d and line.strip():
            # Any other non-blank line is a continuation of NOTES: append it once.
            d["NOTES"] += " " + line.strip()
    if d:
        result.append(d)

This returns something like

[{'NAME': ' name', 'AGE': ' age', 'NOTES': ' random text'}, {'NAME': ' name', 'AGE': ' age', 'NOTES': ' random text other text.'}]
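Since the question prefers a list of lists, a list of dictionaries like this can be flattened back into that shape. The following is only a minimal sketch: the `records` data and the `to_lines` helper are hypothetical names made up for illustration, and it assumes Python 3.7+ so that dictionaries keep their insertion order.

```python
# Hypothetical records in the shape produced by a dictionary-based parser.
records = [
    {"NAME": " name", "AGE": " age", "NOTES": " random text other text."},
    {"NAME": " name", "AGE": " age", "NOTES": " random text"},
]


def to_lines(record):
    """Rebuild 'KEY: value\\n' items for one record, plus the blank separator line."""
    return ["{}:{}\n".format(key, value) for key, value in record.items()] + ["\n"]


# One inner list per record, one item per field.
list_of_lists = [to_lines(record) for record in records]
print(list_of_lists)
```

Each NOTES value is already a single string at this point, so it ends up as exactly one item in its inner list.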

Answer 2

It looks like this is intended to be key-value pairs, so first try splitting the data into a list of dictionaries.

You can reverse the text file string using text[::-1], split it with reverse_text.split(':EMAN'), and then reverse each of the resulting strings again. This should give you a list ready for parsing into a dict, looking like:

list = [
    ['NAME: name\n AGE: age\n NOTES: random text\n\n'],
    ['NAME: name\n AGE: age\n NOTES: random text\n\n'],
    ...,
    ]
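The reverse, split, reverse idea described above could be sketched roughly as follows. The sample `text` is made up for illustration, and the step that puts the "NAME:" label back and restores the original record order is an assumption added here, since the split consumes the label and reversing the string also reverses the order of the pieces.

```python
# Made-up sample; the first record has a NOTES field broken over two lines.
text = "NAME: a\nAGE: 1\nNOTES: x\ny\n\nNAME: b\nAGE: 2\nNOTES: z\n\n"

reverse_text = text[::-1]
pieces = reverse_text.split(":EMAN")

# The last piece is whatever preceded the first "NAME:" (empty here), so drop it.
# reversed() restores the original record order, [::-1] restores each string,
# and the "NAME:" label removed by split() is put back in front.
records = ["NAME:" + piece[::-1] for piece in reversed(pieces[:-1])]
print(records)
```

Each string in `records` holds one complete block, including any stray newlines inside NOTES, so the grouping no longer depends on a fixed number of lines per block.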

Answer 3

You may achieve it using a list comprehension:

from io import StringIO    # Python 3; on Python 2 use: from StringIO import StringIO

myfile = StringIO("""NAME: name\n
AGE: age\n
NOTES: random text\n
\n
NAME: name\n
AGE: age\n
NOTES: random text\n
\n""")    # StringIO creates file like object

# Your list comprehension expression
my_list = [["{}\n".format(item) for item in group.split("\n\n")+['']] for group in myfile.read().split("\n\n\n\n")]
#                               For adding extra `\n` at the end ^                              ^

where my_list will hold:

[['NAME: name\n', 'AGE: age\n', 'NOTES: random text\n', '\n'], ['NAME: name\n', 'AGE: age\n', 'NOTES: random text\n', '\n\n', '\n']]

In case you do not want \n\n as the second-to-last element in the last sub-list, you may explicitly delete it:

del my_list[-1][-2]

Now my_list will hold the value:

[['NAME: name\n', 'AGE: age\n', 'NOTES: random text\n', '\n'], ['NAME: name\n', 'AGE: age\n', 'NOTES: random text\n', '\n']]

Answer 4

import re

# some example text:
my_text = """NAME: name\nAGE: age\nNOTES: random text\n\nNAME: name\nAGE: age\nJOB: job\nNOTES: random text\n\nblah \n\n blah\n\nNAME: name\nAGE: age\nNOTES: more \n random\n text\n\n"""
# splitting up your text into a list of lists
# (raw strings keep \Z from being read as a string escape):
my_list = [[c.group(1) for c in re.finditer(r'(?ms)(?=(^[A-Z]+:.*?)(^[A-Z]+:|\Z))', chunk.group(1))] for chunk in re.finditer(r'(?ms)(?=(^NAME:.*?)(^NAME:|\Z))', my_text)]

This works by performing two regex searches. The first one finds all the text starting from NAME: up until right before the next NAME: or the end of the file. This essentially splits the text into the data for each person. Then, an almost identical regex is used to split each of these chunks into a list of its attributes (NAME, AGE, JOB, etc.). This regex assumes that each attribute label is in all caps, occurs at the beginning of a line, and is followed by a colon.

The contents of my_list in the example above are:

[['NAME: name\n', 'AGE: age\n', 'NOTES: random text\n\n'],
 ['NAME: name\n',
  'AGE: age\n',
  'JOB: job\n',
  'NOTES: random text\n\nblah \n\n blah\n\n'],
 ['NAME: name\n', 'AGE: age\n', 'NOTES: more \n random\n text\n\n']]
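The result above still carries the unwanted newlines inside each NOTES item. If they should be removed as the question asks, one possible follow-up is to normalize the whitespace of every captured field. This is only a sketch: the `clean` helper and the shortened sample text are assumptions added for illustration, and it collapses every run of whitespace inside a field into a single space, which may or may not be what you want for other fields.

```python
import re

# Shortened, made-up sample with a NOTES field broken over three lines.
my_text = "NAME: name\nAGE: age\nNOTES: more \n random\n text\nJOB: job\n\n"


def clean(field):
    """Collapse all whitespace runs in a field to one space and end it with one newline."""
    return " ".join(field.split()) + "\n"


# Same two regex passes as above, with clean() applied to each attribute.
my_list = [
    [clean(c.group(1))
     for c in re.finditer(r'(?ms)(?=(^[A-Z]+:.*?)(^[A-Z]+:|\Z))', chunk.group(1))]
    for chunk in re.finditer(r'(?ms)(?=(^NAME:.*?)(^NAME:|\Z))', my_text)
]
print(my_list)
```

Note that a continuation line inside NOTES that itself starts with an all-caps word and a colon would still be treated as a new attribute, because the regex assumption about labels applies to every line.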


Answer 1
0 2016-11-20 17:46:13

Answer 2
0 2016-11-20 17:52:22

Answer 3
0 2016-11-20 17:55:10

Answer 4
0 Accepted 2016-11-20 18:24:21
