Split/Strip a set of lines on encountering newline or comma

I have a set of lines in a textpad.

E.g.:

643 ABCF aksdjgk 1q25hgn
239056 dsgkn 32968, 39859 ewktgklh, 35927369
9689846 dklsghdkls 23-608 dsklgnk
ewth834056 sidtguoi,235907 sdkgji,25689-8, 29067490,wtyuoew

How can I read this using Python and have the text split into different list values on newline as well as on , (comma)?

For instance, the output for the example text should come out as

643 ABCF aksdjgk 1q25hgn
239056 dsgkn 32968,
39859 ewktgklh,
35927369
9689846 dklsghdkls 23-608 dsklgnk
ewth834056 sidtguoi,
235907 sdkgji,
25689-8,
29067490,
wtyuoew

Try using re.sub, and replace all commas with a comma followed by a newline:

import re
result = re.sub(r',\s*', ',\n', text)  # text holds the file contents

Note that we actually match ,\s* to remove any whitespace that might occur after a comma separator.
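As a minimal sketch of this approach applied to the sample from the question, the substituted string can then be split on newlines to get a list. The names text and values are illustrative only and are not part of the original answer.

```python
import re

# Sample input from the question, held in a string for illustration.
text = (
    "643 ABCF aksdjgk 1q25hgn\n"
    "239056 dsgkn 32968, 39859 ewktgklh, 35927369\n"
    "9689846 dklsghdkls 23-608 dsklgnk\n"
    "ewth834056 sidtguoi,235907 sdkgji,25689-8, 29067490,wtyuoew\n"
)

# Put a newline after every comma, dropping any whitespace that followed it.
result = re.sub(r',\s*', ',\n', text)

# One list entry per output line; each comma stays attached to its value.
values = result.splitlines()
print(values)
```

Here splitlines is used rather than split('\n') so that the final newline does not produce an empty trailing entry.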

Assuming that "textpad" means text file, you have a couple of options. For a small file like the one shown, the easiest solution would be to read in the entire file as a string, and replace the commas with a comma + newline, as @TimBiegeleisen's answer shows.

For larger files, this may not be a good option due to memory constraints. In that case, and for the sake of generality, I like to iterate over the lines of a file. Here is a fairly simple generator that behaves like a normal file iterator, but also splits on commas:

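from itertools import zip_longest, repeat
import re

def spliterator(file):
    """Yield each line of `file`, further split after every comma."""
    for line in file:
        # Split on a comma plus any whitespace that follows it.
        segments = re.split(r',\s*', line)
        # Every segment except the last gets its comma back, plus a newline.
        ends = repeat(',\n', len(segments) - 1)
        for item in zip_longest(segments, ends, fillvalue=''):
            yield ''.join(item)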

It would be pretty simple to make this accept the split pattern as an argument, optionally keep the trailing spaces, and return the whole line with newline characters inserted.
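One possible reading of those extensions is sketched below. The function name split_lines and the parameters pattern, keep_spaces and whole_line are hypothetical, chosen only for illustration, and the sketch assumes the pattern never matches an empty string.

```python
import re

def split_lines(lines, pattern=r',\s*', keep_spaces=False, whole_line=False):
    """Hypothetical variant: break each line after every match of `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        # Set the line's own newline aside so a delimiter cannot swallow it.
        body = line.rstrip('\n')
        eol = line[len(body):]
        pieces, pos = [], 0
        for match in regex.finditer(body):
            delim = match.group()
            if not keep_spaces:
                # Drop whitespace the pattern matched after the delimiter.
                delim = delim.rstrip()
            pieces.append(body[pos:match.start()] + delim + '\n')
            pos = match.end()
        # Emit the remainder, unless the line ended exactly on a delimiter.
        if pos < len(body) or not pieces:
            pieces.append(body[pos:] + eol)
        if whole_line:
            # One string per input line, with the newlines inserted.
            yield ''.join(pieces)
        else:
            yield from pieces

sample = ["239056 dsgkn 32968, 39859 ewktgklh, 35927369\n"]
print(list(split_lines(sample)))
print(list(split_lines(sample, whole_line=True)))
```

Unlike the original generator, this version works from match positions rather than re.split, which lets it keep the matched delimiter text when keep_spaces is set.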

Using the generator is simple, since it just wraps a normal file object or any other iterable of lines:

with open('textpad.txt') as file:
    print(''.join(spliterator(file)))

Here is an IDEOne link with a demo.

To get the contents of the whole file as though read in by readlines, just wrap it in list:

lines = list(spliterator(file))

To write back to an open output file, use writelines directly:

output.writelines(spliterator(file))
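The two calls above can be tried without touching the disk. The sketch below repeats the spliterator generator from this answer and uses io.StringIO objects as stand-ins for the input and output files. The stand-ins and the shortened sample text are assumptions made for illustration.

```python
import io
import re
from itertools import repeat, zip_longest

def spliterator(file):
    # Same generator as in the answer above.
    for line in file:
        segments = re.split(r',\s*', line)
        ends = repeat(',\n', len(segments) - 1)
        for item in zip_longest(segments, ends, fillvalue=''):
            yield ''.join(item)

sample = "643 ABCF aksdjgk 1q25hgn\n239056 dsgkn 32968, 39859 ewktgklh, 35927369\n"

# Collect the pieces in a list, much like readlines() would.
lines = list(spliterator(io.StringIO(sample)))

# Stream the pieces straight into an output file-like object.
output = io.StringIO()
output.writelines(spliterator(io.StringIO(sample)))
print(output.getvalue(), end='')
```

Because writelines consumes the generator lazily, the second form never holds the whole file in memory, which is the point of iterating line by line.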
