简体   繁体   English

标准化字符串中的行尾的最Python方式是什么?

[英]What's the most pythonic way of normalizing lineends in a string?

Given a text-string of unknown source, how does one best rewrite it to have a known lineend-convention? 给定一个未知来源的文本字符串,如何最好地重写它以具有已知的行尾约定?

I usually do: 我通常这样做:

lines = text.splitlines()
text = '\n'.join(lines)

... but this doesn't handle "mixed" text-files of utterly confused conventions (Yes, they still exist!). ...但这不能处理完全混淆的约定的“混合”文本文件(是的,它们仍然存在!)。

Edit 编辑

The oneliner of what I'm doing is of course: 当然,我正在做的事情是:

'\n'.join(text.splitlines())

... that's not what I'm asking about. ...这不是我要问的。

The total number of lines should be the same afterwards, so no stripping of empty lines. 之后,总行数应相同,因此不要剥离空行。

Testcases 测试用例

Splitting 分裂

'a\nb\n\nc\nd'
'a\r\nb\r\n\r\nc\r\nd'
'a\rb\r\rc\rd'
'a\rb\n\rc\rd'
'a\rb\r\nc\nd'
'a\nb\r\nc\rd'

.. should all result in 5 lines. ..应该全部导致5行。 In a mixed context, splitlines assumes that '\\r\\n' is a single logical newline, leading to 4 lines for the last two testcases. 在混合上下文中,分割线假定'\\ r \\ n'是单个逻辑换行符,最后两个测试用例导致4行。

Hm, a mixed context that contains '\\r\\n' can be detected by comparing the result of splitlines() and split('\\n'), and/or split('\\r')... 嗯,可以通过比较splitlines()和split('\\ n')和/或split('\\ r')的结果来检测包含'\\ r \\ n'的混合上下文...

mixed.replace('\r\n', '\n').replace('\r', '\n')

应该处理所有可能的变体。

... but this doesn't handle "mixed" text-files of utterly confused conventions (Yes, they still exist!) ...但是这不能处理完全混淆的约定的“混合”文本文件(是的,它们仍然存在!)

Actually it should work fine: 实际上,它应该可以正常工作:

>>> s = 'hello world\nline 1\r\nline 2'

>>> s.splitlines()
['hello world', 'line 1', 'line 2']

>>> '\n'.join(s.splitlines())
'hello world\nline 1\nline 2'

What version of Python are you using? 您正在使用哪个版本的Python?

EDIT: I still don't see how splitlines() is not working for you: 编辑:我仍然看不到splitlines()对您不起作用:

>>> s = '''\
... First line, with LF\n\
... Second line, with CR\r\
... Third line, with CRLF\r\n\
... Two blank lines with LFs\n\
... \n\
... \n\
... Two blank lines with CRs\r\
... \r\
... \r\
... Two blank lines with CRLFs\r\n\
... \r\n\
... \r\n\
... Three blank lines with a jumble of things:\r\n\
... \r\
... \r\n\
... \n\
... End without a newline.'''

>>> s.splitlines()
['First line, with LF', 'Second line, with CR', 'Third line, with CRLF', 'Two blank lines with LFs', '', '', 'Two blank lines with CRs', '', '', 'Two blank lines with CRLFs', '', '', 'Three blank lines with a jumble of things:', '', '', '', 'End without a newline.']

>>> print '\n'.join(s.splitlines())
First line, with LF
Second line, with CR
Third line, with CRLF
Two blank lines with LFs


Two blank lines with CRs


Two blank lines with CRLFs


Three blank lines with a jumble of things:



End without a newline.

As far as I know splitlines() doesn't split the list twice or anything. 据我所知, splitlines()不会两次拆分列表。

Can you paste a sample of the kind of input that's giving you trouble? 您可以粘贴给您带来麻烦的那种输入示例吗?

Are there even more convetions than \\r\\n\\ and \\n ? 还有比\\r\\n\\\\n更多的惊喜吗? Simply replacing \\r\\n is enough if you dont want lines. 如果您不需要行,只需替换\\r\\n就足够了。

only_newlines = mixed.replace('\r\n','\n')

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 从字符串的开头删除数字的最Pythonic方法是什么? - What's the most Pythonic way to remove a number from start of a string? 确定字节序的最Pythonic方法是什么? - What's the most Pythonic way of determining endianness? 构建此词典的最pythonic方法是什么? - What's the most pythonic way to build this dictionary? 在具有多种类型的空白字符的字符串中的每个单词上应用函数的最有效的方法是什么? - What's the most pythonic way to apply a function on every word in a string with multiple types of white space characters? 重构此列表构建代码的最Pythonic方法是什么? - What's the most Pythonic way to refactor this list building code? 什么是避免在字符串中指定相同值的最pythonic方法 - What is the most pythonic way to avoid specifying the same value in a string Python 2:使用字符串解释进行枚举的最优雅/ pythonic方式是什么? - Python 2: What is the most elegant/pythonic way of doing a enum with string interpretations? 什么是确保列表中所有元素不同的最pythonic方法? - What's the most pythonic way to ensure that all elements of a list are different? 处理嵌套字典列表的最 Pythonic 方法是什么? - What's the most Pythonic way to deal with a list of nested dictionaries? 重新排列字符串中字母的最有效方式是什么? - What is the most pythonic way to re-order letters in a string?
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM