
Detecting type of line breaks

What would be the most efficient (fast and reliable enough) way in JavaScript to determine the type of line breaks used in a text: Unix vs. Windows?

In my Node app I have to read in large UTF-8 text files and then process them based on whether they use Unix or Windows line breaks.

When the type of line breaks is uncertain, I want to settle on whichever type is most likely.

UPDATE

See my own answer below for the code I ended up using.

Thanks to @Sam-Graham. I tried to produce an optimized version. In addition, the output of the function is directly usable (see the example below):

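function getLineBreakChar(string) {
    // Start at index 1: an LF at index 0 cannot be preceded by a CR
    const indexOfLF = string.indexOf('\n', 1)
    
    if (indexOfLF === -1) {
        // No LF found past index 0: any CR means CR-only line breaks
        if (string.indexOf('\r') !== -1) return '\r'
        
        return '\n'  // Default to LF
    }
    
    // An LF directly preceded by a CR is a CRLF line break
    if (string[indexOfLF - 1] === '\r') return '\r\n'
    
    return '\n'
}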

Note 1: This assumes the string is well-formed (it contains only one type of line break).

Note 2: This assumes you want LF to be the default (when no line break is found).


Usage example:

fs.writeFileSync(filePath,
        string.substring(0, a) +
        getLineBreakChar(string) +
        string.substring(b)
);

This utility may be useful too:

const getLineBreakName = (lineBreakChar) =>
    lineBreakChar === '\n' ? 'LF' : lineBreakChar === '\r' ? 'CR' : 'CRLF'

You would want to look first for an LF, as in source.indexOf('\n'), and then check whether the character before it is a CR, as in source[source.indexOf('\n') - 1] === '\r'. This way, you just find the first newline and go by it. In summary:

function whichLineEnding(source) {
     var temp = source.indexOf('\n');
     if (source[temp - 1] === '\r')
         return 'CRLF'
     return 'LF'
}
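Since this approach only needs the first newline, for a large file it may be enough to inspect just the beginning instead of loading everything. Below is a minimal sketch of that idea for Node; the function name detectLineEndingInFile and the 64 KB sample size are assumptions made for illustration, not part of the answer above. It searches the raw bytes, which is safe for UTF-8 because the bytes 0x0A and 0x0D never appear inside a multi-byte sequence.

```javascript
const fs = require('fs');

// Hypothetical helper: looks only at the first `sampleSize` bytes of the file.
// Returns 'CRLF', 'LF', or null when the sample contains no LF at all.
function detectLineEndingInFile(filePath, sampleSize = 64 * 1024) {
    const buffer = Buffer.alloc(sampleSize);
    const fd = fs.openSync(filePath, 'r');
    let bytesRead;
    try {
        // Read up to sampleSize bytes starting at file position 0
        bytesRead = fs.readSync(fd, buffer, 0, sampleSize, 0);
    } finally {
        fs.closeSync(fd);
    }

    const sample = buffer.subarray(0, bytesRead);
    const lfIndex = sample.indexOf(0x0a); // first LF byte
    if (lfIndex === -1) return null;

    // An LF directly preceded by a CR byte is a CRLF line break
    return lfIndex > 0 && sample[lfIndex - 1] === 0x0d ? 'CRLF' : 'LF';
}
```

Like whichLineEnding, this trusts the first line break it sees, so it is only as reliable as the assumption that the file uses one type consistently.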

There are two fairly popular npm modules that do this: node-newline and crlf-helper. The first does a split on the entire string, which is very inefficient in your case. The second uses a regex, which in your case would not be quick enough.

However, going by your edit, if you want to determine which type is more plentiful, then I would use the code from node-newline, as it does handle that case.
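If you would rather avoid both the split and the regex, counting can also be done in one pass with indexOf. The following is only a sketch of that idea, not the code from node-newline; the names countLineEndings and dominantLineEnding and the fallback parameter are made up for illustration.

```javascript
// Hypothetical helper: counts CRLF and bare LF line breaks in one pass.
function countLineEndings(text) {
    let crlf = 0;
    let lf = 0;
    let index = text.indexOf('\n');
    while (index !== -1) {
        // An LF directly preceded by a CR counts as CRLF, otherwise as LF
        if (index > 0 && text[index - 1] === '\r') crlf++;
        else lf++;
        index = text.indexOf('\n', index + 1);
    }
    return { crlf, lf };
}

// Hypothetical helper: returns the more frequent line break,
// or `fallback` on a tie (including when there are no line breaks).
function dominantLineEnding(text, fallback = '\n') {
    const { crlf, lf } = countLineEndings(text);
    if (crlf === lf) return fallback;
    return crlf > lf ? '\r\n' : '\n';
}
```

For example, dominantLineEnding('a\r\nb\nc\r\n') sees two CRLF breaks against one LF and returns '\r\n'.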

In the end I used my own solution for this, based on simple statistics:

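const {EOL} = require('os');

function getEOL(text) {
    const m = text.match(/\r\n|\n/g); // all line breaks, or null if there are none
    const u = m && m.filter(a => a === '\n').length; // number of Unix (LF) breaks
    const w = m && m.length - u; // number of Windows (CRLF) breaks
    if (u === w) {
        return EOL; // no breaks (both null) or a tie: use the OS default
    }
    return u > w ? '\n' : '\r\n';
}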

When there are no line breaks, or the two counts happen to be equal, it returns the OS's default EOL.

UPDATE

Later on I found through further practice that if you want to process text the same way regardless of whether it uses Unix or Windows line breaks, the most efficient approach is simply to replace any Windows line breaks with Unix ones and not bother with detection at all:

text = text.replace(/\r\n/g, '\n'); // replace every \r\n with \n
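If the input might also contain lone CR line breaks (the old Mac style), the same idea extends with an optional \n in the pattern. This is a small sketch; the function name normalizeLineEndings is made up here, and whether a lone CR should be treated as a line break at all is an assumption that depends on your data.

```javascript
// Hypothetical helper: converts CRLF and lone CR line breaks to LF.
// The pattern /\r\n?/ matches a CR optionally followed by an LF, so a CRLF
// pair is replaced as one unit rather than producing two LFs.
function normalizeLineEndings(text) {
    return text.replace(/\r\n?/g, '\n');
}
```

Existing LF line breaks are left untouched, so calling it on text that is already normalized returns an equal string.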

This is how ESLint's linebreak-style rule detects line endings in JavaScript files. Here, source means the actual file content.

Note: Files can sometimes have mixed line endings as well.

https://github.com/eslint/eslint/blob/master/lib/rules/linebreak-style.js
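To illustrate the general idea of checking every line break against an expected style, which is also how mixed line endings show up, here is a simplified sketch. It is not ESLint's actual code; the function name findUnexpectedLineBreaks and the shape of the returned objects are assumptions made for this example.

```javascript
// Hypothetical helper: lists every line whose line break differs from
// the expected one. Line numbers are 1-based.
function findUnexpectedLineBreaks(source, expected = '\n') {
    // Alternation order matters: '\r\n' must be tried before a lone '\r'
    const pattern = /\r\n|\r|\n/g;
    const problems = [];
    let line = 1;
    let match;
    while ((match = pattern.exec(source)) !== null) {
        if (match[0] !== expected) {
            problems.push({ line, found: match[0] });
        }
        line++;
    }
    return problems;
}
```

An empty result means every line break matches the expected style; a non-empty result on a file that also has matching lines indicates mixed line endings.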

Try this:

// Check for the CRLF pair: a lone \r is not a Windows line break
if (text.search(/\r\n/) > -1) {
   alert('Windows');
} else if (text.search(/\n/) > -1) {
   alert('Unix');
} else {
   alert('No line breaks found');
}
