Remove trailing spaces from a file using Windows batch?

How can I trim all trailing spaces from a text file using the Windows command prompt?

The DosTips RTRIM function that Ben Hocking cites can be used to create a script that right-trims each line in a text file. However, the function is relatively slow.

DosTips user (and moderator) aGerman developed a very efficient right trim algorithm. He implemented the algorithm as a batch "macro", an interesting concept of storing complex mini scripts in environment variables that can be executed from memory. Macros with arguments are a major discussion topic in and of themselves that is not relevant to this question.

I have extracted aGerman's algorithm and put it in the following batch script. The script expects the name of a text file as the only parameter and proceeds to right trim the spaces off each line in the file.

@echo off
setlocal enableDelayedExpansion
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
findstr /n "^" "%~1" >"%~1.tmp"
setlocal disableDelayedExpansion
(
  for /f "usebackq delims=" %%L in ("%~1.tmp") do (
    set "ln=%%L"
    setlocal enableDelayedExpansion
    set "ln=!ln:*:=!"
    set /a "n=4096"
    for /l %%i in (1 1 13) do (
      if defined ln for %%n in (!n!) do (
        if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
        set /a "n/=2"
      )
    )
    echo(!ln!
    endlocal
  )
) >"%~1"
del "%~1.tmp" 2>nul

Assuming the script is called rtrimFile.bat, it can be called from the command line as follows:

rtrimFile "fileName.txt"

A note about performance
The original DosTips rtrim function performs a linear search and defaults to trimming a maximum of 32 spaces. It has to iterate once per space.

aGerman's algorithm uses a binary search and is able to trim the maximum string size allowed by batch (up to ~8k spaces) in 13 iterations.
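To make the halving idea easier to follow outside of batch syntax, here is a minimal Python sketch of the same loop. It is an illustration only, not aGerman's code: the function name rtrim_halving and its max_exp parameter are hypothetical, and the sketch assumes that only the space character is trimmed.

```python
def rtrim_halving(line: str, max_exp: int = 12) -> str:
    """Strip trailing spaces by testing suffixes of 4096, 2048, ..., 1 spaces.

    Each power of two is tested exactly once, so the default performs
    13 comparisons and can remove at most 8191 trailing spaces.
    """
    for exp in range(max_exp, -1, -1):
        n = 1 << exp
        # If the line is shorter than n, line[-n:] is the whole line,
        # which cannot equal a run of n spaces, so nothing is removed.
        if line[-n:] == " " * n:
            line = line[:-n]
    return line
```

Because every size is tried only once, a line with more than 8191 trailing spaces would keep the excess. That limit does not matter in batch, where a variable cannot hold a longer string anyway.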

Unfortunately, batch is very SLOW when it comes to processing text. Even with the efficient rtrim function, it takes ~70 seconds to trim a 1MB file on my machine. The problem is that just reading and writing the file without any modification takes significant time. This answer uses a FOR loop to read the file, coupled with FINDSTR to prefix each line with the line number so that blank lines are preserved. It toggles delayed expansion to prevent ! from being corrupted, and uses a search and replace operation to remove the line number prefix from each line. All that before it even begins to do the rtrim.

Performance could be nearly doubled by using an alternate file read mechanism that uses set /p. However, the set /p method is limited to ~1k bytes per line, and it strips trailing control characters from each line.

If you need to regularly trim large files, then even a doubling of performance is probably not adequate. Time to download (if possible) any one of many utilities that could process the file in the blink of an eye.

If you can't use non-native software, then you can try VBScript or JScript executed via the CSCRIPT batch command. Either one would be MUCH faster.

UPDATE - Fast solution with JREPL.BAT

JREPL.BAT is a regular expression find/replace utility that can very efficiently solve the problem. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward. No 3rd party exe files are needed.

With JREPL.BAT somewhere within your PATH, you can strip trailing spaces from file "test.txt" with this simple command:

jrepl " +$" "" /f test.txt /o -

If you put the command within a batch script, then you must precede the command with CALL:

call jrepl " +$" "" /f test.txt /o -

Go get yourself a copy of CygWin or the sed package from GnuWin32.

Then use that with the command:

sed "s/ *$//" inputFile >outputFile
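For readers who have Python available rather than sed, the same per-line substitution can be sketched as follows. This is not part of the original answer: the names rtrim_line and rtrim_file are hypothetical, and the sketch assumes that only the space character should be stripped and that each line keeps whatever line ending (LF or CRLF) it already had.

```python
def rtrim_line(line: bytes) -> bytes:
    """Remove trailing spaces from one line while keeping its line ending."""
    body = line.rstrip(b"\r\n")      # text without the line terminator
    ending = line[len(body):]        # b"", b"\n" or b"\r\n"
    return body.rstrip(b" ") + ending


def rtrim_file(src: str, dst: str) -> None:
    """Copy src to dst with trailing spaces removed from every line."""
    # Binary mode avoids any newline translation on Windows.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(rtrim_line(line))
```

Calling rtrim_file("inputFile", "outputFile") would mirror the sed command above. As with the sed redirection, the output should be written to a different file than the input.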

Dos Tips has an implementation of RTrim that works for batch files:

:rTrim string char max -- strips white spaces (or other characters) from the end of a string
::                     -- string [in,out] - string variable to be trimmed
::                     -- char   [in,opt] - character to be trimmed, default is space
::                     -- max    [in,opt] - maximum number of characters to be trimmed from the end, default is 32
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
SETLOCAL ENABLEDELAYEDEXPANSION
call set string=%%%~1%%
set char=%~2
set max=%~3
if "%char%"=="" set char= &rem one space
if "%max%"=="" set max=32
for /l %%a in (1,1,%max%) do if "!string:~-1!"=="%char%" set string=!string:~0,-1!
( ENDLOCAL & REM RETURN VALUES
    IF "%~1" NEQ "" SET %~1=%string%
)
EXIT /b

If you're not used to using functions in batch files, read this.
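The logic of the rTrim function above is easier to see when written out in Python. The following is only a sketch of the same linear approach, with a hypothetical name (rtrim_linear) and parameters that mirror the char and max arguments of the batch function. It is not a translation of every batch detail.

```python
def rtrim_linear(s: str, char: str = " ", max_chars: int = 32) -> str:
    """Remove up to max_chars trailing copies of char, one per iteration."""
    for _ in range(max_chars):
        if not s.endswith(char):
            break                    # nothing left to trim
        s = s[:-1]                   # drop exactly one character
    return s
```

With the default limit of 32, a line that ends in 40 spaces still has 8 of them afterwards, which shows why the limit must be raised for long runs and why the cost grows with the number of spaces removed.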

There is a nice trick to remove trailing spaces based on this answer of user Aacini; I modified it so that all other spaces occurring in the string are preserved. So here is the code:

@echo off
setlocal EnableDelayedExpansion

rem // This is the input string:
set "x=  This is   a text  string     containing  many   spaces.   "

rem // Ensure there is at least one trailing space; then initialise auxiliary variables:
set "y=%x% " & set "wd=" & set "sp="

rem // Now here is the algorithm:
set "y=%y: =" & (if defined wd (set "y=!y!!sp!!wd!" & set "sp= ") else (set "sp=!sp! ")) & set "wd=%"

rem // Return messages:
echo  input: "%x%"
echo output: "%y%"

endlocal

However, this approach fails when a character of the set ^, !, " occurs in the string.

A good tool for removing trailing spaces from files on Windows: http://mountwhite.net/en/spaces.html

I just found a very nice solution for trimming off white-spaces of a string:
Have you ever called a sub-routine using call and expanded all arguments using %*? You will notice that any leading and/or trailing white-spaces are removed. Any white-spaces occurring in between other characters are preserved; so are all the other command token separators (comma, semicolon and equal sign) and also the non-break space (character code 0xFF). This effect I am going to utilise for my script:

@echo off

set "STR="
set /P STR="Enter string: "

rem /* Enable Delayed Expansion to avoid trouble with
rem    special characters: `&`, `<`, `>`, `|`, `^` */
setlocal EnableDelayedExpansion
echo You entered: `!STR!`
call :TRIM !STR!
echo And trimmed: `!RES!`
endlocal

exit /B

:TRIM
set "RES=%*"
exit /B

This script expects a string entered by the user, which is then trimmed. This can of course also be applied to the lines of a file (which is what the original question is about, but reading a file line by line using for /F is shown in other answers anyway, so I skip it here). To trim the string on one side only, add a single character to the opposite side prior to trimming and remove it afterwards.
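The one-sided trick described in the last sentence is independent of batch and can be sketched in Python. The function names and the "#" sentinel below are hypothetical choices for illustration. The only assumption is that the sentinel is a non-whitespace character, so the two-sided trim cannot remove it.

```python
def rtrim_only(s: str, sentinel: str = "#") -> str:
    """Right-trim using only a two-sided trim plus a guard character."""
    # The sentinel protects the left side; strip() then trims only the right.
    return (sentinel + s).strip()[len(sentinel):]


def ltrim_only(s: str, sentinel: str = "#") -> str:
    """Left-trim using only a two-sided trim plus a guard character."""
    # The sentinel protects the right side; strip() then trims only the left.
    return (s + sentinel).strip()[:-len(sentinel)]
```

In the batch scripts the two-sided trim is performed by call and %* rather than by strip(), but the guard-and-remove steps are the same.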

This approach has some limitations though: it does not handle the characters %, !, ^ and " properly. To overcome this, several intermediate string manipulation operations are required:

@echo off
setlocal EnableExtensions DisableDelayedExpansion

set "STR="
set /P STR="Enter string: "

setlocal EnableDelayedExpansion
echo You entered: `!STR!`
set "STR=!STR:%%=%%%%!"
set "STR=!STR:"=""!^"
if not "%STR%"=="%STR:!=%" set "STR=!STR:^=^^^^!"
set "STR=%STR:!=^^^!%"
call :TRIM !STR!
set "RES=!RES:""="!^"
echo And trimmed: `!RES!`
endlocal

endlocal
exit /B

:TRIM
set "RES=%*"
exit /B

Update

Neither of the above scripts can handle the characters &, <, > and |, because call seems to be aborted as soon as such a character appears in an unquoted and unescaped manner.

However, I finally found a way to fix that and came up with an approach that can successfully deal with all characters (except perhaps some control characters, which I did not test):

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem // The last white-space in `STRING` is a tabulator:
set "RESULT=" & set "STRING=   (<&>"^|)^^!^^^^;,=   ^"
echo Input string: `!STRING!`

rem // Double quotes to avoid troubles with unbalanced ones:
if defined STRING set "STRING=!STRING:"=""!^"
rem // Particularly handle carets and exclamation marks as delayed expansion is enabled:
if defined STRING set "STRING=!STRING:^=^^^^!"
if defined STRING set "STRING=%STRING:!=^^^!%" !
if defined STRING (
    rem // Escape all characters that `call` has got troubles with:
    set "STRING=!STRING:^=^^!"
    set "STRING=!STRING:&=^&!"
    set "STRING=!STRING:<=^<!"
    set "STRING=!STRING:>=^>!"
    set "STRING=!STRING:|=^|!"
)
rem /* Call the sub-routine here; the strings `!=!` constitute undefined dummy variables
rem    with an illegal name, which eventually become removed; the purpose of them is to
rem    enable usage of that `call` inside of a `for` loop with the meta-variable `%%S`,
rem    which would otherwise become unintentionally expanded rather than `%%STRING%%`,
rem    which literally contained `%%S`; the `!=!` at the end is just there in case you
rem    want to append another string that could also match another `for` meta-variable;
rem    note that `!!` is not possible as this would be collapsed to a single `!`, so
rem    a (most probably undefined) variable `!STRING%!` would then become expanded: */
call :TRIM %%!=!STRING%%!=!
rem /* The caret doubling done by `call` does not need to be reverted, because due to
rem    doubling of the quotes carets appear unquoted, so implicit reversion occurs here;
rem    of course the doubling of the quotes must eventually be undone: */
if defined RESULT set "RESULT=!RESULT:""="!^"
echo Now trimmed: `!RESULT!`

endlocal
exit /B

:TRIM
    rem // This is the effective line that does the left- and right-trimming:
    set "RESULT=%*" !
    exit /B

I use this Python 2 script to print lines with trailing whitespace and remove them manually:

#!/usr/bin/env python2
import sys

if not sys.argv[1:]:
  sys.exit('usage: whitespace.py <filename>')

for no, line in enumerate(open(sys.argv[1], 'rb').read().splitlines()):
  if line.endswith(' '):
    print no+1, line

I know that Python is not preinstalled for Windows, but at least it works cross-platform.
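Since the script above targets Python 2, a Python 3 variant may be useful. The following is a sketch under that assumption rather than the answerer's own code, and the names lines_with_trailing_space and report are hypothetical. Like the original, it only reports lines that end with a space and leaves the file untouched.

```python
def lines_with_trailing_space(data: bytes):
    """Return (line_number, line) pairs for lines that end with a space."""
    return [
        (no, line)
        for no, line in enumerate(data.splitlines(), start=1)
        if line.endswith(b" ")
    ]


def report(path: str) -> None:
    """Print the number and content of every offending line in a file."""
    with open(path, "rb") as f:
        for no, line in lines_with_trailing_space(f.read()):
            # Decode leniently so unusual bytes do not abort the report.
            print(no, line.decode("utf-8", errors="replace"))
```

Calling report("fileName.txt") prints one line per hit, in the same "number, content" layout as the Python 2 version.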
