
sed / awk - remove space in file name

I'm trying to remove the whitespace in file names and replace it with a double underscore.

Input:

echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'

However, the output is:

File__Name1.xml__File__Name3__report.xml

Desired output:

File__Name1.xml File__Name3__report.xml

You named awk in the title of the question, didn't you?

$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$
  • -F'.xml *' instructs awk to split on a regex: the requested extension followed by 0 or more spaces. Note that the dot is unescaped, so it matches any character, not only a literal dot.
  • the loop for(i=1;i<=NF;i++) is executed for every field into which each input line is split. Note that the last field is empty (it is whatever follows the last extension), and the loop body takes that into account.
    the body of the loop:
    • gsub(" ","_",$i) replaces every space with an underscore in the current field, as indexed by the loop variable i
    • printf i<NF?$i ".xml ":"\n" prints one of two things: if i<NF, it is a regular field, so we append the extension and a space; otherwise i equals NF, and we just terminate the output line with a newline.

It's not perfect: it appends a space after the last filename. I hope that's good enough...
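To see why the last field comes out empty, it can help to print the fields awk actually produces. The following is only a minimal sketch: show_fields is a made-up helper name, and the separator is written as [.]xml so that the dot is taken literally, which differs slightly from the one-liner above.

```shell
# show_fields is a hypothetical helper name, used only for this illustration.
# [.]xml matches a literal ".xml"; " *" swallows the spaces that follow it.
show_fields() {
    awk -F'[.]xml *' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

echo "File Name1.xml File Name3 report.xml" | show_fields
# prints:
# 1:[File Name1]
# 2:[File Name3 report]
# 3:[]
```

The third, empty field is what follows the final ".xml", which is why the loop has to treat i equal to NF differently.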


▶ A D D E N D U M ◀

I'd like to address:

To reach these goals, I've decided to wrap the scriptlet in a shell function. Since it changes spaces into underscores, it is named s2u.

$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$

It's a bit different (better?) because it does not special-case the printing of the last field but instead special-cases the delimiter appended to each field. The idea of splitting on the extension remains.
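Since the question started from sed and asked for a double underscore, here is a sed-only sketch of the same "key on the extension" idea. It is an assumption-laden variant, not the function above: s2uu is a made-up name, and it relies on every file name ending in the given extension followed by whitespace or the end of the line.

```shell
# s2uu is a hypothetical helper; $1 is the extension without the dot (default: xml).
s2uu() {
    ext=${1:-xml}
    # Step 1: turn every whitespace character into "__".
    # Step 2: put a single space back wherever "__" directly follows the extension.
    sed -e 's/[[:space:]]/__/g' -e "s/\\.${ext}__/.${ext} /g"
}

echo "File Name1.xml File Name3 report.xml" | s2uu xml
# prints: File__Name1.xml File__Name3__report.xml
```

Like the awk version, this cannot tell a separator from a space inside a name when a file name itself contains the extension followed by a space.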

This seems a good start if the filenames aren't delimited:

((?:\S.*?)?\.\w{1,})\b

(        // start of captured group
(?:      // non-captured group
\S.*?    // a non-white-space character, then 0 or more any character
)?       // 0 or 1 times
\.       // a dot
\w{1,}   // 1 or more word characters
)        // end of captured group
\b       // a word boundary

You'll have to look up how a PCRE pattern converts to a shell pattern. Alternatively, it can be run from a Python/Perl/PHP script.


Assuming you are asking how to rename files, and not how to remove spaces from a list of file names that is being used for some other reason, here are the long way and the short way. The long way uses sed. The short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.

If the goal is simply to get a list of xml file names and change them with sed, the bottom example shows how to do that.

directory contents:

ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml

# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   # I prefer 'rename' for such things
   # rename 's/[[:space:]]/_/g' "${a_glob[i]}";
   # but sed works, can't see any reason to use it for this purpose though
   mv "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")";
done
shopt -u nullglob

result:

ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

Globbing is what you want here because of the spaces in the names.
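For shells without arrays or here-strings, the same glob-and-mv idea can be sketched in plain sh. This is only a sketch: rename_spaces is a made-up name, it takes the directory as an optional argument, and it skips names that need no change so that mv is never asked to move a file onto itself.

```shell
# rename_spaces is a hypothetical helper; $1 is the directory (default: current).
rename_spaces() {
    dir=${1:-.}
    for f in "$dir"/*.xml; do
        # Without nullglob an unmatched pattern stays literal, so skip it.
        [ -e "$f" ] || continue
        base=${f##*/}
        new=$(printf '%s\n' "$base" | sed 's/[[:space:]]/_/g')
        # Only move when the name actually changes.
        [ "$base" = "$new" ] || mv -- "$f" "$dir/$new"
    done
}
```

Calling rename_spaces with no argument works on the current directory. Note that mv overwrites an existing target by default, so two names that differ only in whitespace would collide.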

However, this is really a complicated solution, when all you actually need to do is:

cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml

And that's it, you're done.

If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method. With a small change to the statement it does what you want there too: just use sed to change the output file name.

If your goal is to change the filenames for output purposes, and not rename the actual files:

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

You could use rename:

rename --nows *.xml

This will replace all the spaces in the names of the xml files in the current folder with _.

Sometimes rename comes without the --nows option; in that case you can use a search and replace:

rename 's/[[:space:]]/__/g' *.xml

Finally, you can use --dry-run if you just want to print the file names without actually renaming anything.
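If the installed rename has no dry-run option, a preview can be approximated with a small loop. This is a sketch under that assumption, with a made-up function name, and it only prints what a later mv would do.

```shell
# preview_renames is a hypothetical helper; it prints "old -> new" pairs
# for the xml files in the current directory and changes nothing on disk.
preview_renames() {
    for f in *.xml; do
        # Skip the literal pattern when no xml file exists.
        [ -e "$f" ] || continue
        new=$(printf '%s\n' "$f" | sed 's/[[:space:]]/_/g')
        [ "$f" = "$new" ] || printf '%s -> %s\n' "$f" "$new"
    done
}
```

Run it from the directory that holds the files; names that contain no whitespace are left out of the listing.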
