简体   繁体   English

删除Linux文件中的特殊字符

[英]Remove special characters in linux files

I have a lot of files *.java, *.xml. 我有很多文件* .java,*。xml。 But a guy wrote some comments and Strings with spanish characters. 但是一个人用西班牙语字符写了一些注释和字符串。 I been searching on the web how to remove them. 我一直在网上搜索如何删除它们。

I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java 我试图find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example, how can i remove these characters from many other files in subfolders? find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java例如,如何从子文件夹中的许多其他文件中删除这些字符?

Why are you trying to remove only characters with diacritic signs? 您为什么要尝试只删除带有变音符号的字符? It probably worth removing all characters with codes not in the range 0-127 , so removal regexp will be s/[\\0x80-\\0xFF]//g if you're sure that your files should not contain higher ascii. 可能值得删除代码不在0-127范围内的所有字符,因此,如果您确定文件中不应包含较高的ascii,则删除regexp将为s/[\\0x80-\\0xFF]//g

If that's what you really want, you can use find , almost as you are using it. 如果这是您真正想要的,则几乎可以在使用它时使用find

find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'

The differences: 区别:

  • The path . 路径. is implicit if no path is supplied. 如果没有提供路径,则为隐式。
  • This command only operates on *.java and *.xml files. 该命令仅对* .java和* .xml文件起作用。
  • execdir is more secure than exec (read the man page). execdirexec更安全(请阅读手册页)。
  • -i tells sed to modify the file argument in place. -i告诉sed修改文件参数。 Read the man page to see how to use it to make a backup. 阅读手册页以了解如何使用它进行备份。
  • {} represents a path argument which find will substitute in. {}表示一个路径参数, find将替代该参数。
  • The ; ; is part of the find syntax for exec / execdir . exec / execdirfind语法的execdir

You're almost there :) 你快到 :)

find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
                         ^^                 ^^

From sed(1) : sed(1)

   -i[SUFFIX], --in-place[=SUFFIX]
          edit files in place (makes backup if extension supplied)

From find(1) : find(1)

   -exec command ;
          Execute command; true if 0 status is returned.  All
          following arguments to find are taken to be arguments to
          the command until an argument consisting of `;' is
          encountered.  The string `{}' is replaced by the current
          file name being processed everywhere it occurs in the
          arguments to the command, not just in arguments where it
          is alone, as in some versions of find.  Both of these
          constructions might need to be escaped (with a `\') or
          quoted to protect them from expansion by the shell.  See
          the EXAMPLES section for examples of the use of the -exec
          option.  The specified command is run once for each
          matched file.  The command is executed in the starting
          directory.   There are unavoidable security problems
          surrounding use of the -exec action; you should use the
          -execdir option instead.

tr is the tool for the job: tr是完成这项工作的工具:

NAME
       tr - translate or delete characters

SYNOPSIS
       tr [OPTION]... SET1 [SET2]

DESCRIPTION
       Translate, squeeze, and/or delete characters from standard input, writing to standard out‐
       put.

       -c, -C, --complement
              use the complement of SET1

       -d, --delete
              delete characters in SET1, do not translate

       -s, --squeeze-repeats
              replace each input sequence of a repeated character that is listed in SET1  with  a
              single occurrence of that character

piping your input through tr -d áíéóúñ will probably do what you want. 通过tr -d áíéóúñ输入的内容可能会满足您的要求。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM