I would like to fix a bad encoding in the names of thousands of files. The error is always the same: an unknown character that should be replaced with a French é.
$ find . -type f | grep 127427
./documents/1778_commande_127427_accus�_de_r�ception.pdf
$ find . -type f | grep 127427 | hexdump -C
00000000  2e 2f 64 6f 63 75 6d 65  6e 74 73 2f 31 37 37 38  |./documents/1778|
00000010  5f 63 6f 6d 6d 61 6e 64  65 5f 31 32 37 34 32 37  |_commande_127427|
00000020  5f 61 63 63 75 73 ef bf  bd 5f 64 65 5f 72 ef bf  |_accus..._de_r..|
00000030  bd 63 65 70 74 69 6f 6e  2e 70 64 66 0a           |.ception.pdf.|
0000003d
So I am looking for the bytes ef bf bd, which do not look like a Unicode character to me. Unfortunately, searching for 0xef does not work:
$ find . -type f | grep -P '\xef'
(nothing)
Any clues?
Next I am planning to do something like:
$ find . -type f | grep <magic-here> | xargs -n1 -I{} sh -c 'mv "{}" $(echo "{}" | sed s/<magic-here>/é/) '
Like this:
echo $'\x2e\x2f\x64\x6f\x63\x75\x6d\x65\x6e\x74\x73\x2f\x31\x37\x37\x38\x5f\x63\x6f\x6d\x6d\x61\x6e\x64\x65\x5f\x31\x32\x37\x34\x32\x37\x5f\x61\x63\x63\x75\x73\xef\xbf\xbd\x5f\x64\x65\x5f\x72\xef\xbf\xbd\x63\x65\x70\x74\x69\x6f\x6e\x2e\x70\x64\x66\x0a'\
| grep -Fa $'\xef\xbf\xbd'
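-a treats binary files as text. -F performs a fixed-string search, without regular expressions. $'...' is an ANSI-C quoted string: bash expands escapes such as \xef into raw bytes before grep sees them.
The bytes ef bf bd are the UTF-8 encoding of U+FFFD, the Unicode replacement character. In a UTF-8 locale, grep -P most likely reads \xef as the code point U+00EF (ï) rather than as the raw byte, which would explain the empty result.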
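Applied to a file listing like the one in the question, the same fixed-string search might look like the sketch below. It builds the three bytes with printf octal escapes (equivalent to $'\xef\xbf\xbd', but also usable in shells without ANSI-C quoting) and runs in a scratch directory from mktemp. The variable names and sample file names are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched (illustrative setup).
workdir=$(mktemp -d)

# The bytes ef bf bd, i.e. U+FFFD encoded as UTF-8, written as octal escapes.
bad=$(printf '\357\277\275')

# Two sample files: one with the broken name, one clean (hypothetical names).
touch "$workdir/accus${bad}_de_r${bad}ception.pdf" "$workdir/facture.pdf"

# List only the files whose path contains the replacement character.
matches=$(find "$workdir" -type f | grep -Fa "$bad")
printf '%s\n' "$matches"
```

Only the path of the broken file should be printed; the clean one is filtered out.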
To replace the bytes inside the files themselves, the find command should look like this:
find ... -exec sed $'s/\xef\xbf\xbd/é/g' {} +
When you are sure that it works, add -i, which makes sed edit the files in place:
find ... -exec sed -i $'s/\xef\xbf\xbd/é/g' {} +
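The sed commands above rewrite file contents. Since the question is about file names, a rename loop may be closer to what is needed. The following is only a sketch under several assumptions: every U+FFFD in a name stands for é, names contain no newlines, only regular files (not directories) need renaming, and a file is skipped if the target name already exists. The variables BAD and GOOD and the scratch directory are illustrative, not part of the original answer.

```shell
# Scratch directory with one sample file named as in the question.
workdir=$(mktemp -d)
bad=$(printf '\357\277\275')    # ef bf bd: U+FFFD in UTF-8
good=$(printf '\303\251')       # c3 a9: é in UTF-8
mkdir "$workdir/documents"
touch "$workdir/documents/1778_commande_127427_accus${bad}_de_r${bad}ception.pdf"

# Rename every regular file whose name contains the bad bytes.
# BAD and GOOD are passed to the inner shell through the environment.
BAD=$bad GOOD=$good find "$workdir" -type f -name "*${bad}*" -exec sh -c '
  for path in "$@"; do
    dir=$(dirname "$path")
    name=$(basename "$path")
    # LC_ALL=C makes sed treat the name as plain bytes.
    new=$(printf "%s\n" "$name" | LC_ALL=C sed "s/$BAD/$GOOD/g")
    # Skip the rename if the target already exists.
    [ -e "$dir/$new" ] || mv -- "$path" "$dir/$new"
  done
' sh {} +

ls "$workdir/documents"
```

Replacing only the base name keeps the directory part of the path untouched, so a broken character in a parent directory does not make mv fail. On real data, printing "$path" and "$dir/$new" with echo instead of calling mv is a safe way to preview the result first.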