How to remove a specific string common in multiple lines in a CSV file using shell script?
I have a CSV file which contains 65000 lines (approximately 28 MB). Each line begins with a path, e.g. "c:\\abc\\bcd\\def\\123\\456". Now let's say the path "c:\\abc\\bcd\\" is common to all the lines and the rest of the content is different. I have to remove the common part (in this case "c:\\abc\\bcd\\") from all the lines using a shell script. For example, the content of the CSV file is as follows:
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag 0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert 0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.frag 16 24 3
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert 87 116 69
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.vert.bin 75 95 61
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-6 0 0 0
For the above example I need the following output:
FILE0.frag 0 0 0
FILE0.vert 0 0 0
FILE0.link-link-0.frag 17 25 2
FILE0.link-link-0.vert 85 111 68
FILE0.link-link-0.vert.bin 77 97 60
FILE0.link-link-0 0 0
FILE0.link 0 0 0
Can any of you please help me out with this?
You could use sed:
$ cat test.csv
"c:\abc\bcd\def\123\456", 1, 2
"c:\abc\bcd\def\234\456", 1, 2
"c:\abc\bcd\def\432\456", 3, 4
$ sed -i.bak -e 's/c\:\\abc\\bcd\\//1' test.csv
$ cat test.csv
"def\123\456", 1, 2
"def\234\456", 1, 2
"def\432\456", 3, 4
I am using sed here in this way:
sed -e 's/<SEARCH TERM>/<REPLACE TERM>/<OCCURRENCE>' FILE
where <SEARCH TERM> is what we are looking for (in this case c:\\abc\\bcd\\, but backslashes need to be escaped), <REPLACE TERM> is what we want to replace it with (in this case nothing), and <OCCURRENCE> is which occurrence of the item we want to replace (in this case the first one in each line).
(-i.bak stands for: don't write the result to standard output, edit the file in place, but first save a backup copy with the .bak suffix.)
Updated according to @david-c-rankin's comment. He is right: make a backup before editing files in case you make a mistake.
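The same substitution can be tried on the sample lines from the question, which mix forward slashes and backslashes. The following is only a sketch under a few assumptions: the scratch directory and the file name sample.csv are made up for illustration, and the prefix is typed in by hand rather than detected. It uses | as the delimiter of the s command so the forward slashes need no escaping, and anchors the pattern with ^ so only a prefix at the start of the line is removed.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is touched (names are illustrative).
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.frag 0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.vert 0 0 0
C:/Abc/Def/Test/temp\.\test\GLNext\FILE0.link-link-0.frag 16 24 3
EOF

# '|' is the s-command delimiter, so '/' needs no escaping.
# Each literal backslash is written as '\\' and the literal dot as '\.'.
# '^' anchors the match to the beginning of the line.
sed -i.bak -e 's|^C:/Abc/Def/Test/temp\\\.\\test\\GLNext\\||' "$workdir/sample.csv"

cat "$workdir/sample.csv"        # edited file, prefix removed
ls "$workdir"                    # sample.csv and the untouched backup sample.csv.bak
```

If the result is not what was expected, the original content is still available in sample.csv.bak.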
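# Start with the first field of the first line as the candidate prefix
MaxPath="$( sed -n 's/,.*//p;1q' YourFile )"
# Escape backslashes so the candidate can be used as an anchored regex
# (other regex metacharacters such as . are left as they are)
GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"
# Shorten the candidate one character at a time until every line starts with it
while [ ${#MaxPath} -gt 0 ] && [ "$( grep -c -v -E "${GrepPath}" YourFile )" -gt 0 ]
do
    MaxPath="${MaxPath%%?}"
    GrepPath="^$( printf "%s" "${MaxPath}" | sed 's#\\#\\\\#g' )"
done
# Print the file with the common prefix removed (YourFile itself is not modified)
if [ ${#MaxPath} -gt 0 ]
then
    sed "s#${GrepPath}##" YourFile
fi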
Note: grep -c -v -E is not optimized in terms of performance. It reads the whole file on every iteration, although it could stop at the first line that does not match.
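One possible way to get the early stop mentioned above is grep -q, which exits as soon as it finds a line to select; combined with -v, that is the first line that does not match the prefix. This is a sketch only: the helper name all_lines_match and the sample data are assumptions made for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when every line of file $2 matches regex $1.
all_lines_match() {
    # grep -q -v succeeds, and stops reading, at the first non-matching line;
    # '!' inverts that status so "no such line found" means success.
    ! grep -q -v -E "$1" "$2"
}

# Small illustrative sample; a real run would use the actual CSV file.
sample=$(mktemp)
printf '%s\n' 'c:\abc\bcd\def\123,1' 'c:\abc\bcd\xyz\456,2' > "$sample"

# In the regex, each literal backslash is written as '\\'.
if all_lines_match '^c:\\abc\\bcd\\' "$sample"; then
    prefix_ok=yes
else
    prefix_ok=no
fi
echo "$prefix_ok"
```

With such a helper, the loop condition that counts non-matching lines could be replaced by a test along the lines of ! all_lines_match "${GrepPath}" YourFile, so each iteration stops at the first miss instead of scanning all 65000 lines.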