
Bash: remove semicolons from a line in a CSV-file

I have a CSV file with a few hundred lines. Many (but not all) of these lines contain data that I want to extract (Klas/Lesgroep:;;T2B1), e.g. ;;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;

I want to delete the semicolons in front of Klas/Lesgroep, but the number of semicolons varies. How can I delete these semicolons in Bash?

I'm not a native English speaker, so I hope this is clear.

With sed you can select the lines that start with at least one semicolon followed by Klas/Lesgroep and, on those lines only, replace the leading ; characters with nothing:

$ sed '/^;;*Klas\/Lesgroep/s/^;*//' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;

To remove any nonempty run of ; characters that comes directly before the literal Klas/Lesgroep:

With GNU or BSD/macOS sed:

$ sed -E 's|;+(Klas/Lesgroep)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
  • The s function performs string substitution (replacement):

    • The 1st argument is a regex (regular expression) that specifies what part of the line to match,
    • and the 2nd argument specifies what to replace the matching part with.
    • Note how I've chosen | as the regex/argument delimiter instead of the customary /, because that allows unescaped use of / characters inside the regex.
  • ;+ matches one or more directly adjacent ; characters.

  • (Klas/Lesgroep) matches the literal Klas/Lesgroep. Enclosing it in (...) makes it a capture group, so the match is remembered and can be referenced as \1 (the 1st capture group in the regex) in the replacement argument to s.

The net effect is that all ; characters directly preceding Klas/Lesgroep are removed.
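As a rough sketch of how this behaves on a whole file rather than a single line, the snippet below builds a small hypothetical sample file (sample.csv and its contents are assumptions made for illustration, not the asker's real data) and runs the same substitution over it. Lines that do not contain Klas/Lesgroep should pass through unchanged.

```shell
# Hypothetical sample data; the real CSV will look different.
printf '%s\n' \
  ';;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;' \
  'Naam;Vak;Cijfer' \
  ';;;Klas/Lesgroep:;;T3A2;;;' > sample.csv

# Remove the run of ; directly before Klas/Lesgroep on every line that has one.
result=$(sed -E 's|;+(Klas/Lesgroep)|\1|' sample.csv)
printf '%s\n' "$result"

# Clean up the illustrative file.
rm -f sample.csv
```

Only the semicolons in front of Klas/Lesgroep are affected; the semicolons after it, and the whole of the Naam;Vak;Cijfer line, are left alone.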


POSIX-compliant form:

$ sed 's|;\{1,\}\(Klas/Lesgroep\)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;

POSIX only guarantees the older, less powerful BRE (basic regular expression) syntax, where the duplication symbol + must be emulated as \{1,\} and, generally, the metacharacters (, ), {, } must be \-escaped.
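Since the question mentions wanting to extract these lines, one possible variation on the POSIX form is to print only the lines on which the substitution actually happened. This is a sketch under the assumption that the other lines are not needed; the sample data is made up for illustration.

```shell
# Hypothetical sample data; the real CSV will look different.
sample=';;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;
Naam;Vak;Cijfer
;;;Klas/Lesgroep:;;T3A2;;;'

# -n suppresses the default output; the p flag prints a line only
# when the substitution succeeded on it.
extracted=$(printf '%s\n' "$sample" | sed -n 's|;\{1,\}\(Klas/Lesgroep\)|\1|p')
printf '%s\n' "$extracted"
```

Note that a line where Klas/Lesgroep has no semicolon in front of it would not be printed by this command, because the pattern requires at least one ; to match.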

To remove every ; from a file (not only those in front of Klas/Lesgroep), you can use sed, a stream editor that transforms its input line by line and writes the result to standard output.

$ sed 's/find/replace/g' file

The g (global) flag at the end of the s command tells sed to replace every occurrence of the pattern on each line rather than only the first one.

So to remove every ;, match it and replace it with nothing:

$ sed 's/;//g' file.csv
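Because sed writes to standard output, the command above leaves file.csv itself untouched. A minimal sketch of saving the result to a separate file follows; the temporary directory and the name cleaned.csv are assumptions chosen for illustration.

```shell
# Work in a throwaway directory so no real file is overwritten.
workdir=$(mktemp -d)
printf '%s\n' ';;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;' > "$workdir/file.csv"

# Write the semicolon-free output to a new file via redirection.
sed 's/;//g' "$workdir/file.csv" > "$workdir/cleaned.csv"

cleaned=$(cat "$workdir/cleaned.csv")
original=$(cat "$workdir/file.csv")
printf '%s\n' "$cleaned"

rm -r "$workdir"
```

Redirecting to a new file sidesteps the -i (in-place) option, whose syntax differs between GNU and BSD/macOS sed. Keep in mind that removing every ; also removes the field separators, so the remaining fields run together.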
