Linux script to parse each line, check the regex and modify the line

I'm trying to write a Linux bash script that takes as input a CSV file with lines in the following format (any field can be blank):

something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,

The output has to be in the following format: if a line contains a . character, the two substrings around it have to be separated as substring1,substring2 and one , character removed; otherwise the line is left unchanged.

something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,

I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output.

for f in /filepath/filename.csv
do
    while read p; do
        if [[$p == .\..]] ; then echo $p; fi
    done <$f
done

Thanks in advance!

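I can't give you working code right now, but here are some quick suggestions: 1. Try the tool called sed. 2. Read up on regular-expression "capture groups" to see how to split text according to an expression.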
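Capture groups are also available inside bash itself, through the =~ operator and the BASH_REMATCH array, so the loop from the question can be kept. The sketch below assumes that the dot is always in the second field and that the third field is empty, as in the sample lines; fix_csv is a hypothetical function name. Note that [[ needs a space after it and ]] a space before it, and that == compares against a glob pattern, so a test for "contains a dot" would be written *.* rather than .\..

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CSV lines on stdin, writes fixed lines to stdout.
# Assumes the dot sits in field 2 and field 3 is empty (as in the sample).
fix_csv() {
    local line
    # Group 1: "field1,left half"  Group 2: right half  Group 3: rest of the line
    local re='^([^,]*,[^,]*)\.([^,]*),,(.*)$'
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            line="${BASH_REMATCH[1]},${BASH_REMATCH[2]},${BASH_REMATCH[3]}"
        fi
        printf '%s\n' "$line"
    done
}

printf '%s\n' 'a,b.c,,4,e,,,' 'a,b,,4,e,,,' | fix_csv
# a,b,c,4,e,,,
# a,b,,4,e,,,
```

With a real file this would be called as fix_csv < /filepath/filename.csv > output.csv. For large files sed or awk will usually be faster than a shell loop.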

To separate strings, AWK will be useful:

    echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'

Hope it will help.
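Applied to the CSV from the question, the same idea could look like the sketch below. It assumes the dot is in the second field and that the empty third field should receive the second half, which is what the sample lines suggest; split_second_field is a hypothetical name.

```shell
# Hypothetical helper: split field 2 on its first dot and move the
# second half into field 3, but only when field 3 is empty.
split_second_field() {
    awk 'BEGIN { FS = OFS = "," }
         {
             dot = index($2, ".")            # 0 when field 2 has no dot
             if (dot > 0 && $3 == "") {
                 $3 = substr($2, dot + 1)    # text after the dot
                 $2 = substr($2, 1, dot - 1) # text before the dot
             }
             print
         }' "$@"
}

printf '%s\n' 'a,b.c,,4,e,f,g,h,i,j,,,' | split_second_field
# a,b,c,4,e,f,g,h,i,j,,,
```

Lines without a dot in the second field are printed unchanged. A file name can be passed as an argument, for example split_second_field input.csv > output.csv.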

As your task is more about transforming independent lines of text than about parsing the fields of a CSV file, sed is indeed the tool to go for.

Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following invocation of sed transforms your input sample into your expected output:

sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv

In the above example, s/// is the replacement command. From the manpage:

s/regexp/replacement/

Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. [...]

Explaining the regexp and replacement of the above command is probably out of scope for this question, so I'll finish my answer here... Hope it helps!
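For readers who do want the pieces spelled out, here is a short annotated sketch of the same expression. split_on_dot is a hypothetical wrapper name, and the sample lines are shortened.

```shell
# s/REGEXP/REPLACEMENT/g, piece by piece:
#   \.          a literal dot
#   \([^,]*\)   capture group 1: everything up to the next comma
#   ,           the comma right after group 1 (consumed by the match)
# The replacement ,\1 writes a comma where the dot was, then group 1.
# The comma consumed by the match is not written back, which is how
# one of the two commas after the field disappears.
split_on_dot() {
    sed 's/\.\([^,]*\),/,\1/g' "$@"
}

printf '%s\n' 'a,b.c,,4' 'a,b,,4' | split_on_dot
# a,b,c,4
# a,b,,4
```

One caveat: because of the g flag and the lack of anchoring, every dot that is followed later by a comma is rewritten, so a value such as 1.5 in another field would be split as well.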

OK, I managed to use a regexp, but the following command still doesn't work:

sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'

sed: -e expression #1, char 125: unknown command: `\\'
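Two things stand out in that command. First, the expression does not start with the s command and its opening delimiter (s/), and the closing / after the replacement is missing, so sed does not read it as a substitution at all. Second, sed back-references only go from \1 to \9, so \10 means \1 followed by a literal 0. A shorter expression that captures only what needs to move avoids both problems. This is a sketch that assumes the dot is always in the second field and the third field is empty; split_field2 is a hypothetical name.

```shell
# Group 1: first field, comma, left half of the second field
# Group 2: right half of the second field
# The two commas after group 2 are replaced by a single one.
split_field2() {
    sed 's/^\([^,]*,[^,]*\)\.\([^,]*\),,/\1,\2,/' "$@"
}

printf '%s\n' 'a,b.c,,4,e,f,g,h,i,j,,,' | split_field2
# a,b,c,4,e,f,g,h,i,j,,,
```

Because the pattern is anchored with ^ and stops at the second field, dots in later fields are left alone.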
