简体   繁体   中英

How to prepare a txt file to csv using bash script?

How do I prepare a given txt file in bash for csv import? The given structure is like this:

Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
E-Mail
Street
City
Gender
Employment
Income
*****

As you can see the second record doesn't have an E-Mail address. Any other variation of a not given value is also possible. The values are given line by line and the records are separated by five stars.

I tried to use awk and grep in order to write the records into a separated file for csv import. How do I put the given multiply lines of a record in one line for csv and how do I keep the order if an item eg the telephone number is not given?

Many thanks in advance.

Typically a .csv file has fixed records - and fields that are not included are left empty. So your first example might be:

"Mr","John","Smith","555-1212","jsmith@foo","1 St","New York","M","CSV Wrangler","5"

and your second might be:

"Mrs","Mary","Brown",,"mbrown@foo","5 St","Ottawa","F","CSV Wrangler","5"

There is no way in your input file to detect which field is missing, though. That means you won't be able to reliably create a .csv file. You need to know field names and field values to do that, unless you infer fields based on content ("This contains an @ sign so must be an email address", etc.) Even that will fail if you have a record like:

****
Homer
Springfield
****

Is that first name and surname, or first name and city? You don't have a way to tell.

To make the problem more clearly:
I do get email requests which contain lots of unneeded stuff.
So I export the *.eml files to /tmp directory.
I collect the needed information like this into one file called Input.txt.
My code looks like this:

 #!/bin/bash touch /tmp/Input.txt OUTFILE=/tmp/Input.txt cat /dev/null > "$OUTFILE" FILES=/tmp/*.eml for f in $FILES do grep 'Salutation :' "$f" | sed 's/^.*: //' | perl -ne 'print "S1 $_"' >> "$OUTFILE" grep 'Surname :' "$f" | sed 's/^.*: //' | perl -ne 'print "S2 $_"' >> "$OUTFILE" grep 'Name :' "$f" | sed 's/^.*: //' | perl -ne 'print "S3 $_"' >> "$OUTFILE" grep 'Telephone :' "$f" | sed 's/^.*: //' | perl -ne 'print "S4 $_"' >> "$OUTFILE" grep 'E-Mail :' "$f" | sed 's/^.*: //' | perl -ne 'print "S5 $_"' >> "$OUTFILE" grep 'Street :' "$f" | sed 's/^.*: //' | perl -ne 'print "S6 $_"' >> "$OUTFILE" grep 'City :' "$f" | sed 's/^.*: //' | perl -ne 'print "S7 $_"' >> "$OUTFILE" grep 'Date :' "$f" | sed 's/^.*: //' | perl -ne 'print "S8 $_"' >> "$OUTFILE" grep 'Size :' "$f" | sed 's/^.*: //' | perl -ne 'print "S9 $_"' >> "$OUTFILE" grep 'Animals :' "$f" | sed 's/^.*: //' | perl -ne 'print "S10 $_"' >> "$OUTFILE" grep 'Employment :' "$f" | sed 's/^.*: //' | perl -ne 'print "S11 $_"' >> "$OUTFILE" grep 'Income :' "$f" | sed 's/^.*: //' | perl -ne 'print "S12 $_"' >> "$OUTFILE" echo "*****" >> "$OUTFILE" done 

Finally I get the OUTFILE Input.txt like this:

S1 Mr
S2 John
S3 Smith
S4 1514009855
S5 john.smith@gmail.com
S6 11 Elm Street
S7 Denver
S8 05/21/2016
S9 66
S10 Cat
S11 Officer
S12 20
*****
S1 Mrs
S2 Mary
S3 Wood
S4 65223457
S5 mary.wood@gmail.com
S6 60 Taft Ave.
S7 Boston
S8 04/26/2016
S10 Dog
S11 Secretary
S12 10
*****
S1 Mrs
S2 Lori
S3 White
S4 56325478
S6 730 Vista del Playa
S7 Anaheim
S8 01/22/2016
S10 Fish
S11 Teacher
S12 80
*****


So the first record is complete S1 till S12.
In the second record there is S9 missing and in the third one there is S5 and S9 missing.
The aim is to get these records read out from Input.txt and put them into a csv-file.
The csv should look like this, considering the missing items:
Salutation,Surname,Name,Telephone,E-Mail,Street,City,Date,Size,Animals,Employment,Income
Mr;John;Smith;1514009855;john.smith@gmail.com;11ElmStreet;Denver;05/21/2016;66;Cat;Officer;20
Mrs;Mary;Wood;65223457;mary.wood@gmail.com;60TaftAve.;Boston;04/26/2016;;Dog;Secretary;10
Mrs;Lori;White;56325478;;730VistadelPlaya;Anaheim;01/22/2016;;Fish;Teacher;80

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM