How to prepare a txt file for CSV import using a bash script?

How do I prepare a given txt file in bash for CSV import? The given structure looks like this:

Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
Telephone
E-Mail
Street
City
Gender
Employment
Income
*****
Salutation
Name
Surname
E-Mail
Street
City
Gender
Employment
Income
*****

As you can see, the third record has no Telephone entry. Any other field may be missing as well. The values are given one per line, and the records are separated by a line of five stars.

I tried to use awk and grep to write the records into a separate file for CSV import. How do I put the multiple lines of a record on one line for the CSV, and how do I keep the column order if an item, e.g. the telephone number, is missing?

Many thanks in advance.

Typically a .csv file has a fixed set of fields per record, and fields that have no value are left empty. So your first example might be:

"Mr","John","Smith","555-1212","jsmith@foo","1 St","New York","M","CSV Wrangler","5"

and your second might be:

"Mrs","Mary","Brown",,"mbrown@foo","5 St","Ottawa","F","CSV Wrangler","5"

There is no way in your input file to detect which field is missing, though. That means you won't be able to reliably create a .csv file. You need to know field names and field values to do that, unless you infer fields based on content ("This contains an @ sign so must be an email address", etc.). Even that will fail if you have a record like:

****
Homer
Springfield
****

Is that first name and surname, or first name and city? You don't have a way to tell.
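
To illustrate the point, here is a minimal awk sketch that joins each unlabeled record onto one quoted, comma-separated line. It assumes records end with a line of exactly five stars, as in the question; the function name `join_records` is made up for this example. Because nothing marks which field is absent, a record with a missing line simply comes out one column short, and every later value shifts left.

```shell
#!/bin/bash
# join_records (hypothetical helper): read unlabeled records from stdin,
# one value per line, records terminated by a line of five stars,
# and print each record as one line of double-quoted, comma-separated values.
join_records() {
    awk '
        /^\*\*\*\*\*$/ {                # record terminator: flush the collected line
            if (n > 0) print line
            line = ""; n = 0
            next
        }
        {
            gsub(/"/, "\"\"")           # CSV convention: double any embedded quote
            line = line (n++ ? "," : "") "\"" $0 "\""
        }
        END { if (n > 0) print line }   # last record without a terminator
    '
}

# Demo: the second record has no telephone line.
printf '%s\n' Mr John Smith 555-1212 jsmith@foo '*****' \
              Mrs Mary Brown mbrown@foo '*****' | join_records
```

In the demo, the second row has four columns instead of five, and the e-mail address lands in the telephone column. Without field labels, no script can put the empty column back in the right place.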

To make the problem clearer:
I get email requests that contain lots of unneeded stuff.
So I export the *.eml files to the /tmp directory.
I collect the needed information into one file called Input.txt.
My code looks like this:

#!/bin/bash
touch /tmp/Input.txt
OUTFILE=/tmp/Input.txt
cat /dev/null > "$OUTFILE"
FILES=/tmp/*.eml
for f in $FILES
do
    grep 'Salutation :' "$f" | sed 's/^.*: //' | perl -ne 'print "S1 $_"' >> "$OUTFILE"
    grep 'Surname :' "$f" | sed 's/^.*: //' | perl -ne 'print "S2 $_"' >> "$OUTFILE"
    grep 'Name :' "$f" | sed 's/^.*: //' | perl -ne 'print "S3 $_"' >> "$OUTFILE"
    grep 'Telephone :' "$f" | sed 's/^.*: //' | perl -ne 'print "S4 $_"' >> "$OUTFILE"
    grep 'E-Mail :' "$f" | sed 's/^.*: //' | perl -ne 'print "S5 $_"' >> "$OUTFILE"
    grep 'Street :' "$f" | sed 's/^.*: //' | perl -ne 'print "S6 $_"' >> "$OUTFILE"
    grep 'City :' "$f" | sed 's/^.*: //' | perl -ne 'print "S7 $_"' >> "$OUTFILE"
    grep 'Date :' "$f" | sed 's/^.*: //' | perl -ne 'print "S8 $_"' >> "$OUTFILE"
    grep 'Size :' "$f" | sed 's/^.*: //' | perl -ne 'print "S9 $_"' >> "$OUTFILE"
    grep 'Animals :' "$f" | sed 's/^.*: //' | perl -ne 'print "S10 $_"' >> "$OUTFILE"
    grep 'Employment :' "$f" | sed 's/^.*: //' | perl -ne 'print "S11 $_"' >> "$OUTFILE"
    grep 'Income :' "$f" | sed 's/^.*: //' | perl -ne 'print "S12 $_"' >> "$OUTFILE"
    echo "*****" >> "$OUTFILE"
done
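
The twelve near-identical pipelines could probably be folded into a loop over the labels. The following is only a sketch under assumptions: the function name `extract_fields` is hypothetical, and the .eml lines are taken to look like `Label : value`, which is what the grep and sed patterns in the script suggest. The perl step is replaced by a second sed substitution that adds the `S<n> ` prefix.

```shell
#!/bin/bash
# extract_fields (hypothetical helper): print the labeled values found in
# one .eml file as "S<n> value" lines, followed by the record terminator.
extract_fields() {
    local file=$1 i=0 label
    local labels=(Salutation Surname Name Telephone E-Mail Street City
                  Date Size Animals Employment Income)
    for label in "${labels[@]}"; do
        i=$((i + 1))
        # Same steps as the grep | sed | perl chain: select lines containing
        # "<label> :", drop everything up to the last ": ", prefix the tag.
        grep -- "$label :" "$file" | sed "s/^.*: //; s/^/S$i /"
    done
    echo '*****'
}

# Assumed usage, mirroring the original script:
# for f in /tmp/*.eml; do extract_fields "$f"; done > /tmp/Input.txt
```

A label that does not occur in a file produces no line at all, so the gaps in the tag numbers are preserved just as in the original script.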

Finally I get the OUTFILE Input.txt like this:

S1 Mr
S2 John
S3 Smith
S4 1514009855
S5 john.smith@gmail.com
S6 11 Elm Street
S7 Denver
S8 05/21/2016
S9 66
S10 Cat
S11 Officer
S12 20
*****
S1 Mrs
S2 Mary
S3 Wood
S4 65223457
S5 mary.wood@gmail.com
S6 60 Taft Ave.
S7 Boston
S8 04/26/2016
S10 Dog
S11 Secretary
S12 10
*****
S1 Mrs
S2 Lori
S3 White
S4 56325478
S6 730 Vista del Playa
S7 Anaheim
S8 01/22/2016
S10 Fish
S11 Teacher
S12 80
*****


So the first record is complete, S1 through S12.
In the second record S9 is missing, and in the third one S5 and S9 are missing.
The aim is to read these records from Input.txt and put them into a CSV file.
The CSV should look like this, taking the missing items into account:
Salutation,Surname,Name,Telephone,E-Mail,Street,City,Date,Size,Animals,Employment,Income
Mr;John;Smith;1514009855;john.smith@gmail.com;11ElmStreet;Denver;05/21/2016;66;Cat;Officer;20
Mrs;Mary;Wood;65223457;mary.wood@gmail.com;60TaftAve.;Boston;04/26/2016;;Dog;Secretary;10
Mrs;Lori;White;56325478;;730VistadelPlaya;Anaheim;01/22/2016;;Fish;Teacher;80
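
Since every line now carries its `S<n>` tag, the conversion could be sketched in awk roughly as follows. The function name `labeled_to_csv` is hypothetical. The sketch makes a few assumptions that differ from the sample above: it uses semicolons in the header as well as in the rows, it keeps spaces inside values rather than stripping them, and it does no quoting, so a value containing a semicolon would break the row.

```shell
#!/bin/bash
# labeled_to_csv (hypothetical helper): convert "S<n> value" records,
# terminated by a line of five stars, into semicolon-separated rows with
# exactly 12 columns. Tags that are absent leave their column empty.
labeled_to_csv() {
    awk -v nfields=12 '
        BEGIN {
            print "Salutation;Surname;Name;Telephone;E-Mail;Street;City;Date;Size;Animals;Employment;Income"
        }
        /^\*\*\*\*\*/ {                    # end of record: emit all columns
            row = ""
            for (i = 1; i <= nfields; i++)
                row = row (i > 1 ? ";" : "") val[i]
            print row
            split("", val)                 # clear values for the next record
            next
        }
        /^S[0-9]+ / {
            idx = substr($1, 2) + 0        # "S10" -> 10
            v = $0
            sub(/^S[0-9]+ /, "", v)        # keep the value, spaces included
            val[idx] = v
        }
    '
}

# Demo with the third record from the question (S5 and S9 missing).
printf '%s\n' 'S1 Mrs' 'S2 Lori' 'S3 White' 'S4 56325478' \
    'S6 730 Vista del Playa' 'S7 Anaheim' 'S8 01/22/2016' \
    'S10 Fish' 'S11 Teacher' 'S12 80' '*****' | labeled_to_csv
```

With the real data, the call would presumably be `labeled_to_csv < /tmp/Input.txt > /tmp/Output.csv`, where the output file name is only a placeholder. The tag number decides the column, so the order of lines within a record does not matter and a missing tag yields an empty field.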
