Here is an excerpt of my text file:
namq_aux_lp 4 Last update of data 07.07.2014 t
namq_aux_ulc 4 Last update of data 08.07.2014
namq_aux_gph 4 Last update of data 07.07.2014
prc_hicp_cann 4 Last update of data 17.07.2014
namq_nace10_k 4 Last update of data 02.07.2014 clas
sei_bsco_m 4 Last update of data 10.06.2014
ei_bsin_m_r2 4 Last update of data 26.06.2014
lassei_bsbu_m_r2 4 Last update of data 26.06.2014
assei_bsrt_m_r2 4 Last update of data 26.06.2014 t
ei_bssi_m_r2 4 Last update of data 26.06.2014 t
ei_bsse_m_r2 4 Last update of data 26.06.2014
ei_bsci_m_r2 4 Last update of data 26.06.2014
10 sts_trtu_m 4 Last update of data 17.07.2014 c
I'm trying to clean and format it, keeping only the first column and the date. However, as you can see, there is a stray 10 at the start of the last line. I cannot simply strip every 10, because if I do, the date for sei_bsco_m (10.06.2014) will be truncated.
Any help would be appreciated.
Note: the code is here: https://ideone.com/JbuRHK
Desired output would be:
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
...
assei_bsrt_m_r2 26.06.2014
...
Just look for the first date on each line from the 7th field on and print that plus the 6th-previous field:
$ awk '{
for (i=7;i<=NF;i++)
if ($i ~ /^([[:digit:]]{2}\.){2}[[:digit:]]{4}$/) {
printf "%-20s%10s\n", $(i-6), $i
next
}
}' file
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
The above doesn't care how many leading or trailing undesirable fields you might have, or what those fields might contain, as long as you don't have 7 leading undesirable fields with the 7th one being a date!
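To see why the stray leading 10 is harmless with this approach, here is a minimal self-contained sketch. The two sample lines come from the question; the temporary file and the variable names sample and result are only for illustration. It also spells the date pattern out digit by digit, on the assumption that some awk implementations do not accept {2} interval expressions:

```shell
# Two lines from the question: one clean, one with the stray leading "10".
sample=$(mktemp)
cat > "$sample" <<'EOF'
namq_aux_lp 4 Last update of data 07.07.2014 t
10 sts_trtu_m 4 Last update of data 17.07.2014 c
EOF

# Scan from field 7 on; the first field shaped like dd.mm.yyyy is the date,
# and the name sits six fields before it.
result=$(awk '{
    for (i = 7; i <= NF; i++)
        if ($i ~ /^[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9]$/) {
            print $(i-6), $i
            next
        }
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On the second line the date is field 8 rather than field 7, so $(i-6) lands on field 2 (the name) and the 10 is never looked at.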
Alternatively, this just prints whatever is first on each side of the string "4 Last update of data":
$ awk -F'[[:space:]]+[[:digit:]]+ Last update of data[[:space:]]+' '{
sub(/.*[[:space:]]/,"",$1)
sub(/[[:space:]].*$/,"",$2)
printf "%-20s%10s\n", $1, $2
}' file
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
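If it is not obvious what the field separator does here, the following small sketch prints the two fields it produces for one line from the question. The variable names line and parts are made up for the demo, and [ \t] and [0-9] are used in place of the POSIX classes on the assumption that this is more portable across awk versions:

```shell
line='10 sts_trtu_m 4 Last update of data 17.07.2014 c'

# Splitting on " <digits> Last update of data " leaves two fields:
# everything before the marker and everything after it.
parts=$(printf '%s\n' "$line" |
    awk -F'[ \t]+[0-9]+ Last update of data[ \t]+' '{ print "[" $1 "] [" $2 "]" }')

printf '%s\n' "$parts"
```

The two sub() calls in the answer then trim each piece down to the word nearest the marker: everything up to the last blank is removed from $1, and everything from the first blank on is removed from $2.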
Here is something that may work:
awk '/^10/ {$1=""}1' file | column -t
namq_aux_lp 4 Last update of data 07.07.2014 t
namq_aux_ulc 4 Last update of data 08.07.2014
namq_aux_gph 4 Last update of data 07.07.2014
prc_hicp_cann 4 Last update of data 17.07.2014
namq_nace10_k 4 Last update of data 02.07.2014 clas
sei_bsco_m 4 Last update of data 10.06.2014
ei_bsin_m_r2 4 Last update of data 26.06.2014
lassei_bsbu_m_r2 4 Last update of data 26.06.2014
assei_bsrt_m_r2 4 Last update of data 26.06.2014 t
ei_bssi_m_r2 4 Last update of data 26.06.2014 t
ei_bsse_m_r2 4 Last update of data 26.06.2014
ei_bsci_m_r2 4 Last update of data 26.06.2014
sts_trtu_m 4 Last update of data 17.07.2014 c
or to get your output:
awk '/^10/ {$1=""}1' file | awk '{print $1,$7}' OFS="\t"
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
Or like this:
awk '/^10/ {$1=""}1' file | awk '{print $1,$7}' | column -t
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
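The /^10/ test is tied to this particular stray value. As a hedged variation (not the answer's own code), the sketch below treats any purely numeric first field as junk and shifts the field indexes by one, which also avoids the second awk. The names sample, shifted and off are illustrative only:

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
sei_bsco_m 4 Last update of data 10.06.2014
10 sts_trtu_m 4 Last update of data 17.07.2014 c
EOF

# If the first field is purely numeric, shift every field index by one
# instead of blanking the field and re-parsing in a second awk.
shifted=$(awk '{
    off = ($1 ~ /^[0-9]+$/) ? 1 : 0
    print $(1 + off), $(7 + off)
}' "$sample")

printf '%s\n' "$shifted"
rm -f "$sample"
```

This assumes that a real first column is never made up of digits only, which holds for the sample data shown.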
You can use sed and column:
sed -nr 's|.*\b(\S+_\S+)\b.*\b([0-9]+[.][0-9]+[.][0-9]+)\b.*|\1\t\2|p' file | column -t
Output:
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
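\b and \S are GNU extensions, so here is a sketch of the same idea written with POSIX character classes only. It assumes a sed that accepts -E, and it separates the two columns with a single space rather than \t, since \t in the replacement is not portable either. The names sample and picked are illustrative:

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
namq_nace10_k 4 Last update of data 02.07.2014 clas
10 sts_trtu_m 4 Last update of data 17.07.2014 c
EOF

# Group 2: a whitespace-delimited word containing "_" (the name).
# Group 3: a whitespace-delimited n.n.n token (the date).
picked=$(sed -nE 's/(^|.*[[:space:]])([^[:space:]]+_[^[:space:]]+)[[:space:]].*[[:space:]]([0-9]+[.][0-9]+[.][0-9]+)([[:space:]].*|$)/\2 \3/p' "$sample")

printf '%s\n' "$picked"
rm -f "$sample"
```

The result can still be piped through column -t for alignment.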
Note: this assumes the first column has an _ in it. \S may not work in every sed, so you can also consider [^[:space:]] or [^ \t\r] instead.
Yet another solution could be the following:
- removes the two leading digits, if any
- removes the leading spaces left behind
- prints columns 1 and 7 with a tab as OFS (Output Field Separator)
$ sed 's/^[0-9][0-9]//' telecharge.txt | sed 's/^ *//' | awk '{print $1,$7}' OFS='\t'
namq_aux_lp 07.07.2014
namq_aux_ulc 08.07.2014
namq_aux_gph 07.07.2014
prc_hicp_cann 17.07.2014
namq_nace10_k 02.07.2014
sei_bsco_m 10.06.2014
ei_bsin_m_r2 26.06.2014
lassei_bsbu_m_r2 26.06.2014
assei_bsrt_m_r2 26.06.2014
ei_bssi_m_r2 26.06.2014
ei_bsse_m_r2 26.06.2014
ei_bsci_m_r2 26.06.2014
sts_trtu_m 17.07.2014
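The two sed calls can also be folded into one. The sketch below is a variation under assumptions, not the answer's exact pipeline: it strips a leading run of digits only when it is followed by whitespace, so a name that merely starts with digits would be left alone. The names sample and cleaned are made up for the example:

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
ei_bsin_m_r2 4 Last update of data 26.06.2014
10 sts_trtu_m 4 Last update of data 17.07.2014 c
EOF

# One substitution drops a leading all-digit token plus the blanks after it;
# awk then picks columns 1 and 7 as before.
cleaned=$(sed 's/^[0-9][0-9]*[[:space:]][[:space:]]*//' "$sample" |
    awk '{ print $1, $7 }')

printf '%s\n' "$cleaned"
rm -f "$sample"
```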