
Perl: how to change URLs in a text file

In a given file, how can I change a URL ending in ".pl" so that it ends in ".en", and change a penultimate ".com" to ".org"?

For example, http://www.addres.pl should become http://www.addres.en

If the address looks like http://www.addres.com.pl, it should become http://www.addres.org.en

If it looks like http://www.addres.com.ru, only the .com should change: http://www.addres.org.ru

Example input file:

http://www.addres.org.en
http://www.addres.com.pl
http://www.addre.pl
http://www.addres.en
http://www.addres.ru
http://com ddd http://www.com.pl.com.pl.com.pl.com.pl
aaa http://www.addres.com.pl! bbb
ccc (http://www.addre.pl) ddd

Expected console output:

http://www.addres.org.en
http://www.addres.org.en
http://www.addre.en
http://www.addres.en
http://www.addres.ru
http://com ddd http://www.com.pl.com.pl.com.pl.org.en
aaa http://www.addres.org.en! bbb
ccc (http://www.addre.en) ddd

So far I have this, which checks that the argument is a file:

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

if (($#ARGV+1 != 1 )||(! -f $ARGV[0]))
{
  print "podaj plik\n";   # Polish for "give a file name"
  exit 1;
}

open (MYFILE, $ARGV[0]);
while (<MYFILE>) {
  chomp;
  my $url = $_;
  for ($url) {
    #s|(com)(.??)|org$2| and last;
    s|com.pl|org.en| and last;
    s|com[.]|org.| and last;
    s|[.]pl|.en|;
  }
  print "$url\n";
}
close (MYFILE);
exit 0;

How can I generalize this substitution

s|com[.]ru|org.ru| and last;

so that it applies to every address of this form:

s|com[.]??|org.??| and last;

where ?? can be ru, en, or anything else other than pl?

Quick and dirty:

use strict;
while (<>) {
    s|com[.]pl\b|org.en| or
        s|[.]pl\b|.en|   or
        s|com[.]ru\b|org.ru|;
    print;
}

Pay attention to the order of the regexes (the more specific com.pl pattern has to be tried before the plain .pl one), and call the script from the command line: perl script.pl in.txt

Then learn the proper three-argument form of open with a lexical variable as the filehandle. This keeps global filehandles with names as common as MYFILE from clobbering one another, and the file is closed automatically when the lexical variable goes out of scope.
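As a minimal sketch of that advice, the loop above could be written with a three-argument open and a lexical filehandle. The helper names rewrite_line and rewrite_file are made up for this illustration, and the substitutions are the same "quick and dirty" ones, so they share the same limitations:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the three substitutions to a single line.
# At most one of them fires, because they are chained with "or".
sub rewrite_line {
    my ($line) = @_;
    $line =~ s|com[.]pl\b|org.en|
        or $line =~ s|[.]pl\b|.en|
        or $line =~ s|com[.]ru\b|org.ru|;
    return $line;
}

# Hypothetical helper: read one file and print every rewritten line.
sub rewrite_file {
    my ($path) = @_;
    # Three-argument open: the mode '<' is separate from the file name,
    # and $fh is a lexical handle instead of a global bareword like MYFILE.
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    while (my $line = <$fh>) {
        print rewrite_line($line);
    }
    close($fh) or die "Cannot close '$path': $!\n";
    return;
}

# Process the file only when exactly one existing file name was given.
rewrite_file($ARGV[0]) if @ARGV == 1 && -f $ARGV[0];
```

Run it the same way as before: perl script.pl in.txt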

Added:

Looking at your new sample lines, I think you probably need something more like this (I included the regex you asked for at the end of your last edit):

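while (<>) {
    # Each pattern requires whitespace, "!" or ")" right after the last label,
    # so only the end of a URL is rewritten.
    s|com[.]pl([\s!)])|org.en$1|
      or s|[.]pl([\s!)])|.en$1|
      # (?!pl\b) rejects "pl"; a class like [!pl] would match one "!", "p" or "l".
      or s|com[.](?!pl\b)(\w+)([\s!)])|org.$1$2|;
    print;
}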

For further advice, read my comments below.
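An alternative sketch, not part of the answer above: first pick out the host part of each URL, then rewrite only the end of that host. The names rewrite_host and rewrite_urls are hypothetical, and the sketch assumes hosts consist only of letters, digits, "_", "-" and dots, so trailing punctuation such as "!" or ")" is never part of the match. It needs Perl 5.10 or later because of \K.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite the last two labels of a single host name.
sub rewrite_host {
    my ($host) = @_;
    # ".com" becomes ".org" only when exactly one more label follows it.
    $host =~ s/\.com(?=\.[A-Za-z]+\z)/.org/;
    # A final ".pl" becomes ".en".
    $host =~ s/\.pl\z/.en/;
    return $host;
}

# Hypothetical helper: rewrite the host of every http(s) URL in a line.
sub rewrite_urls {
    my ($line) = @_;
    # \K keeps the scheme out of the replaced text; the host pattern cannot
    # end in a dot, so a sentence-final "." stays outside the match.
    $line =~ s{https?://\K([\w-]+(?:\.[\w-]+)*)}{rewrite_host($1)}ge;
    return $line;
}

# Filter the files named on the command line, if any were given.
if (@ARGV) {
    while (my $line = <>) {
        print rewrite_urls($line);
    }
}
```

Because the host is isolated before anything is substituted, a line with several URLs, or a URL with repeated com.pl labels, is handled one host at a time, and only the last labels of each host change.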

