
How to extract text between a pattern in a URL with awk/sed/Python

I want to extract the plugin name and the theme name from the URLs below:

http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2

I tried both awk and sed, but couldn't get the desired results.

sed

Use this command:

 sed  's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/'

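Because the leading .* is greedy, it matches up to the last occurrence of either plugins or themes that is followed by a slash ( / ). Next it takes a series of non-slash characters ( [^\/]* ) followed by a slash. That series is captured in the second group \( \) and the whole line is replaced by it through \2 in the substitution. Note that \| for alternation is a GNU sed extension.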

Example usage:

$ cat file 
http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2
$ sed  's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/' file
contact-form-7
recent-tweets-widget
revslider
js_composer
themeforest-9412083-specular-responsive-multipurpose-business-theme
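For comparison, the same pattern can be sketched with Python's re module, which may help where GNU sed is not available. The function name extract_slug is a hypothetical helper chosen for illustration, and the regex is a direct translation of the sed expression above, not a more general URL parser.

```python
import re

# Translation of the sed expression: a greedy prefix, then "plugins/" or
# "themes/", then one path segment (no slashes) followed by a slash.
PATTERN = re.compile(r".*(?:plugin|theme)s/([^/]*)/")


def extract_slug(url):
    """Return the segment after the last plugins/ or themes/, or None."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


urls = [
    "http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1",
    "http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2",
    "http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2",
]

for url in urls:
    print(extract_slug(url))
```

Unlike sed, which prints non-matching lines unchanged, this sketch returns None for a line that contains neither plugins/ nor themes/.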

awk

Using awk is actually even easier: just set the field separator to a slash and print the sixth field. This works because every URL here has the plugins or themes directory at the same depth.

awk -F '/' '{ print $6 }' file

This yields the same result as the sed command above.

A very simple Python approach

with open('urls.txt') as f:
    for url in f:
        # index 5 is the path segment right after plugins/ or themes/
        print(url.split('/')[5])
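Splitting on a fixed index only works while every URL has the same depth. As a hedged alternative, the sketch below parses the URL with urllib.parse.urlsplit and looks for the plugins or themes segment wherever it occurs in the path. The function name plugin_or_theme and the extra /blog prefix in the usage example are assumptions made for illustration, not part of the original question.

```python
from urllib.parse import urlsplit


def plugin_or_theme(url):
    """Return (kind, name) for a wp-content URL, or None if neither is found."""
    # urlsplit separates the query string, so only the path is split on "/".
    segments = urlsplit(url.strip()).path.split("/")
    for kind in ("plugins", "themes"):
        if kind in segments:
            index = segments.index(kind)  # first occurrence in the path
            # The name is the non-empty segment right after the directory.
            if index + 1 < len(segments) and segments[index + 1]:
                return kind, segments[index + 1]
    return None


# Hypothetical URL with an extra leading directory, where index 5 would fail.
print(plugin_or_theme(
    "http://example.com/blog/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2"
))
```

Returning the kind alongside the name also tells plugins and themes apart, which the question asks for.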
