
How to extract text between a pattern in a URL with awk/sed/python

I want to extract the plugin name and the theme name from the URLs below:

http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2

I tried both awk and sed, but couldn't get the desired results.

sed

Use this command:

 sed  's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/'

Because the leading .* is greedy, it finds the last occurrence of either plugins or themes followed by a slash ( / ). Next it takes a series of non-slash characters ( [^\/]* ) followed by a slash. That series is captured by the second group \(...\) and reinserted in the replacement as \2 . Note that the \| alternation is a GNU sed extension.

Example usage:

$ cat file 
http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2
$ sed  's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/' file
contact-form-7
recent-tweets-widget
revslider
js_composer
themeforest-9412083-specular-responsive-multipurpose-business-theme
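
For comparison, the same pattern can be sketched with Python's re module. This is only an illustration of the regex described above, not part of the original answer; the function name extract_name_regex is made up for this example, and it returns None for lines that contain neither plugins/ nor themes/.

```python
import re

# Same idea as the sed expression: a greedy prefix, then "plugins/" or
# "themes/", then a run of non-slash characters captured as the name.
PATTERN = re.compile(r".*(?:plugin|theme)s/([^/]*)/")


def extract_name_regex(url):
    """Return the plugin or theme directory name, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("http://example.com/wp-content/plugins/revslider/"
              "rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2")
    print(extract_name_regex(sample))
```

Unlike the sed command, which prints non-matching lines unchanged, this sketch lets the caller decide what to do with them.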

awk

Using awk is actually even easier: just set the field separator to a slash and print the sixth field.

awk -F '/' '{ print $6 }' file

This yields the same result as the sed command above.

A very simple Python approach:

with open('urls.txt') as f:
    for url in f:
        # Index 5 is the sixth slash-separated field, the same one awk prints as $6
        print(url.split('/')[5])
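
The fixed index works only while every URL has the same depth before plugins or themes. A slightly more defensive sketch, assuming the name is always the path segment right after one of those two directories, could use urllib.parse from the standard library. The helper name extract_name_from_path is hypothetical.

```python
from urllib.parse import urlsplit


def extract_name_from_path(url):
    """Return the path segment that follows 'plugins' or 'themes', else None.

    Hypothetical helper for illustration only; it does not depend on how
    deep the WordPress install sits below the site root.
    """
    # urlsplit drops the scheme, host and query string, leaving only the path.
    segments = urlsplit(url.strip()).path.split("/")
    for marker in ("plugins", "themes"):
        if marker in segments:
            idx = segments.index(marker)
            # Guard against a URL that ends right after the marker.
            if idx + 1 < len(segments) and segments[idx + 1]:
                return segments[idx + 1]
    return None


if __name__ == "__main__":
    print(extract_name_from_path(
        "http://example.com/blog/wp-content/plugins/js_composer/assets/css/vc-ie8.css"))
```

Splitting only the path also means a slash inside the query string cannot shift the result.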
