How to extract text between a pattern in a URL (awk/sed/python)
I want to extract the plugin name and the theme name from the URLs below:
http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2
I tried both awk and sed but couldn't get the desired results.
Use this sed command:
sed 's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/'
It looks for either plugins or themes, followed by a slash ( / ). Because the leading .* is greedy, it picks the last such occurrence on the line if there are several. Next it takes a series of non-slash characters ( [^\/]* ) followed by a slash. This sequence is captured in a group \(\) and, being the second group in the expression, is reinserted in the replacement as \2. Note that \| for alternation is a GNU sed extension.
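For comparison, the same capture-group idea can be sketched with Python's re module. This is only an illustration, not part of the original answer: the names PATTERN and extract_name are hypothetical, and unlike the sed expression, re.search returns the first occurrence of plugins/ or themes/ rather than the last.

```python
import re

# Match "/plugins/" or "/themes/", then capture everything up to the next slash.
# PATTERN and extract_name are illustrative names, not from the original post.
PATTERN = re.compile(r"/(?:plugins|themes)/([^/]+)/")


def extract_name(url):
    """Return the plugin or theme directory name, or None if the URL has none."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


url = "http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1"
print(extract_name(url))  # contact-form-7
```

Lines that contain neither marker return None here, whereas sed prints them unchanged.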
Example usage:
$ cat file
http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2
$ sed 's/.*\(plugin\|theme\)s\/\([^\/]*\)\/.*/\2/' file
contact-form-7
recent-tweets-widget
revslider
js_composer
themeforest-9412083-specular-responsive-multipurpose-business-theme
Using awk is actually even easier: set the field separator to a slash and print the sixth field.
awk -F '/' '{ print $6 }' file
This yields the same result as the sed command above.
A very simple Python approach:
# Print the sixth slash-separated field (index 5) of every URL in urls.txt.
with open('urls.txt') as f:
    for url in f:
        print(url.split('/')[5])
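Both the awk one-liner and the split approach rely on the name always being the sixth slash-separated field. If the site lives under a subdirectory, that position shifts. A minimal sketch that looks for the segment following plugins or themes instead, using the standard urllib.parse module, might look like this. The function name name_after_marker and the second sample URL (with an extra /blog/ prefix) are assumptions made up for illustration.

```python
from urllib.parse import urlsplit


def name_after_marker(url, markers=("plugins", "themes")):
    """Return the path segment that follows 'plugins' or 'themes', else None."""
    # urlsplit drops the scheme, host and query string, leaving only the path.
    segments = urlsplit(url).path.split("/")
    # Walk over adjacent pairs: (segment, the segment right after it).
    for marker, name in zip(segments, segments[1:]):
        if marker in markers and name:
            return name
    return None


urls = [
    "http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css",
    # Hypothetical URL with an extra path level, where field 6 would be wrong.
    "http://example.com/blog/wp-content/themes/specular/style.css?ver=4.2.2",
]
for url in urls:
    print(name_after_marker(url))
```

This trades the brevity of a fixed index for tolerance of varying path depth; for input shaped exactly like the sample URLs, the one-liners above are sufficient.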