How can I manipulate just part of a Perl string?

I'm trying to write some Perl to convert some HTML-based text to MediaWiki format and have hit the following problem: I want to search and replace within a delimited subsection of some text, and wondered if anyone knew of a neat way to do it. My input stream is something like:

Please mail <a href="mailto:help@myco.com&amp;Subject=Please help&amp;Body=Please can some one help me out here">support.</a> if you want some help.

and I want to change "Please help" and "Please can some one help me out here" to "Please%20help" and "Please%20can%20some%20one%20help%20me%20out%20here" respectively, without changing any of the other spaces on the line.

Naturally, I also need to be able to cope with more than one such link on a line, so splicing isn't such a good option.

I've taken a good look round Perl tutorial sites (it's not my first language) but didn't come across anything like this as an example. Can anyone suggest an elegant way of doing this?

Your task has two parts. The first is to find and replace the mailto URIs; use an HTML parsing module for that. This topic is covered thoroughly on Stack Overflow.

The other part is to canonicalise the URI. The URI module is suitable for this purpose.

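use strict;
use warnings;
use URI::mailto;

# href values as they appear in the HTML source
my @hrefs = ('mailto:help@myco.com&amp;Subject=Please help&amp;Body=Please can some one help me out here');
# Constructing the URI object percent-encodes unsafe characters such as spaces
print URI::mailto->new($_)->as_string, "\n" for @hrefs;
__END__
mailto:help@myco.com&amp;Subject=Please%20help&amp;Body=Please%20can%20some%20one%20help%20me%20out%20here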
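
The two parts of this answer could be combined roughly as follows. This is only a sketch, assuming the CPAN modules HTML::TreeBuilder and URI are installed; the helper name canonical_mailto_hrefs is made up for illustration. Note that the parser decodes &amp; to & in attribute values, so the URIs it hands back contain a plain &.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, assumed to be installed
use URI;                 # CPAN module, assumed to be installed

# Hypothetical helper: return the canonicalised href of every mailto link
# found in a piece of HTML, however many links it contains.
sub canonical_mailto_hrefs {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @hrefs;
    for my $link ($tree->look_down(_tag => 'a', href => qr/^mailto:/i)) {
        # The parser has already decoded entities such as &amp; here.
        # URI->new percent-encodes unsafe characters such as spaces.
        push @hrefs, URI->new($link->attr('href'))->as_string;
    }
    $tree->delete;   # release the parse tree
    return @hrefs;
}

my $html = 'Please mail <a href="mailto:help@myco.com&amp;Subject=Please help'
         . '&amp;Body=Please can some one help me out here">support.</a>'
         . ' if you want some help.';
print "$_\n" for canonical_mailto_hrefs($html);
```

To write a result back into the document, $link->attr('href', $new_value) sets the attribute and $tree->as_HTML serialises the tree. The serialised output will generally not be byte-for-byte identical to the input, since TreeBuilder normalises the markup.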

Why don't you just search from the "Body=" tag up to the closing quote and replace every space with %20?

I would not even use regular expressions for that, since I don't find them useful for anything except mass changes where everything on the line changes.

A simple loop might be the best solution.
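
One way such a loop might look is sketched below, using only core Perl. Some assumptions are baked in: the helper name encode_mailto_spaces is invented for illustration, the attribute is assumed to be written exactly as href="mailto:...", and the whole attribute value is encoded rather than just the part after "Body=", so that the Subject text from the question is covered too. A one-character substitution is still used for the space replacement itself.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode spaces inside every href="mailto:..."
# value on a line, leaving all other spaces untouched.
sub encode_mailto_spaces {
    my ($line) = @_;
    my $marker = 'href="mailto:';
    my $pos    = 0;
    while ((my $start = index($line, $marker, $pos)) >= 0) {
        my $value_start = $start + length $marker;
        my $value_end   = index($line, '"', $value_start);
        last if $value_end < 0;   # no closing quote: leave the rest alone
        my $encoded = substr($line, $value_start, $value_end - $value_start);
        $encoded =~ s/ /%20/g;
        # Four-argument substr replaces the old value in place.
        substr($line, $value_start, $value_end - $value_start, $encoded);
        # Resume the search just past the closing quote of this link.
        $pos = $value_start + length($encoded) + 1;
    }
    return $line;
}

my $input = 'Please mail <a href="mailto:help@myco.com&amp;Subject=Please help'
          . '&amp;Body=Please can some one help me out here">support.</a>'
          . ' if you want some help.';
print encode_mailto_spaces($input), "\n";
```

Because the loop restarts its search after each closing quote, it copes with several links on one line.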
