
Need help creating a regex or script to run on an HTML file

So I have this index that I am working on, but I find it a real hassle to go in by hand and cross-link everything. I know a little bit about regexps and a little Perl. Here is what the HTML looks like:

cf. <i>Penitencia y Reconciliaci&oacute;n</i>

But sometimes there is an instance like this:

cf. <i>Advenimiento, Consumaci&oacute;n, Expectaci&oacute;n</i>;

I ran this regex on it:

cf\. <i>([^,]+,)</i>

But my goal is to run a regex that wraps around one or more phrases, copies the inner HTML of each phrase, and pastes it inside an anchor tag, something like this:

cf. <i><a href="#Penitencia y Reconciliaci&oacute;n">Penitencia y Reconciliaci&oacute;n</a></i>

I was able to accomplish that with the regex above, but the problem is that my regex does not take into consideration that there might be two or more phrases it needs to wrap. So my whole goal is to end up with this:

cf. <i><a href="#Advenimiento">Advenimiento</a>, <a href="#Consumaci&oacute;n">Consumaci&oacute;n</a>, <a href="#Expectaci&oacute;n">Expectaci&oacute;n</a></i>;

Any help would be really appreciated.

If you are writing a program to automate this, the better, harder, faster, stronger solution would be (and here I agree with the comment on the question) to use the DOM to parse and query the tags, get their values, then modify and rewrite them. I'm assuming from your specific example that this is a one-off find-and-replace, or that you don't mind running a replace manually every once in a while...
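For the parser-based route, here is a minimal sketch in Python (not the Perl used in the rest of this answer), relying only on the standard library's `html.parser`. That module is an event-driven parser rather than a full DOM, so the sketch simply re-emits the document as it goes. The class name `CrossLinker` and the function `cross_link` are made up for illustration, and the choice to leave any `<i>` that contains nested tags untouched is an assumption, not something the question specifies.

```python
from html.parser import HTMLParser


class CrossLinker(HTMLParser):
    """Copy HTML through, turning comma-separated terms in plain-text <i> elements into anchors."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &oacute; as written.
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of the rewritten document
        self.buf = None  # text collected inside the current <i>, else None

    def emit(self, text):
        # Route text into the <i> buffer when one is open, else to the output.
        (self.out if self.buf is None else self.buf).append(text)

    def flush_unlinked(self):
        # Give up on linking the current <i> and copy its text as-is.
        if self.buf is not None:
            self.out.extend(self.buf)
            self.buf = None

    def handle_starttag(self, tag, attrs):
        self.flush_unlinked()  # a tag nested inside <i> disables linking
        self.out.append(self.get_starttag_text())
        if tag == "i":
            self.buf = []

    def handle_startendtag(self, tag, attrs):
        self.flush_unlinked()
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "i" and self.buf is not None:
            raw = "".join(self.buf)
            terms = [t.strip() for t in raw.split(",")]
            links = ['<a href="#{0}">{0}</a>'.format(t) for t in terms if t]
            # Terms are re-joined with ", "; an empty <i> is copied unchanged.
            self.out.append(", ".join(links) if links else raw)
            self.buf = None
        else:
            self.flush_unlinked()
        self.out.append("</{}>".format(tag))  # tag names arrive lowercased

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit("&{};".format(name))

    def handle_charref(self, name):
        self.emit("&#{};".format(name))

    def handle_comment(self, data):
        self.flush_unlinked()
        self.out.append("<!--{}-->".format(data))

    def handle_decl(self, decl):
        self.out.append("<!{}>".format(decl))

    def handle_pi(self, data):
        self.out.append("<?{}>".format(data))


def cross_link(html):
    parser = CrossLinker()
    parser.feed(html)
    parser.close()
    parser.flush_unlinked()  # covers an <i> that was never closed
    return "".join(parser.out)


print(cross_link("cf. <i>Advenimiento, Consumaci&oacute;n, Expectaci&oacute;n</i>;"))
```

Because the text is only collected while an `<i>` is open, commas elsewhere in the document are never touched, and an `<i>` with nested markup passes through unchanged instead of being mangled.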

A Perl s///-expression (I guess an s!!!-expression in this case), which was only tested in an emulator:

s!(?<=,)(\s?)([^<,]+)(?=,|</i>)|(?<=<i>)([^<,]+)(?=,|</i>)!$1<a href="#$2$3">$2$3</a>!gi

Bear in mind that, as written, this is only meant for items enclosed within <i> tags and is not tolerant of other tags nested inside them. The first alternative also keys on a preceding comma rather than on <i> itself, so comma-separated text elsewhere can match too. These are just a few of the reasons you should not put this into program code...
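If a regex is still preferred, one way to keep the matching confined to <i> elements is a two-step replace: first match each whole `<i>...</i>` that has no nested tags, then split its contents on commas inside a callback. The sketch below does this in Python rather than Perl; the names `ITALIC`, `link_terms` and `cross_link` are hypothetical, and it assumes the terms themselves never contain a comma.

```python
import re

# An <i> element whose content contains no nested tags.
ITALIC = re.compile(r"<i>([^<]+)</i>")


def link_terms(match):
    # Split the inner text on commas and wrap each trimmed term in an anchor.
    terms = [t.strip() for t in match.group(1).split(",")]
    links = ['<a href="#{0}">{0}</a>'.format(t) for t in terms if t]
    # Terms are re-joined with ", ", which normalizes spacing after commas.
    return "<i>" + ", ".join(links) + "</i>"


def cross_link(html):
    return ITALIC.sub(link_terms, html)


sample = "cf. <i>Advenimiento, Consumaci&oacute;n, Expectaci&oacute;n</i>; a, b, c"
print(cross_link(sample))
```

Prefixing the pattern with `cf\. ` would restrict it further to the cross-reference entries shown in the question, at the cost of missing any <i> that is not introduced that way.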

The expression turns this HTML:

Parte del texto inicial. <i>Penitencia y Reconciliaci&oacute;n</i> 
<i>Advenimiento, Consumaci&oacute;n, Expectaci&oacute;n</i>; Otro texto que <em>no es especial</em> ... <i>Otra etiqueta que debe estar vinculada</i>
Otra l&iacute;nea <i>con un enlace</i> y un texto m&aacute;s.

into this text:

Parte del texto inicial. <i><a href="#Penitencia y Reconciliaci&oacute;n">Penitencia y Reconciliaci&oacute;n</a></i> 
<i><a href="#Advenimiento">Advenimiento</a>, <a href="#Consumaci&oacute;n">Consumaci&oacute;n</a>, <a href="#Expectaci&oacute;n">Expectaci&oacute;n</a></i>; Otro texto que <em>no es especial</em> ... <i><a href="#Otra etiqueta que debe estar vinculada">Otra etiqueta que debe estar vinculada</a></i>
Otra l&iacute;nea <i><a href="#con un enlace">con un enlace</a></i> y un texto m&aacute;s.

As a side note, your question is rather hard to read, and it probably should have been tagged [perl] as well; this probably contributed significantly to it not being answered for a while... but better late than never!
