Strip dashes from a string?

For web scraping, I need to match the last part of a URL and replace "-" dashes with " " spaces.

The HTML looks like this:

<div class="tags">
    <span class="tag" style="background-color: #5A214A;">
        <a href="/Services/Research/Telecoms-software/Service-Assurance/">SA</a>
    </span>
</div>

I want to be left with "Service Assurance" (this part may contain multiple "-" dashes and require multiple replacements).

Currently being used:

XPath:

//span[@class="tag"]/a/@href

Regex:

/.*/(.*)/

This produces "Service-Assurance", but does not strip out the "-".
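For anyone reproducing this outside import.io, here is a minimal Python sketch of the current behaviour. It assumes the snippet above can be parsed as XML with the standard-library `xml.etree.ElementTree`; that module's limited XPath support cannot select `@href` directly, so the attribute is read from the matched `<a>` element instead. The variable names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# The HTML fragment from the question (assumed well-formed enough to parse as XML).
HTML = """
<div class="tags">
    <span class="tag" style="background-color: #5A214A;">
        <a href="/Services/Research/Telecoms-software/Service-Assurance/">SA</a>
    </span>
</div>
""".strip()

root = ET.fromstring(HTML)

# Rough equivalent of //span[@class="tag"]/a/@href:
# select the <a> elements, then read their href attribute.
hrefs = [a.get("href") for a in root.findall(".//span[@class='tag']/a")]

# Same pattern as in the question: the greedy ".*" runs up to the
# second-to-last slash, so the group captures the final path segment.
segments = [re.match(r".*/(.*)/", href).group(1) for href in hrefs]

print(segments)  # ['Service-Assurance']
```

The capture works, but a match alone never rewrites characters inside the captured text, which is why the dash survives.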

I am told elsewhere that this replacement is not possible since I am already using regex to find the string between the final "/" slashes.

Can I do both? Can I replace the "-" dashes at the end, too?

The regex is plain, inside an app called import.io, with no particular language flavour.

Thank you very much.

Try this XPath without the regex:

//*[@class='tag-wrapper']/input[1]/@value

Alternatively, you can also try these methods:

I scrape URLs in Google Sheets all the time with XPaths and regexes, so if you want to try:

=IMPORTXML("url goes here","//span[@class='tag']/a/@href")

Now, if you at least get the URL string back, you know it's working, and we can then modify it to this to get what you want:

=SUBSTITUTE(REGEXEXTRACT(IMPORTXML("url goes here","//span[@class='tag']/a/@href"),".*\/(.*)\/$"),"-"," ")
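The same two-step idea (extract the last path segment, then substitute the dashes) can be sketched in Python for comparison. This is only an illustration, not what import.io or Google Sheets does internally: the function name `last_segment_as_words` is hypothetical, and the pattern `([^/]+)/?$` is a swapped-in variant that also tolerates a missing trailing slash.

```python
import re


def last_segment_as_words(href: str) -> str:
    """Return the final path segment of href with dashes turned into spaces."""
    # Step 1: capture the last run of non-slash characters,
    # optionally followed by a single trailing slash.
    match = re.search(r"([^/]+)/?$", href)
    if match is None:
        return ""  # no usable segment, e.g. "" or "/"
    # Step 2: replace every dash, however many there are.
    return match.group(1).replace("-", " ")


print(last_segment_as_words("/Services/Research/Telecoms-software/Service-Assurance/"))
# Service Assurance
```

Keeping the extraction and the replacement as separate steps mirrors the REGEXEXTRACT and SUBSTITUTE nesting in the formula.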

Let me know if you have issues; Google Sheets has a couple of weird quirks. If you share the URL you're pulling that XPath from, I can at least test it myself. I use this method more than any other now, though I used to use import.io, OutWit Hub, etc. a ton.
