Joining strings found by re.findall in Python

Question

Scraping data from a website with a search bar.

I'm running the search with Python and then filtering the results for capitalized words ("Words Like These"):

tabOne = re.findall(r"[A-Z][a-z]*", str(initialFilter))

The problem is that the data I'm trying to get is occasionally several words, such as 'Item Number One', but re.findall returns it as separate matches: 'Item', 'Number', 'One'.

I want to keep the data in its original form, as a single phrase, but I'm not sure how to tell Python to group the words together.

The [A-Z][a-z] phrases are always isolated from each other on the page, so I was wondering whether I could check if the characters next to those words also match [A-Z][a-z] and, if so, group them together.
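The adjacency idea from the question can also be coded directly. This is a minimal sketch, assuming the scraped text is available as a plain string; page_text is a made-up stand-in for str(initialFilter), and group_adjacent_words is a hypothetical helper name. It walks the matches with re.finditer and appends a word to the previous phrase whenever nothing but whitespace separates them:

```python
import re


def group_adjacent_words(text):
    """Hypothetical helper: merge capitalized words separated only by whitespace."""
    phrases = []
    prev_end = 0
    for m in re.finditer(r"[A-Z][a-z]*", text):
        gap = text[prev_end:m.start()]
        if phrases and gap.strip() == "":
            # Only whitespace (or nothing) lies between this word and the
            # previous match, so keep the original gap and extend the phrase
            phrases[-1] += gap + m.group()
        else:
            # Anything else in between (punctuation, lowercase words) starts
            # a new phrase
            phrases.append(m.group())
        prev_end = m.end()
    return phrases


# Made-up sample text standing in for str(initialFilter)
page_text = "Results: Item Number One is in Stock"
grouped = group_adjacent_words(page_text)
print(grouped)
```

Because str.strip() treats newlines as whitespace, phrases separated only by line breaks would be merged as well; compare the gap against spaces only if that matters for your page.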

Any suggestions?

Answer 1

Two different ways:

(1) Change your regex to match multiple words at once
(2) Join the regex results back into a string

For (1), you can try something like:

tabOne = re.findall(r"((?:[A-Z][a-z]*\s?)+)", str(initialFilter))
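A quick check of approach (1), using a made-up sample string (page_text, standing in for str(initialFilter)). One thing to watch: because \s? sits inside the repeated group, a phrase followed by a space and then a lowercase word keeps that trailing space. The second pattern below is an alternative that is not part of the original answer; it only allows whitespace between capitalized words, so no trailing space is captured:

```python
import re

# Made-up sample text standing in for str(initialFilter)
page_text = "Results: Item Number One is in Stock"

# Pattern from the answer: capitalized words, each optionally followed by
# one whitespace character (so a trailing space can end up in the match)
with_trailing = re.findall(r"((?:[A-Z][a-z]*\s?)+)", page_text)

# Alternative: whitespace is only consumed when another capitalized word follows
phrases = re.findall(r"[A-Z][a-z]*(?:\s+[A-Z][a-z]*)*", page_text)

print(with_trailing)
print(phrases)
```

Stripping the results of the original pattern, [p.strip() for p in with_trailing], gives the same list as the alternative pattern for this sample.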

For (2), you can do something like:

tabOne = re.findall(r"[A-Z][a-z]*", str(initialFilter))
results = ' '.join(tabOne)
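Approach (2) puts every match into one string, so separate phrases on the page run together. A small illustration with a made-up sample string (page_text is a stand-in for str(initialFilter)):

```python
import re

# Made-up sample text standing in for str(initialFilter)
page_text = "Results: Item Number One is in Stock"

# One match per capitalized word, then glue all matches with single spaces
tabOne = re.findall(r"[A-Z][a-z]*", page_text)
results = ' '.join(tabOne)

print(results)
```

This is fine when the filtered text only ever contains one phrase; when it can contain several, approach (1) keeps them apart.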

Answer 1 (2 votes, 2015-08-31 23:34:46)