How to loop multiple elements in python selenium (different CSS selectors)
I am trying to loop inside the class offer-list-wrapper, which contains multiple elements. Almost all of these elements are common to the result pages for search A and search B (I am writing a crawler).
As you can see in both images, offer-list-wrapper is a common element.
I want to extract the data inside every element with the classes organic-offer-wrapper organic-gallery-offer-inner and every element with the classes organic-list-offer-inner m-gallery-product-item-v2. That is very easy to do if you loop over them with a CSS selector like this:
for element in driver.find_elements_by_css_selector('.organic-list-offer-inner.m-gallery-product-item-v2'):
That way you can get every element inside them.
BUT the issue starts here: I need ONE generic piece of code that loops inside both classes, and if a new class appears, it has to loop inside that one too.
Let me show you my code:
import numpy as np  # np.nan marks values that could not be scraped

for element in driver.find_elements_by_class_name('offer-list-wrapper'):
    try:
        item_name = element.find_element_by_class_name('organic-gallery-title__content').text
    except:
        item_name = np.nan
    try:
        price = element.find_element_by_class_name('gallery-offer-price').get_attribute('title').replace('$', '').replace(',', '')
        min_order = element.find_element_by_class_name('gallery-offer-minorder').find_element_by_tag_name('span').text.replace(' Pieces', '').replace(' Piece', '').replace(' Units', '').replace(' Unit', '').replace(' Sets', '').replace(' Set', '').replace(' Pairs', '').replace(' Pair', '').replace('Boxes', '').replace('Box', '').replace('Bags', '').replace('Bag', '')
        # separate min and max price
    except:
        price = np.nan
        min_order = np.nan
This first version returns only the first element.
for element in driver.find_elements_by_css_selector('.organic-offer-wrapper.organic-gallery-offer-inner'):
    try:
        item_name = element.find_element_by_class_name('organic-gallery-title__content').text
    except:
        item_name = np.nan
    try:
        price = element.find_element_by_class_name('gallery-offer-price').get_attribute('title').replace('$', '').replace(',', '')
        min_order = element.find_element_by_class_name('gallery-offer-minorder').find_element_by_tag_name('span').text.replace(' Pieces', '').replace(' Piece', '').replace(' Units', '').replace(' Unit', '').replace(' Sets', '').replace(' Set', '').replace(' Pairs', '').replace(' Pair', '').replace('Boxes', '').replace('Box', '').replace('Bags', '').replace('Bag', '')
        # separate min and max price
    except:
        price = np.nan
        min_order = np.nan
This second version loops only inside .organic-offer-wrapper.organic-gallery-offer-inner (returning all the elements I need), but it doesn't loop inside .organic-list-offer-inner.m-gallery-product-item-v2.
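Separately from the looping problem, the long chains of .replace() calls in both snippets could be condensed. The following is only a sketch: the helper names clean_min_order and clean_price are hypothetical, and the unit words are simply the ones listed in the code above. Unlike the original chain, it also strips the space before "Boxes" and "Bags".

import re

# Unit words copied from the .replace() chain in the question.
# The optional "s" covers both singular and plural forms.
_UNIT_RE = re.compile(r"\s*\b(Pieces?|Units?|Sets?|Pairs?|Boxes|Box|Bags?)\b")


def clean_min_order(text):
    """Remove the unit word from a minimum-order string such as '100 Pieces'."""
    return _UNIT_RE.sub("", text).strip()


def clean_price(text):
    """Remove the dollar sign and thousands separators from a price string."""
    return text.replace("$", "").replace(",", "")

Inside the loop, the two assignments would then become clean_price(...get_attribute('title')) and clean_min_order(...text), which keeps each line readable.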
You can get all the products by searching for the div tags that carry the attribute data-content="productItem". That assumes each item has that attribute. From the screenshots you posted, that seems to be the case.
You can accomplish this using find_elements_by_xpath():
for item in driver.find_elements_by_xpath('//div[@data-content="productItem"]'):
    ....
This is probably the best approach, since you don't have to worry about the elements having different CSS classes.
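To make the idea concrete, here is a minimal sketch of one generic loop built on that attribute. It rests on several assumptions: the names scrape_items and first_or_none are hypothetical, the child class names are taken from the question's code and may differ in the list layout, and missing values are stored as None rather than np.nan. It uses the find_elements(by, value) form, because recent Selenium 4 releases no longer provide the find_elements_by_* helpers.

# Locator strategy string; it has the same value as selenium's By.CSS_SELECTOR.
CSS = "css selector"

# Assumption: every product card carries this attribute, as the screenshots suggest.
ITEM_SELECTOR = 'div[data-content="productItem"]'


def first_or_none(parent, selector):
    """Return the first descendant matching selector, or None if there is none."""
    # find_elements (plural) returns an empty list instead of raising,
    # so no try/except is needed for missing fields.
    matches = parent.find_elements(CSS, selector)
    return matches[0] if matches else None


def scrape_items(driver):
    """Collect name and raw price for each product card, whatever its CSS classes."""
    rows = []
    for item in driver.find_elements(CSS, ITEM_SELECTOR):
        title = first_or_none(item, ".organic-gallery-title__content")
        price = first_or_none(item, ".gallery-offer-price")
        rows.append({
            "item_name": title.text if title is not None else None,
            "price": price.get_attribute("title") if price is not None else None,
        })
    return rows

Calling scrape_items(driver) after the results page has loaded would return one dictionary per product card. Because the outer selector matches on the attribute rather than on a class, a card with a new class should still be picked up as long as it keeps data-content="productItem".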