Extracting dates from span using Python Selenium
I have this page:
For every review published, there is a corresponding date in the title attribute of a span, for example:
<span class="ratingDate relativeDate" title="4 February 2017">Reviewed yesterday </span>
My problem is that I am not able to fetch all the dates from the reviews.
I tried this code:
def Dates():
    datediv = driver.find_elements_by_css_selector('div > div.col2of2 > div > div.wrap > div.rating.reviewItemInline > span.ratingDate.relativeDate')
    dateatt = datediv.get_Attribute("title")
    for date in dateatt:
        print(date.text)
But it still does not work, and I get this error:
AttributeError: 'list' object has no attribute 'get_Attribute'
Where am I going wrong?
Edit: OK, so now I have scraped the usernames, dates, titles and the entire review text from every page, but so far the results are only printed in IDLE. I want to put the scraped data from every page into, say, a dictionary and export it to JSON, or maybe put it directly into an Excel sheet.
The dictionary approach is quite confusing, as I do not understand how I would update different keys independently with values.
content = {}

def mainfunction():
    # Hotel Name
    hname = driver.find_element_by_id('HEADING').text
    # User Names
    usernames = driver.find_elements_by_class_name('scrname')
    for
    # Dates
    datediv = driver.find_elements_by_css_selector('div > div.col2of2 > div > div.wrap > div.rating.reviewItemInline > span.ratingDate.relativeDate')
    # Review Title
    titlesdiv = driver.find_elements_by_class_name('isNew')
    #for titles in titlesdiv:
    #    print(titles.find_element_by_class_name('noQuotes').text)
    # Reviews
    linkdiv = driver.find_element_by_class_name('expandLink')
    linkspan = linkdiv.find_element_by_class_name('ulBlueLinks')
    linkspan.click()
    try:
        WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.CLASS_NAME, "no_padding")))
        close1 = driver.find_element_by_css_selector('body > div> span > div.ui_close_x')
        close1.click()
    except TimeoutException:
        print("Loading took too much time!")
    reviews = driver.find_elements_by_css_selector(' div > div.col2of2 > div > div.wrap > div > div > p')
    for review in reviews:
        print(review.text)
    # push the contents to the dictionary
    # Move to next page
    nextpage()

# To follow successive pages and scrape the content
def nextpage():
    nextpage = driver.find_element_by_css_selector('#REVIEWS > div.deckTools.btm.test > div > a.nav.next.rndBtn.ui_button.primary.taLnk').click()
    try:
        WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.CLASS_NAME, "pcb")))
        close2 = driver.find_element_by_class_name('ui_close_x')
        close2.click()
    except TimeoutException:
        print("Loading took too much time!")
    mainfunction()
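datediv is a list, because find_elements_by_css_selector returns a list of elements, and a list has no attribute methods of its own. You need to iterate over it and read the attribute from each element. Note also that the method is spelled get_attribute, and that it already returns the title as a string, so there is no .text to read from it: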
datediv = driver.find_elements_by_css_selector('div > div.col2of2 > div > div.wrap > div.rating.reviewItemInline > span.ratingDate.relativeDate')
for dateatt in datediv:
    print(dateatt.get_attribute("title"))
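If the dates are needed for more than printing, a small sketch along these lines could collect the title strings into a list and parse them into date objects. The helper names collect_titles and parse_review_date are hypothetical, and the format string assumes every title looks like "4 February 2017", as in the span shown in the question, with English month names.

```python
from datetime import datetime


def collect_titles(elements):
    """Return the title attribute of every element in the list."""
    # Each element is expected to offer get_attribute(), as Selenium elements do.
    return [element.get_attribute("title") for element in elements]


def parse_review_date(title):
    """Parse a title such as '4 February 2017' into a date object."""
    # %B matches the full month name in the current locale (assumed English here).
    return datetime.strptime(title, "%d %B %Y").date()
```

With the datediv list from above, this might be used as dates = [parse_review_date(t) for t in collect_titles(datediv)].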
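For the follow-up about storing the results, one possible approach is to build one dictionary per review instead of updating separate keys of a single dictionary, and to keep those dictionaries in a list that can be dumped to JSON. This is only a sketch under assumptions: the function names build_records and save_as_json, the key names and the file path are hypothetical, and the input lists are assumed to hold plain strings already read from the elements (with .text or get_attribute("title")) and to line up one-to-one.

```python
import json


def build_records(hotel, usernames, dates, titles, reviews):
    """Combine parallel lists of strings into one dictionary per review."""
    # zip stops at the shortest list, so mismatched lengths drop trailing items.
    return [
        {"hotel": hotel, "user": user, "date": date, "title": title, "review": review}
        for user, date, title, review in zip(usernames, dates, titles, reviews)
    ]


def save_as_json(records, path):
    """Write the list of review dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

Each scraped page could then extend one shared list, for example all_records.extend(build_records(hname, names, dates, titles, texts)), with save_as_json(all_records, "reviews.json") called once at the end. If pandas is available, a list of dictionaries like this can also be turned into a DataFrame and written to an Excel file.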