Web scraping with Selenium in Python: finding elements by XPath and using .text

Question

I'm trying to use Python and Selenium to automate a task that I have to do every week.

I go to a website, and if there are any new files, I download them, rename them using the date they came in and the person they went to, and then place them in a folder on the shared network server.
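
The rename-and-move step could look roughly like the sketch below. It is only an illustration under assumptions: the helper name archive_download, the "<YYYY-MM-DD>_<recipient><ext>" naming scheme, and the idea that the network share is reachable as an ordinary mounted folder are all hypothetical, not taken from the question.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_download(src_path, received_text, recipient, dest_dir):
    """Rename a downloaded file to "<YYYY-MM-DD>_<recipient><ext>" and move it.

    Hypothetical helper: the naming scheme and arguments are assumptions.
    received_text is the link text, e.g. "11/2/2018, 8:00:50 AM".
    """
    src = Path(src_path)
    # Keep only the date part before the comma and reformat it as YYYY-MM-DD.
    received = datetime.strptime(received_text.split(",")[0].strip(), "%m/%d/%Y")
    new_name = f"{received:%Y-%m-%d}_{recipient}{src.suffix}"
    dest = Path(dest_dir) / new_name
    # shutil.move also works across drives, e.g. onto a mounted network share.
    shutil.move(str(src), str(dest))
    return dest
```

For a downloaded report.pdf that went to "Smith", this would produce 2018-11-02_Smith.pdf in the destination folder.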

The website shows the date each file came in as a clickable link.

Using find_elements_by_xpath together with what I assume are parameters, starts-with and contains, I've been able to search through all the links that show the date and time.

receivedTime = browser.find_elements_by_xpath('//*[starts-with(@id, "anchor") and contains(@id, "_0")]')
for time in receivedTime:
    print(time.text)
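
To see which id values that XPath keeps, here is a plain-Python sketch of the same test: starts-with(@id, "anchor") behaves like str.startswith, and contains(@id, "_0") is a substring check. The id values are made up for illustration, since the real page's ids aren't shown. Note that contains() matches "_0" anywhere, so an id such as anchor3_05 would also be selected.

```python
def matches_anchor_xpath(element_id):
    """Mirror //*[starts-with(@id, "anchor") and contains(@id, "_0")] for one id."""
    return element_id.startswith("anchor") and "_0" in element_id


# Hypothetical id values; the real page's ids are not known.
ids = ["anchor1_0", "anchor2_0", "anchor2_1", "header_0", "anchor3_05"]
selected = [i for i in ids if matches_anchor_xpath(i)]
print(selected)  # ['anchor1_0', 'anchor2_0', 'anchor3_05']
```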

The output looks like this, for example: "11/2/2018, 8:00:50 AM".

I would like to format that text as "2018-11-02". How would I go about doing that?

It's my understanding that the variable time is just an object matched by the current XPath, and .text is just a property of that object. Is my understanding correct?

Thank you.

ANSWER:

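from datetime import datetime
from selenium.webdriver.common.by import By

# Selenium 4.3+ removed find_elements_by_xpath; find_elements(By.XPATH, ...) replaces it
receivedTime = browser.find_elements(By.XPATH, '//*[starts-with(@id, "anchor") and contains(@id, "_0")]')
for element in receivedTime:
    # element.text looks like "11/2/2018, 8:00:50 AM"; keep only the part before the comma
    date = element.text.split(',')
    dateTime = datetime.strptime(date[0], '%m/%d/%Y').strftime('%Y-%m-%d')
    print(dateTime)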
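
One way to make the conversion testable without opening a browser is to pull it into a small function and call it on each matched element's .text. to_iso_date is a hypothetical name, not part of Selenium, and the format string assumes the link text always starts with month/day/year as in the question.

```python
from datetime import datetime


def to_iso_date(link_text):
    """Convert link text such as "11/2/2018, 8:00:50 AM" to "2018-11-02"."""
    date_part = link_text.split(",", 1)[0].strip()
    # %m and %d accept values without a leading zero, e.g. "11/2/2018".
    return datetime.strptime(date_part, "%m/%d/%Y").strftime("%Y-%m-%d")


print(to_iso_date("11/2/2018, 8:00:50 AM"))  # 2018-11-02
```

In the scraping loop, you would then call to_iso_date on the .text of each element returned by the XPath query.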

Answer 1

You should use the datetime module (import datetime).
time itself is a WebElement, but time.text is a string, so you have to parse it into a datetime object and then change the format like this:

import datetime
# time.text is e.g. "11/2/2018, 8:00:50 AM"; keep the date part before the comma
date = time.text.split(',')
datetime.datetime.strptime(date[0], '%m/%d/%Y').strftime('%Y-%m-%d')
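
As a variation on the answer above, strptime can also parse the whole string in one call instead of splitting on the comma first, which keeps the time of day in case it is needed later. This sketch assumes the text always has exactly this shape and that the locale is English, because %p (AM/PM) is locale-dependent.

```python
import datetime

text = "11/2/2018, 8:00:50 AM"
# Parse date and time together: %I is the 12-hour clock hour, %p is AM/PM.
received = datetime.datetime.strptime(text, "%m/%d/%Y, %I:%M:%S %p")
print(received.strftime("%Y-%m-%d"))  # 2018-11-02
print(received.isoformat())           # 2018-11-02T08:00:50
```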

Answer 2

You can also use a regular expression to extract the numbers and reformat the date:

import re
text = "11/2/2018, 8:00:50 AM"
# Raw string so \d is not treated as a string escape; captures month, day, year
month, day, year = re.match(r"(\d+)/(\d+)/(\d+)", text).groups()
file_name = "%d-%02d-%02d" % (int(year), int(month), int(day))

Result: "2018-11-02"
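
A possible refinement of the regex approach: feed the captured numbers to datetime.date, which rejects impossible dates. The "%d-%02d-%02d" formatting would silently accept them. This is a sketch that assumes the same month/day/year order as in the question.

```python
import re
from datetime import date

text = "11/2/2018, 8:00:50 AM"
month, day, year = map(int, re.match(r"(\d+)/(\d+)/(\d+)", text).groups())
# date() raises ValueError for impossible dates such as 2/30/2018,
# which the plain "%d-%02d-%02d" formatting would accept silently.
file_name = date(year, month, day).isoformat()
print(file_name)  # 2018-11-02
```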

2 answers

Answer 1: score 2, accepted, 2018-11-07 13:55:53

Answer 2: score 0, 2018-11-07 14:05:00