I made a crawler using Python, but it returns the date in this format:
s = page_ad.findAll('script')[25].text.replace('\'', '"')
s = re.search(r'\{.+\}', s, re.DOTALL).group() # get json data
s = re.sub(r'//.+\n', '', s) # replace comment
s = re.sub(r'\s+', '', s) # strip whitespace
s = re.sub(r',}', '}', s) # get rid of last , in the dict
dataLayer = json.loads(s)
print dataLayer["page"]["adDetail"]["adDate"]
2017-01-1412:28:07
I want only the date without the time (2017-01-14). How can I get only the date when there is no whitespace between the date and the time?
Use string slicing to drop the last 8 characters (the time part):
>>> date ="2017-01-1412:28:07"
>>> datestr= date[:-8]
>>> datestr
'2017-01-14'
Try formatting with strftime (this example uses the current time):
In [2]: from datetime import datetime
In [3]: now = datetime.now()
In [4]: now.strftime('%Y-%m-%d')
Out[4]: '2017-01-24'
I suggest parsing the date into a datetime object first and then extracting the information you need from it.
A better approach is to use a library for this. I use dateparser for these tasks. Example usage:
import dateparser
date = dateparser.parse('12/12/12')
date.strftime('%Y-%m-%d')
As this is not a standard date format, just slice off the time part and keep the first 10 characters.
st = "2017-01-1412:28:07"
res = st[:10]
print res
>>>2017-01-14
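Slicing alone accepts any string, so it may be worth checking that the first 10 characters really are a date. Below is a minimal sketch, assuming the value always starts with a YYYY-MM-DD date. The helper name date_prefix is hypothetical, and print() is used so the snippet runs on both Python 2 and Python 3.

```python
from datetime import datetime


def date_prefix(stamp):
    """Return the leading YYYY-MM-DD part of stamp (hypothetical helper)."""
    prefix = stamp[:10]
    # strptime raises ValueError if the prefix is not a valid calendar date
    datetime.strptime(prefix, "%Y-%m-%d")
    return prefix


print(date_prefix("2017-01-1412:28:07"))
```

If the crawled value is malformed, this raises ValueError instead of silently returning the wrong text.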
Use datetime as follows to first convert the string into a datetime object, and then format the output as required using the strftime() function:
from datetime import datetime
ad_date = dataLayer["page"]["adDetail"]["adDate"]
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d")
This will print:
2017-01-14
This method gives you the flexibility to display other fields. For example, adding %A to the end gives you the day of the week:
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d %A")
This prints:
2017-01-14 Saturday
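The space between the date and the time appears to be lost in the question's `re.sub(r'\s+', '', s)` step, which strips whitespace inside string values too. If that step changes later, the value may arrive with the space intact. The sketch below, with the hypothetical helper name parse_ad_date, tries both layouts and returns a datetime object. It assumes these are the only two formats the crawler will see.

```python
from datetime import datetime

# Layouts to try: without and with a space between date and time
FORMATS = ("%Y-%m-%d%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def parse_ad_date(stamp):
    """Parse stamp with the first matching layout (hypothetical helper)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # try the next layout
    raise ValueError("unrecognised date format: %r" % (stamp,))


parsed = parse_ad_date("2017-01-1412:28:07")
print(parsed.date().isoformat())
```

Returning the datetime object rather than a string keeps the time available in case it is needed later. Calling .date() drops it only at the point of display.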