Need help understanding Python snippet with regex and cURL

EDIT - Just added the entire cURL function for reference, but I need help with the if statements (the regex part).

Looking for help understanding the if statements in this cURL function. I've read through some Python documentation and I understand each of the pieces: this is searching with regex and replacing. I'm just hoping someone can give a bigger-picture explanation. I don't really understand the .groups call.

To give a little more background: this script accesses another site via cURL. It stores a cookie and, when run, checks whether the cookie is valid; if not, it grabs a new one after posting the username/password. The site recently changed and I'm trying to figure out what I need to change to get this working again.

#get auth cookie for sso
def getAuthCookie( self ):
    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.SSL_VERIFYPEER, False)
    c.setopt(c.FOLLOWLOCATION, True)
    c.setopt(c.TIMEOUT, 60)
    c.setopt(c.USERPWD, self.user+":"+cred.getpasswd( self.encPasswd ) )
    c.setopt(c.URL, 'https://sso.sample.com')
    c.setopt(c.COOKIEJAR, self.cookieDir)
    c.setopt(c.COOKIEFILE, self.cookieDir )
    c.setopt(c.WRITEFUNCTION, buffer.write)
    c.perform()
    c.unsetopt(c.USERPWD)
    c.setopt(c.URL, 'https://sample.com')
    c.perform()
    html = str(buffer.getvalue())    

----------------------------------------------------------
if "RelayState" in html:
    rex = re.compile( "input type=\"hidden\" name=\"RelayState\" value=\"(.*)\"" )
    RELAY = rex.search( html ).groups()[0]
if "SAMLResponse" in html:
    rex = re.compile( "input type=\"hidden\" name=\"SAMLResponse\" value=\"(.*)\"" )
    SAML =  rex.search( html ).groups()[0]
    datastuff = {'SAMLResponse':SAML,'RelayState':RELAY,'redirect':'Redirect','show_button':'true'}
if "form method=\"POST\" action=" in html:
    rex = re.compile( "form method=\"POST\" action=\"(.*)\" " )
    postUrl = rex.search( html ).groups()[0]
---------------------------------------------------------- 

#post our saml obtained, get to our final dest
    c.setopt(c.URL, postUrl )
    c.setopt(c.POST, True)
    c.setopt(c.POSTFIELDS, urlencode( datastuff ))
    c.perform()
    c.close()

See the comments I have added in the code:

#get auth cookie for sso
def getAuthCookie( self ):
    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.SSL_VERIFYPEER, False)
    c.setopt(c.FOLLOWLOCATION, True)
    c.setopt(c.TIMEOUT, 60)
    c.setopt(c.USERPWD, self.user+":"+cred.getpasswd( self.encPasswd ) )
    # curl sso.sample.com, which I assume prompts a login dialog; curl answers it with the USERPWD credentials set above
    c.setopt(c.URL, 'https://sso.sample.com')
    # save the cookie to cookieDir
    c.setopt(c.COOKIEJAR, self.cookieDir)
    c.setopt(c.COOKIEFILE, self.cookieDir )
    c.setopt(c.WRITEFUNCTION, buffer.write)
    # perform the request using all the options set above
    c.perform()
    c.unsetopt(c.USERPWD)
    # curl new site sample.com
    c.setopt(c.URL, 'https://sample.com')
    c.perform()
    # save output as html var
    html = str(buffer.getvalue())    

----------------------------------------------------------
# The following three if statements all work the same way:
# if "some string" is found in the html variable, run the indented lines that follow
if "RelayState" in html:
    # set up a regex that looks for input type="hidden" name="RelayState" value="..." and captures whatever is between the value quotes; that becomes the RELAY var
    rex = re.compile( "input type=\"hidden\" name=\"RelayState\" value=\"(.*)\"" )
    # run the regex on the html var; .groups() returns a tuple of the captured (...) groups and [0] takes the first one
    RELAY = rex.search( html ).groups()[0]
if "SAMLResponse" in html:
    rex = re.compile( "input type=\"hidden\" name=\"SAMLResponse\" value=\"(.*)\"" )
    # the same thing happens here, capturing the value as SAML
    SAML =  rex.search( html ).groups()[0]
    # construct a new dict from fixed strings and the newly captured vars
    datastuff = {'SAMLResponse':SAML,'RelayState':RELAY,'redirect':'Redirect','show_button':'true'}
if "form method=\"POST\" action=" in html:
    rex = re.compile( "form method=\"POST\" action=\"(.*)\" " )
    # again, captures the URL inside action="..." as postUrl
    postUrl = rex.search( html ).groups()[0]
---------------------------------------------------------- 

#post our saml obtained, get to our final dest
    c.setopt(c.URL, postUrl ) # setup curl with url found above
    c.setopt(c.POST, True) # use post method
    c.setopt(c.POSTFIELDS, urlencode( datastuff )) # post the fields built above from the newly constructed vars
    c.perform()
    c.close()
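
To make the .groups() part concrete, here is a minimal sketch. The HTML strings are made up for illustration and are not what sample.com actually returns, and the stricter pattern at the end is my own suggestion rather than something from the original script.

```python
import re

# Hypothetical HTML fragment, invented for this example.
html = '<input type="hidden" name="RelayState" value="abc123"/>'

# Same pattern as in the script, written as a raw string.
rex = re.compile(r'input type="hidden" name="RelayState" value="(.*)"')
match = rex.search(html)       # a match object, or None when nothing matches

print(match.groups())          # ('abc123',)  tuple with one item per (...) group
print(match.groups()[0])       # abc123       first captured group
print(match.group(1))          # abc123       same value, asked for by number

# (.*) is greedy: it runs to the LAST double quote on the line, so an extra
# attribute after value="..." ends up inside the capture.
changed = '<input type="hidden" name="RelayState" value="abc123" id="x"/>'
greedy = rex.search(changed).groups()[0]
print(greedy)                  # abc123" id="x

# Assumption: stopping at the first closing quote is what is wanted.
strict = re.compile(r'input type="hidden" name="RelayState" value="([^"]*)"')
print(strict.search(changed).groups()[0])   # abc123
```

If the site added or reordered attributes on those hidden inputs, the greedy capture is one plausible way the posted values could have gone wrong without the script raising an error.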

If something changed and you are now getting an error, I would print html right after the line html = str(buffer.getvalue()) to see whether you're still hitting the page that the regexes expect to match against.
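
Besides printing the page, it can help to check which of the three patterns stopped matching. In the script, rex.search( html ) returns None when a pattern does not match, so .groups() raises AttributeError, and a skipped if block leaves RELAY, datastuff or postUrl undefined. The sketch below is only an illustration: extract_fields and the sample page are hypothetical names and data, and only the three regexes come from the script.

```python
import re

# The three patterns from the script, written as raw strings.
PATTERNS = {
    "RelayState": r'input type="hidden" name="RelayState" value="(.*)"',
    "SAMLResponse": r'input type="hidden" name="SAMLResponse" value="(.*)"',
    "postUrl": r'form method="POST" action="(.*)" ',
}

def extract_fields(html):
    """Return {name: captured value, or None if the pattern did not match}."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, html)
        found[name] = match.group(1) if match else None
    return found

# Hypothetical page in which the form tag changed to lowercase "post".
sample = (
    '<form method="post" action="https://sample.com/acs" >\n'
    '<input type="hidden" name="RelayState" value="abc"/>\n'
    '<input type="hidden" name="SAMLResponse" value="xyz"/>\n'
    '</form>'
)

fields = extract_fields(sample)
missing = [name for name, value in fields.items() if value is None]
print(missing)   # ['postUrl']
```

Running something like this against the real html value would show which piece of the page markup changed, which is usually quicker than reading the whole dump by eye.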
