简体   繁体   中英

Dryscrape visit works only once in python

I want visit page in loop.

Code is:

import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5';
loop = 1
while loop < 100000: 

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1 

According to output, page is visited only once time, I don't understand why? In 2., 3., .... there is no output:

('loop:', 1)
<!DOCTYPE html><html><head>
  <meta charset="utf-8">
  <title>Javascript scraping test</title>
</head>
<body>
  <p id="intro-text">Yay! Supports javascript</p>
  <script>
     document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
  </script> 

</body></html>
('loop:', 2)

('loop:', 3)

('loop:', 4)

('loop:', 5)

('loop:', 6)

('loop:', 7)

Can you help me? Thank you.

Same problem with me i solve this with def try this

def fb(user,pwd)
 import dryscrape as d
 d.start_xvfb()
 Br = d.Session()
 #every time it creat a new session
 Br.visit('http://fb.com')
 Br.at_xpath('//*[@name = "email"]').set(user)
 Br.at_xpath('//*[@name = "pass"]').set(pwd)
 Br.at_xpath('//*[@name = "login"]').click()
 #......Now Do Something you want.....#

Then after making def now use this

 fb('my@account.com','password')

Then automatic login yourself user this command 100 time without error

Please read and answers my Question Same name links cant click python dryscrape

After updating dryscrape and its dependencies to the latest version, it works fine now.

The versions are: dryscrape-1.0, lxml-4.1.1, webkit-server-1.0, xvfbwrapper-0.2.9

The code:

import dryscrape
dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html';
loop = 1
while loop < 100000:

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1

Output:

   'loop:' 1
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>
   'loop:' 2
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>
   'loop:' 3
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>

If you cant update the modules, or dont want to, a quick fix will be visiting another page at the end of the loop.

import dryscrape
dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html';
otherurl = "http://192.168.1.5/test"
loop = 1
while loop < 100000:

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
    sess.visit(otherurl) #Visits the other url, so that when sess.visit(url) is called, it is forced to visit the page again.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM