I want to crawl Google Images in Google Colab to train a TensorFlow model.
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing import image

doc = BeautifulSoup(requests.get("https://www.google.com/search?q=dog&tbm=isch").text, "html.parser")
all_imgs = [[image.load_img(tf.keras.utils.get_file("images", e.attrs["src"]), target_size=[90, 90]), e.attrs["src"][-9:]]
            for e in doc.select("img")[1:]]
for e in all_imgs:
    plt.figure()
    plt.imshow(e[0])
    plt.title(e[1])
    plt.show()
Explanation:
doc is the parsed HTML.
all_imgs is a list of the form [[img, end_of_img_link], [img, end_of_img_link], ...].
The problem is that the output is the same image over and over again.
Even if I change the URL to crawl images of cats, e.g. search?q=cat,
it still shows the same image of a dog!
What is the problem?
EDIT: I figured out that the list already consists of many copies of the same image, so the problem lies on the BeautifulSoup/download side, not with matplotlib.
I found the solution:
All the pictures were saved under the same name, "images".
Because of that, the first downloaded picture was never overwritten: tf.keras.utils.get_file() reuses an already cached file with that filename instead of downloading again. I had to run
!rm -rf /root/.keras/datasets/
in the notebook to delete the folder with the cached image.
Now I am going to give each saved picture a unique name.
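A minimal sketch of that fix: derive the cache filename from the image URL itself, so get_file() never sees two different images under the same name. The helper name unique_name is my own, not from the original code.

```python
import hashlib

def unique_name(src):
    """Derive a cache filename from the image URL, so two
    different images never collide on a shared name like "images"."""
    return hashlib.md5(src.encode("utf-8")).hexdigest() + ".jpg"

# Plugged into the original list comprehension (assumed usage):
#   path = tf.keras.utils.get_file(unique_name(e.attrs["src"]), e.attrs["src"])
#   img = image.load_img(path, target_size=[90, 90])

print(unique_name("https://example.com/a.jpg"))
print(unique_name("https://example.com/b.jpg"))
```

Because the name is a hash of the URL, repeated runs still benefit from caching (the same URL maps to the same file), but distinct URLs are downloaded separately.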