
How to categorize websites?

I have a list of URLs scraped from Google search results. I want to sort the websites into categories such as company/business, blog, news, and sports.

The searching and scraping is done with a Python script.

I can't work out how to categorize the URLs. Can anyone help me with this?

Writing your own program to categorize websites will not be easy. You may need to develop an AI-based system that visits every site, scrapes the necessary data, and determines the type of site from the data and keywords it collected. This is just my idea; there may be a better approach.
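To give a rough idea of the keyword-based approach, here is a minimal sketch. It is not an AI system, just a scoring heuristic that counts category keywords in the URL and in the page text. The keyword lists, the `categorize` function, and the weight of 3 for URL matches are all assumptions made up for illustration; `page_text` is assumed to be whatever text your existing scraper already extracts.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword lists, chosen only for illustration; tune them for real data.
CATEGORY_KEYWORDS = {
    "news": {"news", "breaking", "headline", "headlines", "reporter", "editorial"},
    "sports": {"sports", "football", "cricket", "league", "match", "tournament"},
    "blog": {"blog", "posted", "comments", "author", "archive"},
    "company/business": {"company", "products", "services", "careers", "clients", "pricing"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(url, page_text=""):
    """Return the best-scoring category for a site, or "unknown" if nothing matches."""
    parsed = urlparse(url)
    url_tokens = tokenize(parsed.netloc + " " + parsed.path)
    text_tokens = tokenize(page_text)

    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        url_hits = sum(1 for token in url_tokens if token in keywords)
        text_hits = sum(1 for token in text_tokens if token in keywords)
        # Assumption: a keyword in the URL is a stronger signal than one in the body.
        scores[category] = 3 * url_hits + text_hits

    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(categorize("https://example.com/blog/my-post",
                     "Posted by the author, leave comments"))
```

A heuristic like this will misclassify many sites. If you can label a few hundred pages by hand, training a text classifier on them is likely to work better than hand-picked keywords.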

Instead, you could use a third-party service. There are many paid and free providers of website category data. For categorizing websites, check out these resources: SimilarWeb, Webshrinker, Symantec, Cyren. Hope these help.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 