How to detect which ecommerce software is being used

I am making a web crawler in C# that needs to find webshops. The problem I'm having is that I need to detect whether a webpage is a webshop. If it is, I need to find out what type of e-commerce software it is using. But I don't know how to detect that from the source code.

I also have a Chrome plugin called BuiltWith, which can detect all kinds of software, but I have yet to find out how it does that.

It would be nice if someone could help me with this problem.

Before giving you an actual answer, it's worth noting that what you're proposing could violate the terms of use of many websites out there. You should take the time to investigate what legal liability you might be exposing yourself and your organization to.

This is going to be a lot of time-consuming work, but it's not difficult. Your crawler just needs to use a rules-based approach to detect signatures in the payload of the page.

  1. Find the specific software that you're intending to detect.
  2. Find 2-3 sites that are definitely using the software.
  3. Review the HTML payload to see what scripts, CSS, and HTML patterns they have in common across the sites.
  4. Build a code-based rule that can detect those patterns consistently. For example: if (html.Contains("widgetName")) isPlatformName = true;
  5. Test those patterns across more sites that you know for certain are using that software.
  6. Repeat for each software vendor.
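
As a rough sketch of how the rules from steps 3 and 4 might be kept as data rather than as a chain of if statements, something like the following could work. The class names (PlatformRule, PlatformDetector) are hypothetical, and the marker strings are illustrative guesses, not verified signatures; you would replace them with patterns you confirmed yourself in steps 2 and 5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One rule per platform: the platform is reported when at least
// MinMatches of its markers occur somewhere in the page HTML.
public sealed class PlatformRule
{
    public PlatformRule(string platform, int minMatches, params string[] markers)
    {
        Platform = platform;
        MinMatches = minMatches;
        Markers = markers;
    }

    public string Platform { get; }
    public int MinMatches { get; }
    public IReadOnlyList<string> Markers { get; }
}

public static class PlatformDetector
{
    // ASSUMPTION: these markers are examples only. Confirm each one
    // against sites you know are running the platform before relying on it.
    public static readonly IReadOnlyList<PlatformRule> Rules = new List<PlatformRule>
    {
        new PlatformRule("Shopify", 1, "cdn.shopify.com", "Shopify.theme"),
        new PlatformRule("WooCommerce", 1, "/wp-content/plugins/woocommerce/", "woocommerce-page"),
        new PlatformRule("Magento", 1, "Mage.Cookies", "/static/frontend/"),
    };

    // Returns the name of every platform whose rule is satisfied by the HTML.
    public static List<string> Detect(string html)
    {
        return Rules
            .Where(rule => rule.Markers.Count(
                marker => html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0) >= rule.MinMatches)
            .Select(rule => rule.Platform)
            .ToList();
    }
}
```

Calling PlatformDetector.Detect(html) on a downloaded page returns a list of platform names, which is empty when no rule matches. Raising MinMatches for a rule is one way to cut false positives when a single marker turns out to be too common.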

Things get more complicated when the targets have multiple versions and you need to adapt your rules to account for each of them, or when platforms are very similar.
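
One possible way to pick up version information, where a site exposes it, is a generator meta tag. The sketch below assumes the page contains a tag of the form <meta name="generator" content="Name 1.2.3"> with the name attribute before content; many sites omit or strip this tag, so treat it as one hint among several. The GeneratorTag class is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GeneratorTag
{
    // ASSUMPTION: attribute order is name, then content. Real pages may
    // order attributes differently or leave the tag out entirely.
    private static readonly Regex MetaPattern = new Regex(
        "<meta\\s+name=[\"']generator[\"']\\s+content=[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Splits "Name 1.2.3" into the name and a trailing dotted version number.
    private static readonly Regex VersionPattern = new Regex(@"^(.*?)\s+v?(\d+(?:\.\d+)*)$");

    // Returns null when no generator tag is found; Version is empty
    // when the tag carries no trailing version number.
    public static (string Name, string Version)? Parse(string html)
    {
        Match meta = MetaPattern.Match(html);
        if (!meta.Success)
        {
            return null;
        }

        string content = meta.Groups[1].Value.Trim();
        Match version = VersionPattern.Match(content);
        if (version.Success)
        {
            return (version.Groups[1].Value, version.Groups[2].Value);
        }

        return (content, "");
    }
}
```

The parsed version can then select between version-specific rule sets for the same platform.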

I think the most complicated part of this is having a well-thought-out architecture in place for bot issue detection, reporting, and throttling. You should probably spend the bulk of your time planning that.
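
For the throttling part, a minimal sketch could be a per-host scheduler that spaces out requests to the same site. The HostThrottle class and its fixed minimum delay are assumptions for illustration; a real crawler would probably also honor robots.txt and back off when a site starts returning errors.

```csharp
using System;
using System.Collections.Generic;

// Hands out request slots so that two requests to the same host are
// at least minDelay apart. Hosts are compared case-insensitively.
public sealed class HostThrottle
{
    private readonly TimeSpan _minDelay;
    private readonly Dictionary<string, DateTime> _nextAllowed =
        new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new object();

    public HostThrottle(TimeSpan minDelay)
    {
        _minDelay = minDelay;
    }

    // Reserves the next free slot for the host and returns how long
    // the caller should wait before sending its request.
    public TimeSpan Reserve(string host, DateTime nowUtc)
    {
        lock (_gate)
        {
            DateTime slot = nowUtc;
            if (_nextAllowed.TryGetValue(host, out DateTime next) && next > nowUtc)
            {
                slot = next;
            }

            _nextAllowed[host] = slot + _minDelay;
            return slot - nowUtc;
        }
    }
}
```

A crawler worker would call something like await Task.Delay(throttle.Reserve(uri.Host, DateTime.UtcNow)) before each request. Passing the current time in as a parameter keeps the class easy to test without real waiting.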

That's it.

There are a couple of different ways to determine the technologies a site is using. Firstly, if you are technically savvy, you can right-click on an eCommerce page (catalog, checkout page, etc.) and look at the source code. Many platforms leave hints in the source code that will give you an idea of what the site is running.

You can also look at the DNS/hosting information, which would help you determine whether the eCommerce solution is self-hosted or SaaS (like Shopify, for example).

You can also try an InterNIC lookup: enter the domain name, and the results will include the nameservers, which could point you in the right direction.

Finally, if all that sleuthing seems too difficult, there's an easier way! Try BuiltWith. It's generally pretty reliable, as long as the system you're looking up isn't custom/proprietary. Enter a domain into BuiltWith and it will show you the platform, widgets used, analytics and tracking codes, CDNs, CMS, payment processors, and more.
