Python Selenium: accessing Chrome DevTools with execute_cdp_cmd | Determine which styleSheetId belongs to which stylesheet

I am using Selenium with Python to automatically remove unused CSS code from my website.

I found what looks like a good solution here:

https://chromedevtools.github.io/devtools-protocol/tot/CSS/

I have tried several ways of generating the ranges JSON, and the following code looks promising:

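import json
from time import sleep

# browser is an already created Selenium Chrome WebDriver with the page loaded
browser.execute_cdp_cmd("CSS.enable", {})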
browser.execute_cdp_cmd("CSS.startRuleUsageTracking", {})

sleep(1)
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
sleep(1)

#snapshot = browser.execute_cdp_cmd("Profiler.takePreciseCoverage", {})
#snapshot = browser.execute_cdp_cmd("Profiler.stop", {})
#browser.execute_cdp_cmd("CSS.stopRuleUsageTracking", {})
snapshot = browser.execute_cdp_cmd("CSS.takeCoverageDelta", {}) # seems to work best as it is dynamic
print(snapshot)

with open('coverage_delta_css.json', 'w', encoding='utf-8') as _file:
    json.dump(snapshot, _file)

print(snapshot['coverage'][0]['styleSheetId']) # a test id
print(browser.execute_cdp_cmd("CSS.getStyleSheetText ", {"styleSheetId": snapshot['coverage'][0]['styleSheetId']})) # here it tells me the Id is unknown

Here is an example part of the generated JSON:

{
  "coverage": [
    {
      "endOffset": 335,
      "startOffset": 133,
      "styleSheetId": "16752.5",
      "used": true
    },
    {
      "endOffset": 1025,
      "startOffset": 471,
      "styleSheetId": "16752.7",
      "used": true
    },
    ...
  ]
}
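Each entry in the coverage list describes one rule range rather than one stylesheet, so the same styleSheetId can appear many times. A minimal sketch of grouping the entries per stylesheet is shown here; the helper name `group_coverage_by_sheet` and the third sample entry are made up for illustration and are not part of the original snapshot.

```python
from collections import defaultdict


def group_coverage_by_sheet(coverage):
    """Group rule-usage entries by styleSheetId, sorted by start offset."""
    grouped = defaultdict(list)
    for entry in coverage:
        grouped[entry["styleSheetId"]].append(
            (entry["startOffset"], entry["endOffset"], entry["used"])
        )
    return {sheet_id: sorted(ranges) for sheet_id, ranges in grouped.items()}


# Sample data shaped like the snapshot above; the third entry is hypothetical
sample = {
    "coverage": [
        {"endOffset": 335, "startOffset": 133, "styleSheetId": "16752.5", "used": True},
        {"endOffset": 1025, "startOffset": 471, "styleSheetId": "16752.7", "used": True},
        {"endOffset": 450, "startOffset": 400, "styleSheetId": "16752.5", "used": True},
    ]
}

by_sheet = group_coverage_by_sheet(sample["coverage"])
for sheet_id, ranges in by_sheet.items():
    print(sheet_id, ranges)
```

With the entries grouped this way, only one lookup per distinct styleSheetId is needed when working out which file each group belongs to.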

The problem is that styleSheetId is just an opaque ID, and I can't find a way to determine which stylesheet it refers to. I have two stylesheets (main.css and other.css), and I want to remove unused CSS only from other.css.

Also, in the example above I try to get the raw text of the stylesheet using the ID from the JSON, but it seems that the ID changes with every call and is unknown.

selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32601,"message":"'CSS.getStyleSheetText ' wasn't found"}

I feel like I am close to the solution. I hope someone can help with the last steps.

You have an extra space in "CSS.getStyleSheetText ", which is why the error says the command wasn't found rather than that the ID is unknown. With the space removed, the result will be a dictionary with key "text":

from selenium import webdriver


options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.google.com')

    driver.execute_cdp_cmd("CSS.enable", {})
    driver.execute_cdp_cmd("CSS.startRuleUsageTracking", {})
    snapshot = driver.execute_cdp_cmd("CSS.takeCoverageDelta", {})
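    coverage = snapshot['coverage']
    # Each coverage entry describes one rule range, so several entries
    # can share the same styleSheetId
    n_entries = len(coverage)
    print(n_entries)
    # Print the text of the stylesheet referenced by the last entry:
    sheet_id = coverage[-1]['styleSheetId']
    print(driver.execute_cdp_cmd("CSS.getStyleSheetText", {"styleSheetId": sheet_id})['text'])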
finally:
    driver.quit()

Prints:

108
.UUbT9{position:absolute;width:100%;text-align:left;margin-top:-1px;z-index:3;cursor:default;-webkit-user-select:none}.aajZCb{background:#fff;box-shadow:0 4px 6px rgba(32,33,36,.28);display:flex;flex-direction:column;list-style-type:none;margin:0;padding:0;border:0;border-radius:0 0 24px 24px;padding-bottom:4px;overflow:hidden}.minidiv .aajZCb{border-bottom-left-radius:16px;border-bottom-right-radius:16px}.erkvQe{flex:auto;padding-bottom:8px}.RjPuVb{height:1px;margin:0 26px 0 0}.S3nFnd{display:flex}.S3nFnd .RjPuVb,.S3nFnd .aajZCb{flex:0 0 auto}.lh87ke:link,.lh87ke:visited{color:#36c;cursor:pointer;font:11px arial,sans-serif;padding:0 5px;margin-top:-10px;text-decoration:none;flex:auto;align-self:flex-end;margin:0 16px 5px 0}.lh87ke:hover{text-decoration:underline}.xtSCL{border-top:1px solid #e8eaed;margin:0 20px 0 14px;padding-bottom:4px}.sb7{background:url() no-repeat ;min-height:0px;min-width:0px;height:0px;width:0px}.sb27{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 -21px;background-size:20px;min-height:20px;min-width:20px;height:20px;width:20px}.sb43{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 0;background-size:20px;min-height:20px;min-width:20px;height:20px;width:20px}.sb53.sb53{padding:0 4px;margin:0}.sb33{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 -42px;background-size:20px;height:20px;width:20px}

But this just gives you the text. It's not clear how you would trace this back to a CSS file. I also ran this against one website that returned a snapshot with a list of 9 styleSheetId values that were all the same (the HTML specified a single CSS stylesheet).
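One possible way to trace the text back to a file is to compare it with the contents of the CSS files you already know about. The sketch below assumes that the text returned by CSS.getStyleSheetText equals the file content apart from whitespace, which may not hold for every page; the helper names and the sample contents for main.css and other.css are hypothetical.

```python
import re


def normalize_css(text):
    """Collapse whitespace so trivial formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip()


def find_matching_file(sheet_text, candidates):
    """Return the name of the candidate whose content equals sheet_text, or None.

    candidates maps a file name (or URL) to that file's CSS source.
    """
    target = normalize_css(sheet_text)
    for name, source in candidates.items():
        if normalize_css(source) == target:
            return name
    return None


# Hypothetical contents standing in for main.css and other.css
candidates = {
    "main.css": "body { margin: 0; }\n",
    "other.css": ".box {\n  color: red;\n}\n",
}

# In practice sheet_text would be the 'text' value returned by
# CSS.getStyleSheetText for one styleSheetId
sheet_text = ".box { color: red; }"
print(find_matching_file(sheet_text, candidates))
```

Comparing whole texts is deliberately strict: a None result means the sheet is something else (an inline style block, for instance) and should be left alone.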

Why not just parse the HTML source looking for external stylesheet links, as follows:

from selenium import webdriver
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.yahoo.com')
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Collect the href of every <link rel="stylesheet"> that actually has one
    css_files = [link["href"] for link in soup.find_all("link") if "stylesheet" in link.get("rel", []) and link.get("href")]
    print(css_files)
finally:
    driver.quit()

Prints:

['https://s.yimg.com/nn/lib/metro/g/myy/grid_0.0.39.css', 'https://s.yimg.com/nn/lib/metro/g/myy/video_styles_0.0.72.css', 'https://s.yimg.com/nn/lib/metro/g/myy/font_yahoosans_0.0.45.css', 'https://s.yimg.com/nn/lib/metro/g/myy/wafertooltip_0.0.15.css', 'https://s.yimg.com/nn/lib/metro/g/sda/sda_flex_0.0.43.css', 'https://s.yimg.com/nn/lib/metro/g/sda/sda_adlite_0.0.7.css', 'https://s.yimg.com/os/yc/css/bundle.c60a6d54.css', 'https://s.yimg.com/aaq/fp/css/tdv2-applet-native-ads.PencilAd.atomic.ltr.4486c5cd56279289e1537fa63007fc45.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-featurebar.FeaturebarNew.atomic.ltr.43aa16e888a4e6e22b1273bcd144ec13.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-ntk.NTKGrid.atomic.ltr.7ae95e008cea5ca8c068d5e54332ac45.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-ntk.custom_grid.desktop.0e2848ba5290686273ddd6bdd2b6de63.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.StreamGrid.atomic.ltr.2bdffca67e538fcc3d9e3d2b82e9fafa.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.custom.desktop.35b4e59342f8c72801c502afb5933cff.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.custom_grid.desktop.4ac7e62f7d11f0c628c4aa3fae7a8123.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-intent.rollupDesktop.atomic.ltr.e7f97823ea12a8bcef9fee986f8e851c.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-hpsetpromo.HpSetPromo.atomic.ltr.ceb4bec833ee8522db3f8a70f17355fd.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-trending.Trending.atomic.ltr.720d5fde89dba7a904a549124d90eaf9.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-weather.WeatherPreview.atomic.ltr.39ec8a7197b2e854eee5eb76559ec7a7.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-weather.common.desktop.62d099be776ca538092fa6ba87d1637b.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-scores.Scores.atomic.ltr.8c0e78d3aa079ff5130e0c619459ceb7.min.css', 
'https://s.yimg.com/aaq/fp/css/tdv2-wafer-horoscope.HoroscopeGrid.atomic.ltr.3c743dd98289534ad1c07c777eb26bfb.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.SubscriptionGemini.atomic.ltr.7195e577ca1efda06c4b6857ded4b121.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-footer.FooterDesktop.atomic.ltr.47a5bd70d90a008f7b6a867d2fee9ab2.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-hpsetpromo.HpSetBannerPromo.atomic.ltr.030d2a4c4521d5f72e1051e79290b8ea.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-header.HeaderYBar.atomic.ltr.a67b5276a2eb6b9bff5bb0c370dd5c32.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-header.ybar.desktop.a5ef55315256ad2c3ff918a06f48f42e.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.StreamRelated.atomic.ltr.9cc9afaf9464d66e96bdf361af28f069.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-dialog.UserDialogLite.atomic.ltr.26606a64b43c7b47d521ea69b3ba11d5.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.SubscriptionReminder.atomic.ltr.46374553adf3056a1dac33e7fd69d273.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.custom.desktop.58c3fd7871df14d8f7f937fe038bcf17.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-intent.ContentPreference.atomic.ltr.eff5a3fd68eba42b5cbab57992febcaa.min.css', 'https://s.yimg.com/aaq/scp/css/viewer.bbd65011fa714bc6a4c74ebbfb906d06.css', 'https://s.yimg.com/aaq/c/e43d43c.caas-hpgrid.min.css', 'https://assets.video.yahoo.net/builds/a064591d7b/vdms-video-player.css']

Or, if you are not using AJAX to dynamically modify the DOM and add stylesheets after the page is initially loaded, just use requests:

import requests
from bs4 import BeautifulSoup


r = requests.get('https://www.yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
# Collect the href of every <link rel="stylesheet"> that actually has one
css_files = [link["href"] for link in soup.find_all("link") if "stylesheet" in link.get("rel", []) and link.get("href")]
print(css_files)

The returned list of URLs can be processed further if, for example, you are only interested in relative URLs.
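As a rough illustration of such post-processing, the sketch below resolves each href against the page URL and keeps only the stylesheets whose file name is of interest (other.css in the question). The function name `select_stylesheets`, the example.com URLs, and the sample hrefs are assumptions made up for this example.

```python
from posixpath import basename
from urllib.parse import urljoin, urlparse


def select_stylesheets(page_url, hrefs, wanted_names):
    """Resolve hrefs against page_url and keep those whose file name is wanted."""
    selected = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        # Compare only the last path segment, ignoring any query string
        if basename(urlparse(absolute).path) in wanted_names:
            selected.append(absolute)
    return selected


# Hypothetical hrefs as they might come out of css_files
hrefs = ["/static/main.css", "css/other.css?v=3", "https://cdn.example.com/lib.css"]
print(select_stylesheets("https://example.com/shop/", hrefs, {"other.css"}))
```

This prints `['https://example.com/shop/css/other.css?v=3']`; the resulting absolute URLs can then be downloaded, for instance with requests, to get the file contents.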
