Is there a way to programmatically run a javascript function from the source of a website

On basketball-reference.com, there is an injury page that shows all of the current injuries in the NBA. I'd like to begin archiving this data to keep a daily record of who's injured in the NBA. Apart from simply being a basketball stat nut, I will use this as an input to a Bayesian model that predicts a player's playing time from his teammates' injuries.

Now, I could simply go to this page once a day, click the "Get Table as a CSV" button, and copy and paste the result into a file, but this seems like a job for cron.

I could grab the raw HTML and parse it, but the web page already has a get_csv_output(e) function readily available in its sr-min.js file. In fact, if I open up the developer console and type in

get_csv_output("injuries")

I get all of the CSV dumped out as a string. It feels an awful lot like reinventing the wheel when I could simply use this function.

Somehow there is a disconnect in my mind though. I don't grok how I can visit a page, run a JS function, and save the output without spinning up a full Chrome driver instance through Selenium or something. This feels like a simple problem with a simple solution that I just don't know.

I don't particularly care what language the solution is in, although I'd prefer Python, bash, or some other lightweight solution.

Please let me know if I'm being naive.

Edit: The page is https://www.basketball-reference.com/friv/injuries.cgi

Edit 2: The accepted answer is an excellent solution for future reference.

I ended up doing

curl https://www.basketball-reference.com/friv/injuries.cgi | python3 convert_injury_html_to_csv.py > "$(date +'%Y%m%d')".tsv

where the Python script is:

import sys
from bs4 import BeautifulSoup


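def parse_injury_html(html_doc):
    """Yield (name, team, update, description) for each row of the injuries table."""
    soup = BeautifulSoup(html_doc, "html.parser")
    injuries_table = soup.find(id="injuries")
    for row in injuries_table.tbody.find_all("tr"):
        # BeautifulSoup returns the "class" attribute as a list, so test
        # membership (not equality) to skip header rows inside the body.
        if "thead" in row.get("class", []):
            continue
        name = row.th
        team, update, description = row.find_all("td")
        yield (name.string, team.string, update.string, description.string)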


def main():
    for (name, team, update, description) in parse_injury_html(sys.stdin.read()):
        print(f"{name}\t{team}\t{update}\t{description}")


if __name__ == '__main__':
    main()

You could more directly just run the code in that JS function. Node.js is a standalone JS runtime, so you may be able to use it to run the exact same function.

That function is most likely just making HTTP requests to download the data from a server, perhaps with some mild data manipulation. The networking layers in Node and in browser JS are not the same, but there are polyfills available. If the JS function uses the fetch API, you can use node-fetch, or if it uses XHR-style requests, xmlhttprequest.

Since the code is probably a simple data fetch, it might be simple enough to reverse-engineer what's going on and write your own script, in whatever language you prefer, to make the same type of HTTP request. Watching the network tab of your developer tools should tell you where it's getting its data.
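
If the network tab shows that the data arrives with a plain GET, the request could be reproduced with nothing but the Python standard library. The following is a minimal sketch under that assumption; the User-Agent value and the helper names (build_request, fetch_html, archive_name) are hypothetical and not taken from the site or the question.

```python
import urllib.request
from datetime import date

INJURIES_URL = "https://www.basketball-reference.com/friv/injuries.cgi"


def build_request(url=INJURIES_URL):
    """Build a GET request with an explicit User-Agent (placeholder value)."""
    return urllib.request.Request(url, headers={"User-Agent": "injury-archiver/0.1"})


def fetch_html(url=INJURIES_URL, timeout=30):
    """Download the page and decode it using the charset the server reports."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def archive_name(day):
    """Return a date-stamped file name such as 20240105.html."""
    return day.strftime("%Y%m%d") + ".html"
```

A daily cron entry could then call fetch_html() and write the result to archive_name(date.today()).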

Just executing this function won't do any good, because it must be executed in the context of that injuries page. If you look at its code, it effectively parses HTML data. A weird way of doing things, but I've seen worse. Never mind.
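
To illustrate what "parsing the HTML into CSV" amounts to, here is a rough Python sketch that turns a table with a given id into CSV text using only the standard library. It is not the site's get_csv_output implementation; the class and function names and the sample markup in the usage note are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the cell text of every row in the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("th", "td") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        # Text inside nested tags (for example a link) still belongs to the cell.
        if self.in_table and self.in_cell:
            self.rows[-1][-1] += data


def table_to_csv(html_doc, table_id):
    """Return the table's rows as CSV text, one line per non-empty row."""
    parser = TableToRows(table_id)
    parser.feed(html_doc)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(r for r in parser.rows if r)
    return out.getvalue()
```

For example, table_to_csv('<table id="t"><tr><td>a</td><td>b</td></tr></table>', "t") gives a single CSV line containing a and b. Nested tables are not handled in this sketch.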

The easiest solution is to use something that opens the page and calls the function, just like you do in devtools. Barmar suggested Selenium, but I personally prefer puppeteer. It runs on Node.js, opens Chrome in headless (windowless) mode, and can call any function that a page exposes. In our case, that is the get_csv_output function.

After that you may do whatever you want with the result string: dump it to a DB or save it to a file.

An example of puppeteer code.
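
The same approach can also be sketched in Python with Selenium, the alternative mentioned above. This is a hedged sketch, not tested against the live site: it assumes the page still defines get_csv_output and that it returns the CSV as a string, as described in the question. The helper names are hypothetical, and Selenium is a third-party package that needs a local Chrome install.

```python
INJURIES_URL = "https://www.basketball-reference.com/friv/injuries.cgi"


def fetch_injury_csv(driver, url=INJURIES_URL, table_id="injuries"):
    """Open the page and return whatever the page's get_csv_output() returns."""
    driver.get(url)
    # arguments[0] is how Selenium passes Python values into the script.
    return driver.execute_script("return get_csv_output(arguments[0]);", table_id)


def make_headless_chrome():
    """Start Chrome without a window (requires: pip install selenium)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def main():
    driver = make_headless_chrome()
    try:
        print(fetch_injury_csv(driver))
    finally:
        driver.quit()  # always shut the browser down, even on errors
```

Calling main() prints the CSV to stdout, so it can be redirected into a date-stamped file from cron in the same way as the curl pipeline in the question.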
