
How can I parse a JavaScript array hosted on GitHub into a Python list?

I have a Discord bot written in Python, and I want to add a feature that makes it immediately delete any phishing links it finds.

I looked for a list of known phishing domains and found this on GitHub.

However, the issue is that this is a JS file with one big array, and my bot is 100% Python.

I could just make a copy of this list, but then I would lose the advantage of it being constantly updated, so I would like to read the domains directly from GitHub, if possible.

I am not sure how to fetch this and parse it into a Python list.

Looking around on Stack Overflow, people suggest parsing the data as JSON or using regex, but unfortunately I haven't understood it all yet.

Guidance would help, or maybe you have a better way of doing this altogether! Thank you.

Here is one approach (prone to failure and definitely not the recommended way to do this):

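import requests

RAW_DATA_LINK = "https://raw.githubusercontent.com/nikolaischunk/discord-phishing-links/main/domain-list.js"


def get_data():
    response = requests.get(RAW_DATA_LINK, timeout=10)
    response.raise_for_status()  # fail on 404 etc. instead of parsing an error page
    data = response.content.decode()
    # Strip the JS assignment so only the array literal remains
    data = data.replace("const suspiciousDomains = ", "").replace(";", "")  # or just data[26:-2]
    # WARNING: eval() executes whatever the downloaded file contains
    return eval(data)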

get_data() will give you a list of all the links in that file. You could also reuse a requests.Session when making the request.
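The data[26:-2] shortcut in the comment works because the prefix "const suspiciousDomains = " is exactly 26 characters long, but it silently breaks if the variable name or the trailing whitespace changes, and .replace(";", "") would also delete any semicolon inside the data. A slightly sturdier sketch, assuming the file is a single assignment of the shape const <name> = <literal>;, cuts at the first "=" and one trailing ";" instead of hard-coding offsets. The helper name strip_js_assignment is hypothetical:

```python
def strip_js_assignment(js_text: str) -> str:
    """Return the literal on the right-hand side of `const name = ...;`.

    Assumes the file holds a single assignment (hypothetical helper,
    not a general JavaScript parser).
    """
    _, sep, rhs = js_text.partition("=")  # split at the first '='
    if not sep:
        raise ValueError("no '=' found in the JavaScript source")
    rhs = rhs.strip()
    # Drop one trailing semicolon, leaving any inside the data untouched.
    if rhs.endswith(";"):
        rhs = rhs[:-1].rstrip()
    return rhs


PREFIX = "const suspiciousDomains = "
print(len(PREFIX))  # 26, which is where data[26:-2] comes from

sample = 'const suspiciousDomains = [\n  "a.example",\n  "b.example"\n];\n'
print(strip_js_assignment(sample))
```

The returned text can then be handed to a data-only parser such as json.loads rather than to eval.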

Again, if you are in control of that file, just store it as JSON; if you are not, you'd probably be better off with regular expressions.
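A minimal regex sketch of that suggestion, assuming the domains are plain double-quoted strings inside the first [...] array in the file, with no escaped quotes, no "]" inside a string, and no quoted text in comments. Searching only inside the array keeps an object key such as "domains" out of the result. The function name extract_domains is hypothetical:

```python
import re

# The first [...] block in the file; DOTALL lets '.' span newlines.
ARRAY_RE = re.compile(r"\[(.*?)\]", re.DOTALL)
# A double-quoted string without escapes; the group captures its contents.
STRING_RE = re.compile(r'"([^"\\]*)"')


def extract_domains(js_text: str) -> list[str]:
    """Pull the quoted strings out of the first array literal in js_text."""
    match = ARRAY_RE.search(js_text)
    if match is None:
        return []
    return STRING_RE.findall(match.group(1))


js_text = '''const suspiciousDomains = {
  "domains": [
    "tinyurl.com/yyw8sy9b",
    "token-bit.com"
  ]
};'''
print(extract_domains(js_text))  # ['tinyurl.com/yyw8sy9b', 'token-bit.com']
```

Unlike eval, this never runs anything from the downloaded file; the trade-off is that it only understands the narrow shape described above.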

Disclaimer: I was not able to see the original JS file, so there might be some inaccuracies. This answer was written to provide an alternative to using eval(), as it is a huge security risk. Read "Eval really is dangerous".

I assume the JavaScript file is something like this:

const suspiciousDomains = {
  "domains": [
    "tinyurl.com/yyw8sy9b",
    "tinyurl.com/yyyz9xdg",
    "token-bit.com"
  ]
};
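
import requests
import json

RAW_DATA_LINK = "https://raw.githubusercontent.com/nikolaischunk/discord-phishing-links/main/domain-list.js"  # the now dead link


def get_data():
    # credit to @Sujal Singh
    response = requests.get(RAW_DATA_LINK, timeout=10)
    response.raise_for_status()
    data = response.content.decode().replace("const suspiciousDomains = ", "").replace(";", "")  # or just data[26:-2]
    # use json.loads() instead of eval(); with the object layout above this
    # returns a dict, so the list itself is json.loads(data)["domains"]
    return json.loads(data)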

json.loads() does not evaluate the string as code; it only parses it as JSON data.
To see what json.loads() does, you can read the documentation for Python's json module.
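A short sketch of the difference, assuming the same const suspiciousDomains = ...; layout as in the sample file above: json.loads accepts only JSON data, so a string that eval would run as code is rejected with json.JSONDecodeError (a subclass of ValueError). The helper parse_domain_list is hypothetical; it accepts either a bare array or the {"domains": [...]} object assumed in this answer. JSON is stricter than JavaScript, so a trailing comma or a // comment in the file would also make json.loads fail.

```python
import json

PREFIX = "const suspiciousDomains = "


def parse_domain_list(js_text: str) -> list:
    """Parse the array (or {"domains": [...]}) assigned in js_text as JSON."""
    body = js_text.strip()
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    body = body.rstrip().rstrip(";")
    data = json.loads(body)  # parses data only; never executes code
    if isinstance(data, dict):
        data = data["domains"]
    return data


sample = 'const suspiciousDomains = {"domains": ["tinyurl.com/yyw8sy9b", "token-bit.com"]};'
print(parse_domain_list(sample))

# Something eval() would run as code is just invalid JSON to json.loads().
try:
    json.loads('__import__("os").getcwd()')
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```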

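Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.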

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM