
How to extract a JavaScript function from a JavaScript file

I need to extract an entire JavaScript function from a script file. I know the name of the function, but I don't know what its contents may be. This function may be nested within any number of closures.

I need two output values:

  1. The entire definition of the named function that I'm finding in the input script.
  2. The full input script with the found named function removed.

So, assume I'm looking for the findMe function in this input script:

(function() {
  function something(x,y) {
    if (x == true) {
      console.log ("Something says X is true");
      // The regex should not find this:
      console.log ("function findMe(z) { var a; }");
    }
  }
  function findMe(z) {
    if (z == true) {
      console.log ("Something says Z is true");
    }
  }
  findMe(true);
  something(false,"hello");
})();

From this, I need the following two result values:

  1. The extracted findMe function:

    function findMe(z) {
      if (z == true) {
        console.log ("Something says Z is true");
      }
    }

  2. The input script with the findMe function removed:

    (function() {
      function something(x,y) {
        if (x == true) {
          console.log ("Something says X is true");
          // The regex should not find this:
          console.log ("function findMe(z) { var a; }");
        }
      }
      findMe(true);
      something(false,"hello");
    })();

The problems I'm dealing with:

  1. The body of the function to find could contain any valid JavaScript code. The code or regex that finds it must be able to ignore values in strings, handle multiple nested block levels, and so forth.

  2. If the function definition to find appears inside a string, it should be ignored.

Any advice on how to accomplish something like this?

Update:

It looks like regex is not the right way to do this. I'm open to pointers to parsers that could help me accomplish this. I'm looking at Jison, but would love to hear about anything else.

If the script is included in your page (something you weren't clear about) and the function is publicly accessible, then you can just get the source of the function with:

functionXX.toString();

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Function/toString
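A minimal sketch of the `toString()` approach, assuming `findMe` is reachable from where this code runs (here it is simply declared at top level; in the question it is hidden inside a closure, where this does not apply). Since ES2019, `Function.prototype.toString()` returns the exact source text of a user-defined function. Note that this only gives you output 1; removing the function from the script text is a separate problem.

```javascript
// Hypothetical example: findMe is declared at top level here, so it is reachable.
function findMe(z) {
  if (z == true) {
    console.log("Something says Z is true");
  }
}

// Since ES2019, toString() returns the function's exact source text.
const source = findMe.toString();
console.log(source);
```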

Other ideas:

1) Look at open-source code that does JS minification or JS pretty-printing (re-indentation). In both cases, that code has to "understand" the JS language in order to do its work in a fault-tolerant way. I doubt it's pure regex, as the language is a bit more complicated than that.

2) If you control the source at the server and want to modify a particular function in it, just insert some new JS that replaces that function at runtime with your own function. That way, you let the JS engine identify the function for you and you just replace it with your own version.
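A minimal sketch of runtime replacement, assuming the function is reachable as a property of some object (a hypothetical `app` here; for a top-level function in a classic browser script, that object would be `window`). Callers that look the function up through that binding at call time get the replacement; code inside a closure that holds its own local binding does not.

```javascript
// Hypothetical stand-in for the page's existing script: the function must be
// reachable (here as a property of a shared object) for this to work.
const app = {
  findMe(z) {
    return z == true ? "original" : "no";
  },
};

// Injected later: keep a reference to the original, then install a replacement.
const originalFindMe = app.findMe;
app.findMe = function (z) {
  return "patched:" + originalFindMe.call(this, z);
};

console.log(app.findMe(true)); // prints "patched:original"
```

Keeping `originalFindMe` around lets the replacement wrap the old behaviour instead of discarding it; drop that call if you want a full replacement.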

3) For regex, here's what I've done. It's not foolproof, but it has worked for me in some build tools I use:

I run multiple passes (using regex in Python):

  1. Remove all comments delimited with /* and */.
  2. Remove all quoted strings.
  3. Now all that's left is non-string, non-comment JavaScript, so you should be able to run a regex directly for your function declaration.
  4. If you need the function source with strings and comments back in, you'll have to reconstitute it from the original, now that you know the beginning and end of the function.

Here are the regexes I use (expressed in Python's verbose multi-line format):

import re

reStr = r"""
    (                               # capture the non-comment portion
        "(?:\\.|[^"\\])*"           # capture double quoted strings
        |
        '(?:\\.|[^'\\])*'           # capture single quoted strings
        |
        (?:[^/\n"']|/[^/*\n"'])+    # any code besides newlines or string literals
        |
        \n                          # newline
    )
    |
    (/\*  (?:[^*]|\*[^/])*   \*/)       # /* comment */
    |
    (?://(.*)$)                     # // single line comment
    $"""    

reMultiStart = r"""         # start of a multiline comment that doesn't terminate on this line
    (
        /\*                 # /* 
        (
            [^\*]           # any character that is not a *
            |               # or
            \*[^/]          # * followed by something that is not a /
        )*                  # any number of these
    )
    $"""

reMultiEnd = r"""           # end of a multiline comment that didn't start on this line
    (
        ^                   # start of the line
        (
            [^\*]           # any character that is not a *
            |               # or
            \*+[^/]         # * followed by something that is not a /
        )*                  # any number of these
        \*/                 # followed by a */
    )
"""

regExSingleKeep = re.compile("// /")                    # single-line comments that start with "// /" are ones we should keep
regExMain = re.compile(reStr, re.VERBOSE)
regExMultiStart = re.compile(reMultiStart, re.VERBOSE)
regExMultiEnd = re.compile(reMultiEnd, re.VERBOSE)
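The multi-pass approach above can be sketched end to end in JavaScript. This version makes one change to it: instead of deleting comments and strings, it overwrites them with spaces, so offsets in the masked text line up with the original and step 4 becomes a plain slice. It assumes `name` is made of letters, digits and underscores, and that the input has no regex or template literals (either can hide a quote or comment marker). `maskCommentsAndStrings` and `extractByMasking` are hypothetical names.

```javascript
// Passes 1 and 2: blank out comments and string literals. Every character except
// newlines becomes a space, so offsets in the masked text match the original.
function maskCommentsAndStrings(src) {
  const re = /\/\*[\s\S]*?\*\/|\/\/[^\n]*|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'/g;
  return src.replace(re, (m) => m.replace(/[^\n]/g, " "));
}

// Passes 3 and 4: find the declaration and its closing brace in the masked text,
// then cut the ORIGINAL text at those offsets (strings and comments intact).
function extractByMasking(src, name) {
  const masked = maskCommentsAndStrings(src);
  const decl = new RegExp("\\bfunction\\s+" + name + "\\s*\\(").exec(masked);
  if (!decl) return null;

  const open = masked.indexOf("{", decl.index);
  if (open < 0) return null;

  // Braces inside strings/comments are already blanked, so plain counting works.
  let depth = 0;
  for (let i = open; i < masked.length; i++) {
    if (masked[i] === "{") {
      depth++;
    } else if (masked[i] === "}" && --depth === 0) {
      return {
        extracted: src.slice(decl.index, i + 1),
        remaining: src.slice(0, decl.index) + src.slice(i + 1),
      };
    }
  }
  return null; // unbalanced braces
}
```

Usage: `extractByMasking(source, "findMe")` returns `{ extracted, remaining }`, or `null` if no balanced declaration is found.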

This all sounds messy to me. You might be better off explaining what problem you're really trying to solve, so folks can help find a more elegant solution to the real problem.

I built a solution in C# using plain old string methods (no regex), and it works for me with nested functions as well. The underlying principle is counting braces and checking for unbalanced closing braces. Caveat: this won't work when braces are part of a comment (the same goes for braces, or the text "function ", inside string literals), but you can easily enhance this solution by first stripping out comments from the code before parsing function boundaries.

I first added this extension method to extract all indices of matches in a string (Source: More efficient way to get all indexes of a character in a string):

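    using System;
    using System.Collections.Generic;

    // Extension methods have to live in a static, non-nested class.
    public static class StringExtensions
    {
        /// <summary>
        /// Returns the start index of every occurrence of value in str.
        /// Source: https://stackoverflow.com/questions/12765819/more-efficient-way-to-get-all-indexes-of-a-character-in-a-string
        /// </summary>
        public static List<int> AllIndexesOf(this string str, string value)
        {
            if (String.IsNullOrEmpty(value))
                throw new ArgumentException("the string to find may not be empty", nameof(value));
            List<int> indexes = new List<int>();
            for (int index = 0; ; index += value.Length)
            {
                // ordinal comparison: match the exact characters, ignoring culture rules
                index = str.IndexOf(value, index, StringComparison.Ordinal);
                if (index == -1)
                    return indexes;
                indexes.Add(index);
            }
        }
    }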

I defined this struct for easy referencing of function boundaries:

    public struct FuncLimits
    {
        public int StartIndex; // index of the function's opening brace
        public int EndIndex;   // index of its matching closing brace
    }

Here's the main function where I parse the boundaries:

    // Requires: using System.Collections.Generic; using System.Linq;
    public List<FuncLimits> Parse(string file)
    {
        List<FuncLimits> funcLimits = new List<FuncLimits>();

        List<int> allFuncIndices = file.AllIndexesOf("function ");
        List<int> allOpeningBraceIndices = file.AllIndexesOf("{");
        List<int> allClosingBraceIndices = file.AllIndexesOf("}");
        int lastIndex = file.Length - 1;

        for (int i = 0; i < allFuncIndices.Count; i++)
        {
            int thisIndex = allFuncIndices[i];
            bool functionBoundaryFound = false;

            for (int testFuncIndex = i; !functionBoundaryFound; testFuncIndex++)
            {
                //find the next function index or last position if this is the last function definition
                int nextIndex = (testFuncIndex < (allFuncIndices.Count - 1)) ? allFuncIndices[testFuncIndex + 1] : lastIndex;

                var q1 = from c in allOpeningBraceIndices where c > thisIndex && c <= nextIndex select c;
                var qTemp = q1.Skip(1); //skip the first element as it is the opening brace for this function
                var q2 = from c in allClosingBraceIndices where c > thisIndex && c <= nextIndex select c;

                //the function ends at the first unbalanced closing brace: the k-th closing
                //brace (counting from 1) that is preceded by fewer than k inner opening braces
                int closed = 0;
                foreach (int close in q2)
                {
                    closed++;
                    if (closed > qTemp.Count(open => open < close))
                    {
                        funcLimits.Add(new FuncLimits { StartIndex = q1.First(), EndIndex = close });
                        functionBoundaryFound = true;
                        break;
                    }
                }

                //reached the end of the file with unbalanced braces: give up on this function
                if (!functionBoundaryFound && nextIndex >= lastIndex)
                    break;
                //otherwise the next function is nested in this one, so widen the window past it
            }
        }
        return funcLimits;
    }

I am almost afraid that regex cannot do this job. I think it is the same as trying to parse XML or HTML with regex, a topic that has already caused various religious debates on this forum.

EDIT: Please correct me if this is NOT the same as trying to parse XML.
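To see why a plain regex has to guess where the body ends, here is a small illustration on a hypothetical input. JavaScript's regex engine, like Python's `re`, has no recursion or balancing groups, so it cannot count braces: a lazy body match stops at the first inner `}`, while a greedy one runs on to the last `}` in the text.

```javascript
// Two naive patterns for "the body of findMe".
const lazy = /function findMe\([^)]*\)\s*\{[\s\S]*?\}/;
const greedy = /function findMe\([^)]*\)\s*\{[\s\S]*\}/;

const src = "function findMe(z) { if (z) { log(1); } return z; }\nfunction other() { }";

console.log(src.match(lazy)[0]);   // stops too early, at the inner "}"
console.log(src.match(greedy)[0]); // runs too far, through other()
```

Neither pattern would skip a declaration written inside a string literal either.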

A regex can't do this. What you need is a tool that parses JavaScript in a compiler-accurate way, builds up a structure representing the shape of the JavaScript code, enables you to find the function you want and print it out, and enables you to remove the function definition from that structure and regenerate the remaining JavaScript text.

Our DMS Software Reengineering Toolkit can do this, using its JavaScript front end. DMS provides general parsing, abstract syntax tree building/navigating/manipulation, and prettyprinting of (valid) source text from a modified AST. The JavaScript front end provides DMS with a compiler-accurate definition of JavaScript. You can point DMS/JavaScript at a JavaScript file (or even various kinds of dynamic HTML with embedded script tags containing JavaScript) and have it produce the AST. A DMS pattern can then be used to find your function:

  pattern find_my_function(r:type,a: arguments, b:body): declaration
     " \r my_function_name(\a) { \b } ";

DMS can search the AST for a matching tree with the specified structure; because this is an AST match and not a string match, line breaks, whitespace, comments and other trivial differences won't fool it. [What you didn't say is what to do if you have more than one function with that name in different scopes: which one do you want?]

Having found the match, you can ask DMS to print just the matched code, which acts as your extraction step. You can also ask DMS to remove the function using a rewrite rule:

  rule remove_my_function(r:type,a: arguments, b:body): declaration->declaration
     " \r my_function_name(\a) { \b } " -> ";";

and then prettyprint the resulting AST. DMS will preserve all the comments properly.

I guess you would have to write a string tokenizer for this job.

function tokenizer(str, name){
  var stack = []; // stack of opening tokens
  var last = "";  // last opening token

  // token pairs: subblocks, strings, regex
  var matches = {
    "}":"{",
    "'":"'",
    '"':'"',
    "/":"/"
  };

  // start with the function declaration (name is assumed to be a plain identifier);
  // note: this search does not skip strings or comments, so it takes the first textual match
  var decl = str.match(new RegExp("function[ ]+" + name + "\\([^\\)]*\\)[^\\{]*\\{"));
  if (!decl) return null;
  var start = decl.index;

  // move everything before the declaration to result
  var result = str.slice(0, start);
  // everything after the declaration's opening brace goes to the stream that will be parsed
  var stream = str.slice(start + decl[0].length);

  // init stack
  stack.push("{");
  last = "{";

  // while still in this function
  while(stack.length > 0){

    // determine next token
    var token = stream.match(/(?:\{|\}|"|'|\/|\\)/);
    if (!token) return null; // unbalanced input

    if(token[0] == "\\"){
      // escape character => skip it and the escaped character
      stream = stream.slice(token.index + 2);
      continue;

    }else if(last == matches[token[0]]){
      // this token closes the innermost opening token: pop the stack and update last
      stack.pop();
      last = stack[stack.length-1];

    }else if(last == "{"){
      // we are not inside a string or regex (last is not " or ' or /), so push the token
      stack.push(token[0]);
      last = token[0];
    }

    // cut away everything up to and including the token
    stream = stream.slice(token.index + 1);
  }

  // the function is everything between result and the remaining stream
  return {
    extracted: str.slice(start, str.length - stream.length),
    remaining: result + stream
  };
}

Oh, I forgot tokens for comments... but I guess you get the idea of how it works now.
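Here is a sketch of the same idea with comment handling added, done as a single pass over the source. Unlike the tokenizer above, it only looks for the declaration in code, so a `function findMe(...)` written inside a string or comment (as in the question's example) is skipped. Assumptions: `name` is made of letters, digits and underscores, template literals contain no backticks inside `${...}`, and there are no regex literals (a regex such as `/"/` would be misread as opening a string). `skipNonCode` and `extractByScanning` are hypothetical names.

```javascript
// Hypothetical helper: if a string literal, template literal or comment starts
// at src[i], return the index just past it; otherwise return i unchanged.
// Assumptions: no regex literals, and no backticks nested inside ${...}.
function skipNonCode(src, i) {
  const c = src[i];
  const next = src[i + 1];
  if (c === "/" && next === "/") {
    // line comment: runs to (not including) the next newline
    const end = src.indexOf("\n", i);
    return end === -1 ? src.length : end;
  }
  if (c === "/" && next === "*") {
    // block comment: runs through the closing */
    const end = src.indexOf("*/", i + 2);
    return end === -1 ? src.length : end + 2;
  }
  if (c === '"' || c === "'" || c === "`") {
    // string or template literal: find the matching quote, jumping over escapes
    let j = i + 1;
    while (j < src.length && src[j] !== c) j += src[j] === "\\" ? 2 : 1;
    return Math.min(j + 1, src.length);
  }
  return i;
}

// Hypothetical entry point: returns { extracted, remaining } or null.
function extractByScanning(src, name) {
  const decl = new RegExp("^function\\s+" + name + "\\s*\\(");
  let start = -1;
  let depth = 0;
  for (let i = 0; i < src.length; ) {
    const skipped = skipNonCode(src, i);
    if (skipped !== i) {
      i = skipped; // jump over the string/comment without looking inside it
      continue;
    }
    if (start === -1) {
      // look for "function <name>(" only in code, and only at a word boundary
      const prev = i > 0 ? src[i - 1] : "";
      if (src[i] === "f" && !/[\w$]/.test(prev) && decl.test(src.slice(i))) start = i;
    } else if (src[i] === "{") {
      depth++;
    } else if (src[i] === "}" && --depth === 0) {
      return {
        extracted: src.slice(start, i + 1),
        remaining: src.slice(0, start) + src.slice(i + 1),
      };
    }
    i++;
  }
  return null; // declaration not found, or braces never balanced
}
```

Usage: `extractByScanning(source, "findMe")` returns both values the question asks for. Testing for the declaration with a regex anchored at the current position keeps the scanner simple; a real tokenizer would compare identifier tokens instead.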

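Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.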

 
© 2020-2024 STACKOOM.COM