What is a general approach for transpiling one language to another?

I would like to transpile JavaScript into LinkScript. I have started like this:

const acorn = require('acorn')
const fs = require('fs')

const input = fs.readFileSync('./tmp/parse.in.js', 'utf-8')

const jst = acorn.parse(input, {
  ecmaVersion: 2021,
  sourceType: 'module'
})

fs.writeFileSync('tmp/parse.out.js.json', JSON.stringify(jst, null, 2))

const linkScriptText = generateLinkScriptText(convertToLinkScriptAst(jst))

fs.writeFileSync('tmp/parse.out.link', linkScriptText)

function convertToLinkScriptAst(jst) {
  const lst = {}
  switch (jst.type) {
    case 'Program':
      convertProgram(jst, lst)
      break
  }
  return lst
}

function convertProgram(jst, lst) {
  lst.zones = []
  jst.body.forEach(node => {
    switch (node.type) {
      case 'VariableDeclaration':
        convertVariableDeclaration(node).forEach(vnode => {
          lst.zones.push(vnode)
        })
        break
      case 'ExpressionStatement':
        // TODO: expression statements are not converted yet
        break
      default:
        throw new Error(`Unsupported statement: ${JSON.stringify(node)}`)
    }
  })
}

function convertVariableDeclaration(jst) {
  return jst.declarations.map(dec => {
    switch (dec.type) {
      case 'VariableDeclarator':
        return convertVariableDeclarator(jst.kind, dec)
      default:
        throw new Error(`Unsupported declarator: ${JSON.stringify(dec)}`)
    }
  })
}

function convertVariableDeclarator(kind, jst) {
  return {
    type: 'host',
    immutable: kind === 'const',
    name: jst.id.name,
    value: convertVariableValue(jst.init)
  }
}

function convertVariableValue(jst) {
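  // A declaration without an initializer (`let a`) has no value.
  if (!jst) return

  switch (jst.type) {
    case 'Literal':
      return convertLiteral(jst)
    default:
      // Fail loudly instead of silently dropping an unsupported initializer.
      throw new Error(`Unsupported value: ${JSON.stringify(jst)}`)
  }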
}

function convertLiteral(jst) {
  switch (typeof jst.value) {
    case 'string':
      return {
        type: 'string',
        value: jst.value
      }
    case 'number':
      return {
        type: 'number',
        value: jst.value
      }
    default:
      throw new Error(`Unsupported literal: ${JSON.stringify(jst)}`)
  }
}

function generateLinkScriptText(lst) {
  const text = []
  lst.zones.forEach(zone => {
    switch (zone.type) {
      case 'host':
        generateHost(zone).forEach(line => {
          text.push(line)
        })
        break
    }
  })
  return text.join('\n')
}

function generateHost(lst) {
  const text = []
  if (lst.value) {
    switch (lst.value.type) {
      case 'string':
        text.push(`host ${lst.name}, text <${lst.value.value}>`)
        break
      case 'number':
        text.push(`host ${lst.name}, size ${lst.value.value}`)
        break
    }
  } else {
    text.push(`host ${lst.name}`)
  }
  return text
}

Basically, you parse the JS into an AST, then convert this AST somehow into the AST of the target language (LinkScript in this case). Then you convert the output AST into text. The question is, what is a general strategy for doing this? It seems quite hard.

In more detail, I need to know all the types of structures that you can create in JavaScript, all the types of structures you can create in LinkScript, and how each one maps to the other. In my head, looking at JS, I can manually figure out how the corresponding LinkScript should look. But it's a different story trying to do it programmatically, and I am a bit lost on the general approach I should be taking.

First of all, even though I have been doing JavaScript for over 10 years, I don't know the JS AST that well. I am planning on writing some example snippets of code and seeing how the AST looks using acorn. Second, it seems like there are so many combinations of things that it is overwhelming.
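One way to get a feel for the AST is to count which node types a given snippet actually produces. The following is a minimal sketch, not part of the original attempt: `sampleAst` is a hand-written, ESTree-shaped stand-in for what `acorn.parse` would return for `const a = 1`, and `countNodeTypes` is a hypothetical helper name.

```javascript
// Hand-written ESTree-shaped tree for `const a = 1`.
// It stands in for the result of acorn.parse() so the sketch has no dependencies.
const sampleAst = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'Literal', value: 1 }
    }]
  }]
}

// Recursively visit every object in the tree and count each `type` it sees.
function countNodeTypes(node, counts = {}) {
  if (Array.isArray(node)) {
    node.forEach(child => countNodeTypes(child, counts))
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') {
      counts[node.type] = (counts[node.type] || 0) + 1
    }
    Object.values(node).forEach(child => countNodeTypes(child, counts))
  }
  return counts
}

console.log(countNodeTypes(sampleAst))
```

Running this over a folder of representative snippets would give a concrete list of the node types that need a conversion rule, which is usually far shorter than the number of possible combinations.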

Do I just keep going down this road I've started on above? Or is there a more structured or disciplined approach? How do I better break the problem down into more manageable chunks?

Also, it is not always as easy as doing a simple one-to-one mapping. Sometimes the order of things changes. For example, in JS you might have:

a = x + y

But in LinkScript, that would be:

call add
  bind a, link x
  bind b, link y
  save a

So the assignment expression is sort of reversed. It gets more complicated in other cases.
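The reordering in this example can be handled in two steps: build a target node that stores the pieces in LinkScript order, then print it. This is only a sketch under assumptions: the `call` node shape, the helper names, and every operator name other than `add` are invented for illustration, and the parameter names `a` and `b` are taken from the example above.

```javascript
// ESTree-shaped node for `a = x + y`, written by hand instead of parsed.
const assignment = {
  type: 'AssignmentExpression',
  operator: '=',
  left: { type: 'Identifier', name: 'a' },
  right: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'x' },
    right: { type: 'Identifier', name: 'y' }
  }
}

// '+' -> 'add' comes from the example above; the other names are guesses.
const OPERATOR_CALLS = { '+': 'add', '-': 'subtract', '*': 'multiply' }

// Convert `target = left <op> right` into a hypothetical LinkScript call node.
function convertAssignment(node) {
  const { left, right } = node
  if (left.type !== 'Identifier' || right.type !== 'BinaryExpression') {
    throw new Error(`Unsupported assignment shape: ${right.type}`)
  }
  const name = OPERATOR_CALLS[right.operator]
  if (!name) throw new Error(`Unsupported operator: ${right.operator}`)
  if (right.left.type !== 'Identifier' || right.right.type !== 'Identifier') {
    throw new Error('Only identifier operands are handled in this sketch')
  }
  return {
    type: 'call',
    name,
    binds: [
      { name: 'a', link: right.left.name },
      { name: 'b', link: right.right.name }
    ],
    save: left.name // the assignment target moves to the end
  }
}

// Serialize the call node into LinkScript lines.
function generateCall(call) {
  return [
    `call ${call.name}`,
    ...call.binds.map(bind => `  bind ${bind.name}, link ${bind.link}`),
    `  save ${call.save}`
  ]
}

console.log(generateCall(convertAssignment(assignment)).join('\n'))
```

Keeping the reordering inside the conversion step means the generator only ever prints fields in the order the target node already holds them.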

So it's as if I need to study each individual type of mapping, and come up with a detailed plan or algorithm on how to do that one mapping. Then it seems like there will be THOUSANDS of possible transformation/mapping types I need to study. So in that sense it seems like an extremely time-intensive problem to solve, mentally.

Is there an easier way?

For a long time (years?) I have wanted to do this, but it has always seemed like the extremely arduous task I'm hinting at. I think it's because I don't clearly see in my head all the different ways/angles I can receive the AST, and I don't know how to boil it down to something I can see.

In addition to just figuring out how to do each type of mapping/transformation, I also should have somewhat decent code that I am able to extend. That is usually my strong suit (coming up with clean code with a simple API), but here I am struggling because I don't see the full picture yet.

Writing a transpiler is a very big job... For a variety of reasons, though, JavaScript workflows are already full of transpilers, so there are many tools to help.

If your target language looks like JavaScript, then you would write your transpiler as a plug-in for Babel: https://babeljs.io/
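For orientation, a Babel plug-in is a function that receives Babel's API object and returns an object with a `visitor` map keyed by node type. The sketch below rewrites `x + y` into `add(x, y)`; the plug-in name and the `add` function are made up for illustration and are not part of LinkScript or Babel.

```javascript
// Hypothetical plug-in: replace every `left + right` with `add(left, right)`.
function binaryToCallPlugin({ types: t }) {
  return {
    name: 'binary-to-call',
    visitor: {
      // Babel calls this for each BinaryExpression node it traverses.
      BinaryExpression(path) {
        if (path.node.operator !== '+') return
        path.replaceWith(
          t.callExpression(t.identifier('add'), [path.node.left, path.node.right])
        )
      }
    }
  }
}
```

With `@babel/core` installed, such a function can be passed in the `plugins` array of `transformSync`; that step is left out here so the sketch has no dependencies.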

Otherwise, maybe start with jscodeshift, which will provide you with an easily accessible AST.

Many open-source JavaScript tools, like eslint, also contain JavaScript parsers that you could extract with a bit of effort.

Also see the AST Explorer.

Once you have an AST, you would typically process it recursively, maybe following the visitor pattern, to convert each AST node into the equivalent target structure. Then maybe apply peephole optimization to simplify the resulting AST. Then finally serialize it. jscodeshift comes with a JavaScript serializer that you could replace with your own.
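The recursive, per-node-type processing described above can be sketched as a table of handlers plus one dispatch function. This is an illustration under assumptions, not a definitive design: the target node shapes (`program`, `host`) loosely follow the question's code, and the `handlers` and `convert` names are hypothetical.

```javascript
// One handler per ESTree node type; each returns a target-AST node (or a list).
const handlers = {
  Program(node) {
    return { type: 'program', zones: node.body.flatMap(child => convert(child)) }
  },
  VariableDeclaration(node) {
    // One declaration can hold several declarators: `const a = 1, b = 'hi'`.
    return node.declarations.map(dec => ({
      type: 'host',
      immutable: node.kind === 'const',
      name: dec.id.name,
      value: dec.init ? convert(dec.init) : undefined
    }))
  },
  Literal(node) {
    return { type: typeof node.value, value: node.value }
  }
}

// Dispatch on node.type and fail loudly on anything not yet supported.
function convert(node) {
  const handler = handlers[node.type]
  if (!handler) throw new Error(`No handler for node type: ${node.type}`)
  return handler(node)
}

// Hand-written ESTree-shaped tree for `const a = 1, b = 'hi'`.
const program = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [
      { type: 'VariableDeclarator', id: { type: 'Identifier', name: 'a' }, init: { type: 'Literal', value: 1 } },
      { type: 'VariableDeclarator', id: { type: 'Identifier', name: 'b' }, init: { type: 'Literal', value: 'hi' } }
    ]
  }]
}

console.log(JSON.stringify(convert(program), null, 2))
```

Supporting a new construct then means adding one entry to `handlers`, and the thrown error names the next node type that is missing, so the work can proceed one small case at a time.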
