How to programmatically distinguish between a TeX and a LaTeX file

I have a big collection of .tex files (TeX/LaTeX), and I'm writing a Python script that analyzes these files. I only want to analyze LaTeX files, so I want to remove all pure TeX files.

I have thought about checking that \begin{document} appears in every file, but this rejects a large number of my files, since many of them are only chapters of a book, long lists, or sections of a dissertation, and so do not contain the \begin{document} command.

Does anybody have an idea how to filter all the pure TeX files out of my collection?

I think there's unlikely to be a completely foolproof way of doing this, given that you want to be sensitive to files which can be input with \input or \include. Given a particular file, though, you can probably classify it with considerable confidence by spotting the first of the following which you find.

  1. TeX files usually end with \bye, and that's typically not defined in a LaTeX file.
  2. The macro \begin is unlikely to be defined in a 'normal' TeX file (though \end is defined in the plain format).

That's probably about the best you can do, though it would surely be enough for the sort of statistical analysis you appear to be doing.
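The two checks above can be sketched in Python roughly as follows. This is only a heuristic under stated assumptions: the function name `classify_tex_source` and the three labels are made up for illustration, "the first of the following" is read as "test rule 1 before rule 2", and comments are stripped with a simplified pattern so that a commented-out `\bye` or `\begin` does not count.

```python
import re

# An unescaped % starts a TeX comment that runs to the end of the line.
# Simplification: any % preceded by a backslash is treated as a literal \%.
COMMENT = re.compile(r"(?<!\\)%.*")
# \bye as a complete control word (so \byebye would not match).
BYE = re.compile(r"\\bye(?![A-Za-z])")
# \begin followed by an opening brace, as in \begin{itemize}.
BEGIN = re.compile(r"\\begin\s*\{")


def classify_tex_source(text):
    """Guess 'plain', 'latex', or 'unknown' from the contents of a .tex file."""
    body = COMMENT.sub("", text)
    # Rule 1: \bye suggests plain TeX.
    if BYE.search(body):
        return "plain"
    # Rule 2: \begin{...} suggests LaTeX.
    if BEGIN.search(body):
        return "latex"
    # Neither marker found, e.g. a chapter that is pure prose.
    return "unknown"
```

Files that come back as "unknown" are the ones with neither marker, so they probably need a manual look or a look at the file that inputs them.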

There's nothing to stop someone writing a TeX file from defining \begin to mean something, nor someone writing a LaTeX file from defining \bye to mean something. The problem, from your point of view, is that there aren't any TeX constructs that are truly forbidden in a LaTeX file (and vice versa), even though things like \halign would be rare in LaTeX. Indeed, since LaTeX is 'just' a TeX format, there isn't any fundamental difference between the two at all.
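If you want to flag the files where this caveat bites, one option is to look for files that define \begin or \bye themselves and treat those as ambiguous. A rough sketch, assuming the definition is written with one of a handful of common primitives or LaTeX commands (the function name `redefined_markers` is hypothetical, and the pattern will miss more exotic ways of defining a macro):

```python
import re

# Matches e.g. \def\begin, \gdef\bye, \let\bye, \newcommand{\bye},
# \renewcommand*{\begin}. Only these definition commands are covered.
REDEFINITION = re.compile(
    r"\\(?:def|gdef|edef|xdef|let|newcommand|renewcommand|providecommand)\*?"
    r"\s*\{?\s*\\(begin|bye)(?![A-Za-z])"
)


def redefined_markers(text):
    """Return the marker names ('begin', 'bye') that the file defines itself."""
    return {match.group(1) for match in REDEFINITION.finditer(text)}
```

A non-empty result means the \bye and \begin checks cannot be trusted for that file.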

Just to drive the latter point home, there's such a thing as ConTeXt, which is a TeX format that isn't plain, but isn't LaTeX either. It's rather rare, though.

Yes: first collect all your file names in a list by listing the directory.

    import os
    x = os.listdir("path")

This stores the directory contents in the variable x. Then loop through it:

    PureTex = []
    for name in x:
        # Keep only the entries whose name ends in .tex
        if name.endswith('.tex'):
            PureTex.append(name)

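Now the PureTex list will contain the names of all .tex files in the directory. Note that this selects by file extension only, so the contents of each file still have to be inspected to tell plain TeX from LaTeX.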
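To go from file names to an actual selection, the listing step can be combined with a content check. The sketch below is one possible shape, not a definitive implementation: `select_tex_files` and its `is_wanted` parameter are hypothetical names, it searches subdirectories as well (unlike `os.listdir`), and it reads files as UTF-8 with undecodable bytes replaced, on the assumption that some older files use other encodings.

```python
from pathlib import Path


def select_tex_files(root, is_wanted):
    """Return the .tex files under root whose contents satisfy is_wanted(text)."""
    selected = []
    # rglob walks root and all of its subdirectories.
    for path in sorted(Path(root).rglob("*.tex")):
        if not path.is_file():
            continue
        # errors="replace" avoids a crash on files that are not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        if is_wanted(text):
            selected.append(path)
    return selected
```

For example, `select_tex_files("path", lambda text: "\\bye" not in text)` would keep every .tex file that does not contain the string \bye; any other content test can be passed in the same way.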
