How to convert hundreds of DJVU files to TIFF files by parsing a JavaScript file?

I have a huge number of DJVU files and I need to convert them all to TIFF files. They are all part of a local website. By local I mean an unpublished website. It's built like a map, using JPEG imagery, PNG and GIF with transparency for the layout of the overview map, HTML, CSS and some JavaScript (with jQuery). Each part of the overview map is associated with a specific DJVU file. When you click on a part of the map, a new browser window opens and shows you the actual geographical map stored in the DJVU file.

I will attempt to explain the structure here. Example of a DJVU file:

112_87_10_0.djvu

This will have to be converted to TIFF and also renamed, like this:

HEK_S044_Vitsand_1883-95.tif

It will also have to be stored in a new folder with a similar name. In this example, the name of the folder would be like this:

044 Vitsand

So the path to the converted file would be like this:

044 Vitsand\HEK_S044_Vitsand_1883-95.tif

The number 44 with a preceding 0 is just a number. Vitsand is the name of the map sheet and also the name of a small village in Värmland County, Sweden. The letter S is the designation for the county, according to ISO 3166-2. The last part is the year interval in which the map was made.

My problem is that this takes time to do manually, and I can easily introduce errors out of sheer boredom after doing it for an extended period. How can this be automated? I'm not really a programmer. In fact, I have only recently started learning JavaScript. Does anyone feel like writing a script for me? At the very least, please give me some pointers as to which language, method and tools to use.

I poked around in a file named lan_s.js and I can see it contains all the bits of information I am using to manually name the files. Here's what the corresponding line for the DJVU file above looks like:

<area onmouseover=\"tooltip.show('Vitsand', 150);\" onmouseout=\"tooltip.hide();\" href=\"javascript: openMapEx('Värmlands län', 'J112-87-10','Vitsand','112_87_10_0.djvu','1883-95')\" alt=\"Vitsand\" shape=\"poly\" coords=\"144,154,166,155,166,172,143,171\">\

This is stored between <map name=\"slan_harads\">\ and </map>\ . I'm not sure what those backslashes represent, but they seem to be redundant. There are more <area> tags in there, too many to post here. They all have the same syntax; only the map sheet name, the DJVU file name and the map year vary. So the Vitsand','112_87_10_0.djvu','1883-95 is the most important part here. The file lan_s.js covers the entire Värmland county. There are other files just like it for other counties. I would need to do the same thing with those.

I would like to use a tool like ImageMagick for the conversion process. It can convert DJVU to TIFF, and it allows me to explicitly set the compression to none. I don't want to use a tool that applies LZW compression without asking me.

(For the curious, HEK is short for "Härads-Ekonomisk-Karta". A "härad" was a type of geographic division formerly used in Sweden. It's comparable to a "hundred" as used in England and other English-speaking countries.)


start.html

<html>
<head>
<title>Welcome!</title>
<style type="text/CSS">

</style>
<script type="text/javascript">
window.onload=timeout;
function timeout(){
window.setTimeout("redirect()",3000)}

function redirect(){
window.location="DATA/index.html"
return}
</script>
</head>
<body>
<img src="DATA/images/new_splash.jpg">
<body onload="timeout()" onClick="redirect()">
</body>
</html>

index.html

<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="X-UA-Compatible" content="IE=9; IE=8; IE=7; IE=EDGE" />
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Häradsekonomiska kartan</title>
    <link rel="stylesheet" href="style.css" type="text/css" />

    <script type="text/javascript" src="hek.js"></script>    
    <script type="text/javascript" src="jquery.min.js"></script>
    <script type="text/javascript" src="jquery.maphilight.min.js"></script>
      <script type="text/javascript">$(function() { $('.map').maphilight(); });</script>
      <script language="javascript" src="lan.js"></script>

    <script type="text/javascript" src="lan_bd.js"></script>
    <script type="text/javascript" src="lan_ab.js"></script>
    <script type="text/javascript" src="lan_c.js"></script>
    <script type="text/javascript" src="lan_d.js"></script>
    <script type="text/javascript" src="lan_e.js"></script>
    <script type="text/javascript" src="lan_k.js"></script>
    <script type="text/javascript" src="lan_m.js"></script>
    <script type="text/javascript" src="lan_n.js"></script>
    <script type="text/javascript" src="lan_o.js"></script>
    <script type="text/javascript" src="lan_s.js"></script>
    <script type="text/javascript" src="lan_t.js"></script>
    <script type="text/javascript" src="lan_u.js"></script>
    <script type="text/javascript" src="lan_w.js"></script>
    <script type="text/javascript" src="hlp.js"></script>


    <style type="text/css" media="screen">
      /* local styles here */
    </style>
  </head>

I have intentionally left out the body tag here. It's just too much; the lines expand horizontally for all eternity.

lan_s.js

So here is the JavaScript file I referred to above.

var lan_s = "\
<map name=\"slan_harads\">\

LINES LINES LINES...

<area onmouseover=\"tooltip.show('Vägsjöfors', 150);\" onmouseout=\"tooltip.hide();\" href=\"javascript: openMapEx('Värmlands län', 'J112-87-15','Vägsjöfors','112_87_15_0.djvu','1883-95')\" alt=\"Vägsjöfors\" shape=\"poly\" coords=\"143,171,166,172,165,189,142,188\">\
<area onmouseover=\"tooltip.show('Vitsand', 150);\" onmouseout=\"tooltip.hide();\" href=\"javascript: openMapEx('Värmlands län', 'J112-87-10','Vitsand','112_87_10_0.djvu','1883-95')\" alt=\"Vitsand\" shape=\"poly\" coords=\"144,154,166,155,166,172,143,171\">\
<area onmouseover=\"tooltip.show('Kärnberget', 150);\" onmouseout=\"tooltip.hide();\" href=\"javascript: openMapEx('Värmlands län', 'J112-87-5','Kärnberget','112_87_5_0.djvu','1883-95')\" alt=\"Kärnberget\" shape=\"poly\" coords=\"145,138,167,139,166,155,144,154\">\

MORE LINES...

</map>\
\
\
<img src=\"ROOT/LAN/images/s.gif\" usemap=\"#slan_harads\" border=0>\
\
";

You won't be able to do this with just JavaScript running in the browser, as browser JavaScript has no inherent access to the file system.

Your options, then:

  • Node.js: A Node.js script can read and write the file system directly (for a local batch job you do not even need to stand up a web server), and there are ImageMagick wrapper modules that you can install for processing images.
  • PHP: You could set up PHP on your local server, which will give you access to the file system and to PHP extensions for image processing.
  • Python: This would be trivial to implement in Python. The standard library is enough to write a simple looping script that gets all the files and then calls the converter on each one.
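
To illustrate the Node.js option, here is a minimal sketch of a plain script (no web server) that lists the DJVU files in a folder. The function name listDjvuFiles and the assumption that all files sit directly in one folder are mine, not something taken from the question.

```javascript
// Sketch: list the .djvu files in a folder from a plain Node.js script.
const fs = require('fs');
const path = require('path');

// Returns the names of all .djvu files directly inside `dir` (not recursive).
// The extension check is case-insensitive, so "X.DJVU" is included too.
function listDjvuFiles(dir) {
  return fs.readdirSync(dir)
    .filter((name) => path.extname(name).toLowerCase() === '.djvu')
    .sort();
}
```

Calling listDjvuFiles with the folder that holds the maps returns a plain array of file names, which is all a later conversion loop would need.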

The real problem at hand is determining whether your renaming scheme follows some neat rule from which an automated script can be written.

Take the example DJVU file: 112_87_10_0.djvu
which is to be converted and renamed to: HEK_S044_Vitsand_1883-95.tif

There needs to be a pattern to which one can apply the following logic.

  1. Get all files in the directory to be converted.
  2. Sort them by some meaningful value (is it numerically ascending?).
  3. Take each converted file in sort order and give it the corresponding name.
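
The sorting step could be sketched as below, assuming the meaningful order is the numeric order of the parts of the file name (so that 112_87_5 comes before 112_87_10, which a plain alphabetical sort would get wrong). Whether that order actually matches the sheet numbers such as 044 is not something the question confirms. The function names are illustrative only.

```javascript
// "112_87_10_0.djvu" -> [112, 87, 10, 0]
function numericKey(fileName) {
  return (fileName.match(/\d+/g) || []).map(Number);
}

// Compares two file names part by part, numerically.
// A name with fewer numeric parts sorts before a longer one with the same prefix.
function compareSheets(a, b) {
  const ka = numericKey(a);
  const kb = numericKey(b);
  const len = Math.max(ka.length, kb.length);
  for (let i = 0; i < len; i++) {
    const diff = (ka[i] ?? -1) - (kb[i] ?? -1);
    if (diff !== 0) return diff;
  }
  return a.localeCompare(b);
}

// Returns a sorted copy; the input array is left untouched.
function sortSheets(fileNames) {
  return [...fileNames].sort(compareSheets);
}

console.log(sortSheets(['112_87_5_0.djvu', '112_87_15_0.djvu', '112_87_10_0.djvu']));
// → [ '112_87_5_0.djvu', '112_87_10_0.djvu', '112_87_15_0.djvu' ]
```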

As of right now, you could easily write a script that converts all files like this:

112_87_10_0.djvu ==> 112_87_10_0.tif
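
A minimal sketch of that plain conversion, assuming ImageMagick is installed with DjVu support and is on the PATH. The command name is an assumption (ImageMagick 7 ships `magick`, version 6 uses `convert`), and the helper names are made up for illustration. `-compress none` is the ImageMagick option that asks for an uncompressed TIFF.

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Assumption: "magick" for ImageMagick 7; change to "convert" for version 6.
const IMAGEMAGICK = 'magick';

// "112_87_10_0.djvu" -> "112_87_10_0.tif"
function tiffNameFor(djvuName) {
  return path.basename(djvuName, path.extname(djvuName)) + '.tif';
}

// Builds the argument list: input file, no compression, output file in outDir.
function convertArgs(srcPath, outDir) {
  return [srcPath, '-compress', 'none', path.join(outDir, tiffNameFor(srcPath))];
}

// Runs ImageMagick for one file; throws if the command fails.
function convertFile(srcPath, outDir) {
  execFileSync(IMAGEMAGICK, convertArgs(srcPath, outDir), { stdio: 'inherit' });
}
```

Keeping the argument list in its own function makes it possible to print and inspect the commands before actually running them on hundreds of files.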

But until you can provide some additional rules for the renaming scheme, the rest of your question remains unanswerable.

EDIT

Upon further review I do see that you provide some information about where the naming scheme comes from. I shall look at that a bit more and revise my answer. If you could, though, please remove all irrelevant information from your question, specifically the start.html and index.html bits. There is really nothing in that code of any importance to the question; it merely serves to obscure the important bits.
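
As a starting point, the openMapEx(...) calls in lan_s.js can be pulled out with a regular expression once the file has been read as text. The sketch below assumes every call has exactly five single-quoted arguments, as in the lines quoted in the question. The sheet number (044) does not appear in those lines, so it is passed in by the caller here, and the county letter is assumed to come from the file name (lan_s.js gives S). Both function names are hypothetical.

```javascript
// Matches openMapEx('county', 'sheet id', 'sheet name', 'file.djvu', 'years')
const OPEN_MAP_EX =
  /openMapEx\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/g;

// Returns one object per openMapEx call found in the text of a lan_*.js file.
function parseSheets(jsText) {
  return [...jsText.matchAll(OPEN_MAP_EX)].map(
    ([, county, sheetId, name, file, years]) => ({ county, sheetId, name, file, years })
  );
}

// Builds "044 Vitsand\HEK_S044_Vitsand_1883-95.tif" style paths.
// sheetNumber must come from somewhere else; it is not in the <area> line.
function targetPath(sheet, countyLetter, sheetNumber) {
  const num = String(sheetNumber).padStart(3, '0');
  return `${num} ${sheet.name}\\HEK_${countyLetter}${num}_${sheet.name}_${sheet.years}.tif`;
}

const sample =
  "openMapEx('Värmlands län', 'J112-87-10','Vitsand','112_87_10_0.djvu','1883-95')";
const [vitsand] = parseSheets(sample);
console.log(vitsand.file);                 // → 112_87_10_0.djvu
console.log(targetPath(vitsand, 'S', 44)); // → 044 Vitsand\HEK_S044_Vitsand_1883-95.tif
```

If the files are not saved as UTF-8, names such as Vägsjöfors will need the right encoding when the file is read, otherwise the output file names will be garbled.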
