Coordinates extracted from PDF are not exact

I'm working on rendering a georeferenced PDF on a map. I was able to retrieve the geolocation information from the PDF, but the coordinates I get are not correct: they are a few meters away from where they should be.

Opening the same PDF in Avenza Maps shows this list of coordinates, and these are correct:

[-26.413082, -51.561534, -26.435838, -51.561643, -26.435909, -51.543773, -26.413152, -51.543667]

With my current approach (reading the PDF as a string and applying a regex) I get these values:

[-26.43302 -51.56133 -26.41418 -51.56124 -26.41424 -51.54409 -26.43309 -51.54418]
[-26.45579 -51.59842 -26.41777 -51.59822 -26.41811 -51.51036 -26.45613 -51.51053]

Unfortunately, neither of the two lands in the correct place (as shown in Avenza). I then opened the PDF in Notepad and found other values (related to the projection and its parameters), and I believe there may be a way to use this extra information to convert the coordinates I extracted into the correct ones.

Here is that information:

<?xpacket end="w"?>
endstream
endobj
294 0 obj
3495
endobj
295 0 obj
/DeviceRGB
endobj
296 0 obj
<</Length 297 0 R>>stream
/GS_init gs
/Group_6 Do

endstream
endobj
297 0 obj
24
endobj
298 0 obj
<</ExtGState 2 0 R/ColorSpace << /CS_P 295 0 R >>/XObject << /Group_6 6 0 R >>>>endobj
299 0 obj
<</Type /Group/S /Transparency/CS 295 0 R/I false/K false>>endobj
300 0 obj
<</Type /Page/Parent 301 0 R/Contents 296 0 R/Resources 298 0 R/MediaBox [0 0 841.88808 1190.5488]/ArtBox [0 0 841.88808 1190.5488]/UserUnit 1/Group 299 0 R/VP[<</Type /Viewport/BBox [14.1732 147.400915455 822.0456 1133.350548016]/Name (þÿ T S B I I)/Measure<</Type /Measure/Subtype /GEO/Bounds [0 0 0 1 1 1 1 0 0 0]/GPTS [ -26.43302 -51.56133 -26.41418 -51.56124 -26.41424 -51.54409 -26.43309 -51.54418]/LPTS [ 0 0 0 1 1 1 1 0]/GCS<</Type /PROJCS/WKT (PROJCS["SIRGAS_2000_UTM_Zone_22S",GEOGCS["GCS_SIRGAS_2000",DATUM["D_SIRGAS_2000",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",-51.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]])>>>>>><</Type /Viewport/BBox [14.1732 14.1732 239.961243463 122.688692878]/Name (þÿ R e f e r e n c i a _ M a p a)/Measure<</Type /Measure/Subtype /GEO/Bounds [0 0 0 1 1 1 1 0 0 0]/GPTS [ -26.45579 -51.59842 -26.41777 -51.59822 -26.41811 -51.51036 -26.45613 -51.51053]/LPTS [ 0 0 0 1 1 1 1 0]/GCS<</Type /PROJCS/WKT (PROJCS["SIRGAS_2000_UTM_Zone_22S",GEOGCS["GCS_SIRGAS_2000",DATUM["D_SIRGAS_2000",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",-51.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]])>>>>>>]>>endobj
301 0 obj
<</Type /Pages/Kids [ 300 0 R ]/Count 1>>endobj
302 0 obj
<<>>endobj
303 0 obj
<</Type /Catalog/Pages 301 0 R/PageMode /UseNone/PageLayout /SinglePage/ViewerPreferences <</PrintScaling /None /FitWindow true /DisplayDocTitle true>>/OpenAction [300 0 R /Fit]/OCProperties<</OCGs [ 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 17 0 R 18 0 R 19 0 R 20 0 R 21 0 R 22 0 R 35 0 R 36 0 R 43 0 R 44 0 R 47 0 R 50 0 R 53 0 R 56 0 R 59 0 R 62 0 R 63 0 R 64 0 R 65 0 R 66 0 R 67 0 R 68 0 R 69 0 R 76 0 R 77 0 R 80 0 R 83 0 R 90 0 R 93 0 R 96 0 R 99 0 R 102 0 R 105 0 R 108 0 R 111 0 R 114 0 R 117 0 R 120 0 R 123 0 R 126 0 R 129 0 R 132 0 R 135 0 R 138 0 R 141 0 R 148 0 R 149 0 R 152 0 R 155 0 R 158 0 R 161 0 R 176 0 R ]/D<</Name (Layers Tree)/Order [ 176 0 R 161 0 R 158 0 R 148 0 R [ 155 0 R 152 0 R 149 0 R ] 141 0 R 138 0 R 135 0 R 132 0 R 129 0 R 126 0 R 123 0 R 120 0 R 117 0 R 114 0 R 111 0 R 108 0 R 105 0 R 102 0 R 99 0 R 96 0 R 93 0 R 90 0 R 83 0 R 80 0 R 62 0 R [ 76 0 R [ 77 0 R ] 63 0 R [ 69 0 R 68 0 R 67 0 R 64 0 R [ 66 0 R 65 0 R ] ] ] 10 0 R [ 59 0 R 43 0 R [ 56 0 R 53 0 R 50 0 R 47 0 R 44 0 R ] 11 0 R [ 36 0 R 35 0 R 22 0 R 21 0 R 12 0 R [ 20 0 R 19 0 R 18 0 R 17 0 R 16 0 R 15 0 R 14 0 R 13 0 R ] ] ] ]/ListMode /VisiblePages>>>>/Metadata 293 0 R>>endobj
304 0 obj
<</Type/XRef/Size 305/W[1 4 2]/Filter/FlateDecode/Info 292 0 R/Root 303 0 R/ID [<c9167b70223726438d277b1b4409c053> <c9167b70223726438d277b1b4409c053>]/Length 923>>stream

I need someone to point me to a way of getting the correct coordinates. I hope this information helps.

The PDF content in your question includes two Viewport dictionaries. Each dictionary maps a region of the page (its "BBox") onto the GPTS points, which refer to the coordinate system described by the specified WKT.

This is covered in the PDF 2.0 reference, ISO 32000-2, sections 12.9 and 12.10. Unfortunately, this spec is not freely available, and it's not cheap.

Here are some definitions from the spec:


BBox:

A rectangle in default user space coordinates specifying the location of the viewport on the page. The two coordinate pairs of the rectangle shall be specified in normalised form; that is, lower-left followed by upper-right, relative to the measuring coordinate system. This ordering shall determine the orientation of the measuring coordinate system (that is, the direction of the positive x and y axes) in this viewport, which may have a different rotation from the page.

GPTS:

(Required; PDF 2.0) An array of numbers that shall be taken pairwise, defining points in geographic space as degrees of latitude and longitude, respectively when defining a geographic coordinate system. These values shall be based on the geographic coordinate system described in the GCS dictionary. When defining a projected coordinate system, this array contains values in a planar projected coordinate space as eastings and northings. For Geospatial3D, when Geospatial feature information is present (requirement type Geospatial3D) in a 3D annotation, the GPTS array is required to hold 3D point coordinates as triples rather than pairwise where the third value of each triple is an elevation value.

NOTE 2 Any projected coordinate system includes an underlying geographic coordinate system.

WKT:

A string of Well Known Text describing the geographic coordinate system.


The assumption is that if you're interested in geospatial coordinates, you already know what WKT is and what the projection means.
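For readers less familiar with the projection, here is a minimal sketch of what the WKT in the question describes: a Transverse Mercator projection on the GRS 1980 ellipsoid with the UTM zone 22S parameters. It uses the standard series expansion for Transverse Mercator, not anything PDF-specific. The function name `to_utm22s` and the constant names are made up for this illustration; in practice a projection library would normally do this work.

```python
import math

# Parameters copied from the WKT in the question (SIRGAS 2000 / UTM zone 22S).
SEMI_MAJOR = 6378137.0          # GRS_1980 semi-major axis, meters
INV_FLATTENING = 298.257222101  # GRS_1980 inverse flattening
SCALE_FACTOR = 0.9996           # Scale_Factor
CENTRAL_MERIDIAN = -51.0        # Central_Meridian, degrees
FALSE_EASTING = 500000.0        # False_Easting, meters
FALSE_NORTHING = 10000000.0     # False_Northing, meters


def _meridian_arc(phi, e2):
    """Distance along the meridian from the equator to latitude phi (radians)."""
    e4, e6 = e2 * e2, e2 ** 3
    return SEMI_MAJOR * (
        (1 - e2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
        - (3 * e2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
        - (35 * e6 / 3072) * math.sin(6 * phi)
    )


def to_utm22s(lat_deg, lon_deg):
    """Hypothetical helper: latitude/longitude in degrees -> (easting, northing)."""
    f = 1 / INV_FLATTENING
    e2 = f * (2 - f)            # first eccentricity squared
    ep2 = e2 / (1 - e2)         # second eccentricity squared
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg - CENTRAL_MERIDIAN)

    n = SEMI_MAJOR / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a = dlam * math.cos(phi)
    m = _meridian_arc(phi, e2)  # Latitude_Of_Origin is 0, so no offset is needed

    x = SCALE_FACTOR * n * (
        a
        + (1 - t + c) * a ** 3 / 6
        + (5 - 18 * t + t * t + 72 * c - 58 * ep2) * a ** 5 / 120
    )
    y = SCALE_FACTOR * (
        m
        + n * math.tan(phi) * (
            a * a / 2
            + (5 - t + 9 * c + 4 * c * c) * a ** 4 / 24
            + (61 - 58 * t + t * t + 600 * c - 330 * ep2) * a ** 6 / 720
        )
    )
    return FALSE_EASTING + x, FALSE_NORTHING + y


# First GPTS pair of the first viewport in the question.
easting, northing = to_utm22s(-26.43302, -51.56133)
print(round(easting, 1), round(northing, 1))
```

A point on the central meridian maps to an easting of exactly 500000, and the equator maps to a northing of 10000000, which is a quick way to sanity-check the parameters.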

This may be enough information for you to map the geographic coordinates of each viewport to its location on the page. Here are the PDF viewports in a more readable form:

/VP [
    <<
        /Type /Viewport
        /BBox [14.1732 147.400915455 822.0456 1133.350548016]
        /Name (TSBII)
        /Measure <<
            /Type /Measure
            /Subtype /GEO
            /Bounds [0 0 0 1 1 1 1 0 0 0]
            /GPTS [ -26.43302 -51.56133 -26.41418 -51.56124
                    -26.41424 -51.54409 -26.43309 -51.54418]
            /LPTS [ 0 0 0 1 1 1 1 0]
            /GCS <<
                /Type /PROJCS
                /WKT (
PROJCS["SIRGAS_2000_UTM_Zone_22S",
    GEOGCS["GCS_SIRGAS_2000",
        DATUM["D_SIRGAS_2000",SPHEROID["GRS_1980",6378137.0,298.257222101]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]
    ],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",10000000.0],
    PARAMETER["Central_Meridian",-51.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0]
]
)
            >>
        >>
    >>
    <<
        /Type /Viewport
        /BBox [14.1732 14.1732 239.961243463 122.688692878]
        /Name (Referencia_Mapa)
        /Measure <<
            /Type /Measure
            /Subtype /GEO
            /Bounds [0 0 0 1 1 1 1 0 0 0]
            /GPTS [ -26.45579 -51.59842 -26.41777 -51.59822
                    -26.41811 -51.51036 -26.45613 -51.51053]
            /LPTS [ 0 0 0 1 1 1 1 0]
            /GCS <<
                /Type /PROJCS
                /WKT (
PROJCS["SIRGAS_2000_UTM_Zone_22S",
    GEOGCS["GCS_SIRGAS_2000",
        DATUM["D_SIRGAS_2000",SPHEROID["GRS_1980",6378137.0,298.257222101]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]
    ],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",10000000.0],
    PARAMETER["Central_Meridian",-51.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0]
]
)
            >>
        >>
    >>
]
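
As a worked example of mapping between page locations and geographic coordinates, the sketch below takes the first viewport (TSBII) and extrapolates from its BBox out to the corners of the page MediaBox. With the numbers in the question, this appears to reproduce the Avenza list to within about a meter, which suggests Avenza reports the corners of the whole page, not those of the viewport. Several things here are assumptions: the LPTS points are treated as unit-square positions inside the BBox, the viewport is assumed not to be rotated, and the interpolation is done directly in degrees instead of in projected UTM meters, which is only an approximation that holds for a small area like this one. The helper names `pairs` and `page_to_latlon` are made up for the illustration.

```python
# Values copied from the page dictionary and the first viewport in the question.
MEDIA_BOX = (0.0, 0.0, 841.88808, 1190.5488)
BBOX = (14.1732, 147.400915455, 822.0456, 1133.350548016)
LPTS = [0, 0, 0, 1, 1, 1, 1, 0]
GPTS = [-26.43302, -51.56133, -26.41418, -51.56124,
        -26.41424, -51.54409, -26.43309, -51.54418]


def pairs(flat):
    """Group a flat PDF number array into (first, second) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


def page_to_latlon(x, y):
    """Hypothetical helper: page point (in PDF units) -> approximate (lat, lon)."""
    # Position relative to the viewport; values outside 0..1 extrapolate
    # beyond the BBox.
    u = (x - BBOX[0]) / (BBOX[2] - BBOX[0])
    v = (y - BBOX[1]) / (BBOX[3] - BBOX[1])

    # Pair each LPTS corner with its GPTS (lat, lon) value.
    corner = dict(zip(pairs(LPTS), pairs(GPTS)))
    lower_left, upper_left = corner[(0, 0)], corner[(0, 1)]
    upper_right, lower_right = corner[(1, 1)], corner[(1, 0)]

    # Bilinear weights for the four corners (they always sum to 1).
    weighted = (
        ((1 - u) * (1 - v), lower_left),
        ((1 - u) * v, upper_left),
        (u * v, upper_right),
        (u * (1 - v), lower_right),
    )
    lat = sum(w * p[0] for w, p in weighted)
    lon = sum(w * p[1] for w, p in weighted)
    return lat, lon


x0, y0, x1, y1 = MEDIA_BOX
# Same corner order as the Avenza list: upper-left, lower-left,
# lower-right, upper-right.
page_corners = [page_to_latlon(x, y)
                for x, y in ((x0, y1), (x0, y0), (x1, y0), (x1, y1))]
for lat, lon in page_corners:
    print(f"{lat:.6f}, {lon:.6f}")
```

A stricter version would project the four GPTS points to eastings and northings using the WKT, fit the transform in meters, and convert the page corners back to latitude and longitude.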

Note that a PDF file is a structured document and cannot be reliably parsed as a plain string. These specific elements could be compressed, or might occur multiple times for different pages. You'll need a toolkit that can access pages, resources and dictionaries in order to locate the viewports.
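
To illustrate why a regex over the raw file is fragile, here is a small sketch using only Python's standard `zlib` module. The FlateDecode filter used by PDF streams is zlib/deflate, so a dictionary stored inside a compressed stream is not visible to a plain text search until the stream has been decoded. The `plain` fragment below is a shortened copy of the viewport text from the question, used as stand-in data, not a complete PDF object.

```python
import zlib

# Stand-in data: a shortened viewport dictionary copied from the question.
plain = (
    b"<</Type /Viewport/Measure<</Type /Measure/Subtype /GEO"
    b"/GPTS [ -26.43302 -51.56133 -26.41418 -51.56124"
    b" -26.41424 -51.54409 -26.43309 -51.54418]>>>>"
)

# Simulate how a PDF writer could store this inside a FlateDecode stream.
stored = zlib.compress(plain)

# Searching the stored bytes versus searching the decoded bytes.
found_in_raw = b"/GPTS" in stored
found_after_decoding = b"/GPTS" in zlib.decompress(stored)
print(found_in_raw, found_after_decoding)
```

With zlib's default settings the first search should fail and the second should succeed. The file in the question happens to keep its page dictionary uncompressed, which is why the regex found anything at all.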
