How to read a binary file entirely and quickly in Ada?

I would like to read the content of a binary file of several MB and store it into a buffer. Here's my function prototype (I can change it if needed):

procedure GET_BIN_CONTENT_FROM_PATH(PATH    : in UNBOUNDED_STRING;
                                    CONTENT : out UNBOUNDED_STRING);

Until now I've tried two methods, both using the Direct_IO package. In the first method, I was reading the file character by character; it worked, but it was awfully slow. In order to speed up the process, I tried to read the file MB by MB:

procedure GET_BIN_CONTENT_FROM_PATH (PATH    : in UNBOUNDED_STRING;
                                     CONTENT : out UNBOUNDED_STRING) is

   BIN_SIZE_LIMIT : constant NATURAL := 1000000;
   subtype FILE_STRING is STRING (1 .. BIN_SIZE_LIMIT);
   package FILE_STRING_IO is new ADA.DIRECT_IO (FILE_STRING);
   FILE : FILE_STRING_IO.FILE_TYPE;
   BUFFER : FILE_STRING;

begin
   FILE_STRING_IO.OPEN (FILE, MODE => FILE_STRING_IO.IN_FILE,
                        NAME => TO_STRING (C_BASE_DIR & PATH));
   while not FILE_STRING_IO.END_OF_FILE (FILE) loop
      FILE_STRING_IO.READ (FILE, ITEM => BUFFER);
      APPEND (CONTENT, BUFFER);
   end loop;
   FILE_STRING_IO.CLOSE (FILE);
end GET_BIN_CONTENT_FROM_PATH;

Unfortunately, it seems that the READ operation won't happen if there is less than 1 MB remaining in the file. As a result, big files (>1 MB) get truncated, and small ones are not read at all. It's especially visible when working on images.

So, my question is: What's the correct method to read a binary file both quickly and entirely?

Thanks in advance.

Make the Bin_Size equal to Ada.Directories.Size(my_file), and read it in one go.

If it's too big for stack allocation (you'll get Storage_Error), allocate it with new, and use the rename trick

my_image : bin_array renames my_image_ptr.all;

so that nothing else needs to know.
But if it's only a few MB, that won't be necessary.
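
The suggestion above can be sketched roughly as follows. This is only an illustration, not the answerer's exact code: the procedure name Read_In_One_Go, the command-line handling, and the identifiers File_String, Buffer_Ptr and Buffer are made up for the example. It assumes the file size fits in Natural and that the heap buffer can simply be left to the operating system to reclaim at exit.

```ada
with Ada.Command_Line;
with Ada.Directories;
with Ada.Direct_IO;
with Ada.Text_IO;

--  Hypothetical example: read a whole file with a single Direct_IO.Read.
procedure Read_In_One_Go is
begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Ada.Text_IO.Put_Line ("usage: read_in_one_go <file>");
      return;
   end if;

   declare
      Name : constant String := Ada.Command_Line.Argument (1);

      --  Raises Constraint_Error if the size does not fit in Natural.
      Size : constant Natural := Natural (Ada.Directories.Size (Name));

      --  The element type is a string exactly as long as the file,
      --  so one Read transfers the entire content.
      subtype File_String is String (1 .. Size);
      package File_String_IO is new Ada.Direct_IO (File_String);

      --  Heap allocation avoids Storage_Error for large files; the
      --  renaming lets the rest of the code treat it as a plain object.
      --  The buffer is never freed here (assumption: short-lived program).
      type File_String_Access is access File_String;
      Buffer_Ptr : constant File_String_Access := new File_String;
      Buffer     : File_String renames Buffer_Ptr.all;

      File : File_String_IO.File_Type;
   begin
      --  An empty file has no element to read, so skip it.
      if Size > 0 then
         File_String_IO.Open (File, File_String_IO.In_File, Name);
         File_String_IO.Read (File, Buffer);
         File_String_IO.Close (File);
      end if;

      Ada.Text_IO.Put_Line ("Read" & Natural'Image (Buffer'Length) & " bytes");
   end;
end Read_In_One_Go;
```

Because the Direct_IO element is the whole file, there is no partial trailing record, which is the case the 1 MB version in the question could not handle.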

Ada.Streams.Stream_IO.Read reads into a Stream_Element_Array and tells you the last element read; if the array isn't filled (because you've reached the end of file), Last will be less than Item'Last.

A purist will note that Ada.Streams.Stream_Element'Size may not be the same as Character'Size, but for any normal processor chip it will be, so you can do unchecked conversion between the used part of the Stream_Element_Array and a String of the same size before appending to your Content.
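
A minimal sketch of this chunked approach might look like the following. The names Read_By_Chunks, Get_Bin_Content and the 64 KiB buffer size are choices made for this example only. Instead of the unchecked conversion mentioned above, it converts element by element with Character'Val, which is slower but avoids relying on the two types having the same representation; it still assumes every stream element value fits in a Character.

```ada
with Ada.Command_Line;
with Ada.Streams.Stream_IO;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

--  Hypothetical example: read a file chunk by chunk with Stream_IO.Read.
procedure Read_By_Chunks is
   use Ada.Streams;
   use Ada.Strings.Unbounded;
   package SIO renames Ada.Streams.Stream_IO;

   procedure Get_Bin_Content (Path    : in String;
                              Content : out Unbounded_String)
   is
      File   : SIO.File_Type;
      Buffer : Stream_Element_Array (1 .. 65_536);  --  arbitrary chunk size
      Last   : Stream_Element_Offset;
   begin
      Content := Null_Unbounded_String;
      SIO.Open (File, SIO.In_File, Path);
      loop
         --  Last is the index of the last element actually filled.
         SIO.Read (File, Buffer, Last);
         exit when Last < Buffer'First;  --  nothing was read

         declare
            Chunk : String (1 .. Natural (Last));
         begin
            --  Assumption: each stream element is a valid Character code.
            for I in Buffer'First .. Last loop
               Chunk (Natural (I)) := Character'Val (Buffer (I));
            end loop;
            Append (Content, Chunk);
         end;

         --  A partially filled buffer means end of file was reached.
         exit when Last < Buffer'Last;
      end loop;
      SIO.Close (File);
   end Get_Bin_Content;

   Content : Unbounded_String;
begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Ada.Text_IO.Put_Line ("usage: read_by_chunks <file>");
      return;
   end if;

   Get_Bin_Content (Ada.Command_Line.Argument (1), Content);
   Ada.Text_IO.Put_Line ("Read" & Natural'Image (Length (Content)) & " bytes");
end Read_By_Chunks;
```

Unlike the Direct_IO loop in the question, the final short chunk is kept, because Read reports how many elements it filled rather than requiring a full record.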

There are a number of "correct" ways, but here's one that you might like. Especially when reading large files, an efficient way to read an entire file is to map the file into memory using mmap.

Depending on your licensing needs, you could be open to a third-party, GPL-licensed solution. AdaCore provides the GNATColl collection, which provides a nice interface for mmap. You can map the entire file and copy the contents.

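--  Assumes the enclosing unit has:
--     with GNATCOLL.Mmap; use GNATCOLL.Mmap;
--     with Ada.Text_IO;   use Ada.Text_IO;
declare
   File : Mapped_File;
   Str  : Str_Access;
begin
   File := Open_Read ("/tmp/file_on_disk");
   Read (File);  --  read the whole file
   Str := Data (File);
   for S in 1 .. Last (File) loop
      Put (Str (S));
   end loop;
   Close (File);
end;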

If your system doesn't support the mmap call, the library falls back to a read(2) implementation.

As others have mentioned, Ada.Streams.Stream_IO.Read is the way to go. Here's an example I put together on my system. Assuming you have sufficient memory available for dynamic allocation, this is able to read files larger than the stack size.

I haven't dug into the internals of the Stream_IO.Read code, but I suspect that the Stream_IO package is using a 4k block of memory (allocated from the heap) to buffer read operations.

with Ada.Directories;  use Ada.Directories;
with Ada.Unchecked_Deallocation;
with Ada.Streams.Stream_IO;

procedure Read_Input_File is

   type Byte is mod 2 ** 8;
   type Byte_Array is array (File_Size range <>) of Byte;
   type Byte_Array_Access is access Byte_Array;

   procedure Delete is new Ada.Unchecked_Deallocation
                           (Byte_Array, Byte_Array_Access);

   function Read_Binary_File (Filename : String)
      return Byte_Array_Access
   is
      package SIO renames Ada.Streams.Stream_IO;

      Binary_File_Size : constant File_Size := Ada.Directories.Size (Filename);
      Binary_File_Data : Byte_Array_Access;
      S                : SIO.Stream_Access;
      File             : SIO.File_Type;

   begin
      -- Allocate memory from the heap
      Binary_File_Data := new Byte_Array (1 .. Binary_File_Size);

      SIO.Open (File, SIO.In_File, Filename);
      S := SIO.Stream (File);

      -- Read entire file into the buffer
      Byte_Array'Read (S, Binary_File_Data.all);

      SIO.Close (File);

      -- The caller is responsible for freeing the buffer with Delete
      return Binary_File_Data;
   end Read_Binary_File;

   File_Data : Byte_Array_Access;

begin

   File_Data := Read_Binary_File ("File_Name.bin");

   -- Do something with data

   Delete (File_Data);

end Read_Input_File;


 