简体   繁体   English

从 Access 数据库中的附件字段中提取文件

[英]Extracting files from an Attachment field in an Access database

We are working on a project where we need to migrate data stored in an Access database to a cache database.我们正在进行一个项目,我们需要将存储在 Access 数据库中的数据迁移到缓存数据库。 The Access database contains columns with a data type of Attachment ; Access 数据库包含数据类型为Attachment的列; some of the tuples contain multiple attachments.一些元组包含多个附件。 I am able to obtain the filenames of these files by using .FileName , but I'm unsure how to determine when one file ends and another starts in .FileData .我可以使用.FileName获取这些文件的文件名,但我不确定如何确定一个文件何时结束,另一个文件何时开始.FileData

I am using the following to obtain this data:我正在使用以下内容来获取此数据:

System.Data.OleDb.OleDbCommand command= new System.Data.OleDb.OleDbCommand();
command.CommandText = "select [Sheet1].[pdf].FileData,* from [Sheet1]";
command.Connection = conn;
System.Data.OleDb.OleDbDataReader rdr = command.ExecuteReader();

(My original answer to this question was misleading. It worked okay for PDF files that were subsequently opened with Adobe Reader, but it did not always work properly for other types of files. The following is the corrected version.) (我对这个问题的原始回答具有误导性。它适用于随后使用 Adob​​e Reader 打开的 PDF 文件,但它并不总是适用于其他类型的文件。以下是更正后的版本。)

Unfortunately we cannot directly retrieve the contents of a file in an Access Attachment field using OleDb.不幸的是,我们无法使用 OleDb 直接检索访问Attachment字段中的文件内容。 The Access Database Engine prepends some metadata to the binary contents of the file, and that metadata is included if we retrieve the .FileData via OleDb. Access 数据库引擎将一些元数据添加到文件的二进制内容中,如果我们通过 OleDb 检索.FileData ,则包含该元数据。

To illustrate, a document named "Document1.pdf" is saved to an Attachment field using the Access UI.为了说明,使用 Access UI 将名为“Document1.pdf”的文档保存到附件字段。 The beginning of that PDF file looks like this:该 PDF 文件的开头如下所示:

原件.png

If we use the following code to try and extract the PDF file to disk如果我们使用以下代码尝试将 PDF 文件解压缩到磁盘

using (OleDbCommand cmd = new OleDbCommand())
{
    cmd.Connection = con;
    cmd.CommandText = 
            "SELECT Attachments.FileData " +
            "FROM AttachTest " +
            "WHERE Attachments.FileName='Document1.pdf'";
    using (OleDbDataReader rdr = cmd.ExecuteReader())
    {
        rdr.Read();
        byte[] fileData = (byte[])rdr[0];
        using (var fs = new FileStream(
                @"C:\Users\Gord\Desktop\FromFileData.pdf", 
                FileMode.Create, FileAccess.Write))
        {
            fs.Write(fileData, 0, fileData.Length);
            fs.Close();
        }
    }
}

then the resulting file will include the metadata at the beginning of the file (20 bytes in this case)然后生成的文件将在文件开头包含元数据(在这种情况下为 20 个字节)

从文件数据.png

Adobe Reader is able to open this file because it is robust enough to ignore any "junk" that may appear in the file before the '%PDF-1.4' signature. Adobe Reader 能够打开此文件,因为它足够强大,可以忽略文件中可能出现在 '%PDF-1.4' 签名之前的任何“垃圾”。 Unfortunately not all file formats and applications are so forgiving of extraneous bytes at the beginning of the file.不幸的是,并非所有文件格式和应用程序都对文件开头的无关字节如此宽容。

The only Official™ way of extracting files from an Attachment field in Access is to use the .SaveToFile method of an ACE DAO Field2 object, like so:从 Access 中的Attachment字段中提取文件的唯一Official™ 方法是使用 ACE DAO Field2对象的.SaveToFile方法,如下所示:

// required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
//
// using Microsoft.Office.Interop.Access.Dao; ...
var dbe = new DBEngine();
Database db = dbe.OpenDatabase(@"C:\Users\Public\Database1.accdb");
Recordset rstMain = db.OpenRecordset(
        "SELECT Attachments FROM AttachTest WHERE ID=1",
        RecordsetTypeEnum.dbOpenSnapshot);
Recordset2 rstAttach = rstMain.Fields["Attachments"].Value;
while ((!"Document1.pdf".Equals(rstAttach.Fields["FileName"].Value)) && (!rstAttach.EOF))
{
    rstAttach.MoveNext();
}
if (rstAttach.EOF)
{
    Console.WriteLine("Not found.");
}
else
{
    Field2 fld = (Field2)rstAttach.Fields["FileData"];
    fld.SaveToFile(@"C:\Users\Gord\Desktop\FromSaveToFile.pdf");
}
db.Close();

Note that if you try to use the .Value of the Field2 object you will still get the metadata at the beginning of the byte sequence;请注意,如果您尝试使用 Field2 对象的.Value ,您仍然会在字节序列的开头获得元数据; the .SaveToFile process is what strips it out. .SaveToFile过程是将其剥离的原因。

It took me a while to piece together information to retrieve a file stored from within the attachment field so I just thought I'd share it.我花了一段时间拼凑信息以检索从附件字段中存储的文件,所以我只是想我会分享它。

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.OleDb;
using System.IO;
using System.Diagnostics;

namespace AttachCheck
{
    public partial class Form1 : Form
    {
        DataSet Set1 = new DataSet();
        int ColId;

        public Form1()
        {
            InitializeComponent();

            OleDbConnection connect = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='db/Adb.accdb'"); //set up connection
            //CL_ID is a fk so attachments can be linked to users
            OleDbCommand sql = new OleDbCommand("SELECT at_ID, [at_Name].[FileData], [at_Name].[FileName], [at_Name].[FileType] FROM Attachments WHERE at_ID =1;", connect);
            //adding sql to addapter to be ran

            OleDbDataAdapter OleDA = new OleDbDataAdapter(sql);
            //attempting to open connection
            try { connect.Open(); }
            catch (Exception err) { System.Console.WriteLine(err); }

            
            OleDA.Fill(Set1); //create and fill dataset
            connect.Close();for (int i = 0; i < Set1.Tables[0].Rows.Count; i++)
            {
                System.Console.WriteLine(Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString() + "This is the file name");


            // by using a datagrid it allows you to display the attachments and select which to open, the open should be a button.
            dataGridView1.Rows.Add(new object[] { Set1.Tables[0].Rows[i]["at_ID"].ToString(), Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString(), "Open" });
            }
        }

        private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
        {

            DataGridViewCell cell = (DataGridViewCell)
            dataGridView1.Rows[e.RowIndex].Cells[e.ColumnIndex];

            System.Console.WriteLine(dataGridView1.Rows[e.RowIndex].Cells[e.ColumnIndex]);
            string FullRow = dataGridView1.Rows[e.RowIndex].ToString(); //data retrieved from click on datagrid 
            //need to sub string to cut away row index and leave number
            string SubRow = FullRow.Substring(24, 1); //cutting string down from position 24 for 1 character

            System.Console.WriteLine(SubRow + " This is Row"); //

            int RowId = int.Parse(SubRow); //turn row number from string into integer that can be used

            string FullRow2 = dataGridView1.Rows[e.RowIndex].Cells[e.ColumnIndex].ToString(); //data retrieved from click on datagrid 
            //need to sub string to cut away row index and leave number
            string SubRow2 = FullRow2.Substring(37, 1); //cutting string down from position 24 for 1 character
            System.Console.WriteLine(SubRow2 + " This is Column"); //
            int ColId = int.Parse(SubRow2); //turn row number from string into integer that can be used

            
            if (ColId == 2)
            {
                string fileName = Set1.Tables[0].Rows[RowId]["at_Name.FileName"].ToString(); //assign the file to variable

                //retrieving the file contents from the database as an array of bytes
                byte[] fileContents = (byte[])Set1.Tables[0].Rows[RowId]["at_Name.FileData"];


                fileContents = GetFileContents(fileContents); //send filecontents array to be decrypted

                string fileType = Set1.Tables[0].Rows[RowId]["at_Name.FileType"].ToString();


                DisplayTempFile(fileName, fileContents, fileType); //forward the file type to display file contents   
            }
        }

        private const int CONTENT_START_INDEX_DATA_OFFSET = 0; //values used for decoding 
        private const int UNKNOWN_DATA_OFFSET = 4; //the files
        private const int EXTENSION_LENGTH_DATA_OFFSET = 8; //storedw within the access database
        private const int EXTENSION_DATA_OFFSET = 12; //and this one


        private byte[] GetFileContents(byte[] fileContents)
        {

            int contentStartIndex = BitConverter.ToInt32(fileContents, CONTENT_START_INDEX_DATA_OFFSET);

            //'The next four bytes represent a value whose meaning is unknown at this stage, although it may represent a Boolean value indicating whether the data is compressed or not.
            int unknown = BitConverter.ToInt32(fileContents, UNKNOWN_DATA_OFFSET);

            //'The next four bytes contain the the length, in characters, of the file extension.
            int extensionLength = BitConverter.ToInt32(fileContents, EXTENSION_LENGTH_DATA_OFFSET);

            //'The next field in the header is the file extension, not including a dot but including a null terminator.
            //'Characters are Unicode so double the character count to get the byte count.
            string extension = Encoding.Unicode.GetString(fileContents, EXTENSION_DATA_OFFSET, extensionLength * 2);
            return fileContents.Skip(contentStartIndex).ToArray();


        }


        private void DisplayTempFile(string fileName, byte[] fileContents, string fileType)
        {

            // System.Console.WriteLine(fileName + "File Name");
            // System.Console.WriteLine(fileType + "File Type");
            // System.Console.WriteLine(fileContents + "File Contents");
            
            string tempFolderPath = Path.GetTempPath(); //creating a temperary path for file to be opened from
            string tempFilePath = Path.Combine(tempFolderPath, fileName); // assigning the file to the path

            if (!string.IsNullOrEmpty(tempFilePath)) //checking the temp file exists
            {
                tempFilePath = Path.Combine(tempFolderPath, //combines the strings 0 and 1 below
                String.Format("{0}{1}",
                Path.GetFileNameWithoutExtension(fileName),      //0                                                    
                Path.GetExtension(fileName))); //1
            }

            //System.Console.WriteLine(tempFolderPath + " tempFolderPath");
            //System.Console.WriteLine(tempFilePath + " tempFilePath");

            //'Save the file and open it.
            File.WriteAllBytes(tempFilePath, fileContents);
            //creates new file, writes bytes array to it then closes the file
            //File.ReadAllBytes(tempFilePath);

            //'Open the file.
            System.Diagnostics.Process attachmentProcess = Process.Start(tempFilePath);
            //chooses the program to open the file if available on the computer

        }
    }
}

Hope this helps someone希望这有助于某人

The following code goes through all the records of the Microsoft Access database data table and assigns each row to a recordset.下面的代码遍历 Microsoft Access 数据库数据表的所有记录,并将每一行分配给一个记录集。 The goes through all the attachments which are saved in field "Docs".遍历保存在“文档”字段中的所有附件。 Then extracts and saves those files on the disk.然后将这些文件提取并保存在磁盘上。 This code is an extension on the code introduced by "Gord Thompson" above.这段代码是对上面“Gord Thompson”介绍的代码的扩展。 The only thing that I did was that I wrote the code for Visual Basic.NET.我所做的唯一一件事就是为 Visual Basic.NET 编写代码。

Imports Microsoft.Office.Interop.Access.Dao

Put a reference to Dao by using the above line of code.使用上面的代码行引用 Dao。

'Visual Basic.NET
Private Sub ReadAttachmentFiles()
    'required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
    'define a new database engine and a new database
    Dim dbe = New DBEngine
    Dim db As Database = dbe.OpenDatabase("C:\Users\Meisam\Documents\Databases\myDatabase.accdb")
    'define the main recordset object for each row
    Dim rstMain As Recordset = db.OpenRecordset( _
            "SELECT * FROM Companies", _
            RecordsetTypeEnum.dbOpenSnapshot)
    'evaluate whether the recordset is empty of records
    If Not (rstMain.BOF And rstMain.EOF) Then
        'if not empty, then move to the first record
        rstMain.MoveFirst()
        'do until the end of recordset is not reached
        Do Until rstMain.EOF
            Dim myID As Integer = -1
            ' ID is the name of primary field with uniqe values field 
            myID = CInt(rstMain.Fields("ID").Value)
            'define the secondary recordset object for the attachment field "Docs"
            Dim rstAttach As Recordset2 = rstMain.Fields("Docs").Value
            'evaluate whether the recordset is empty of records
            If Not (rstAttach.BOF And rstAttach.EOF) Then
                'if not empty, then move to the first record
                rstAttach.MoveFirst()
                'do until the end of recordset is not reached
                Do Until rstAttach.EOF
                    'get the filename for each attachment in the field "Docs"
                    Dim fileName As String = rstAttach.Fields("FileName").Value
                    Dim fld As Field2 = rstAttach.Fields("FileData")
                    fld.SaveToFile("C:\Users\Meisam\Documents\test\" & myID & "_" & fileName)
                    rstAttach.MoveNext()
                Loop
            End If
            rstMain.MoveNext()
        Loop
    End If
    'close the database
    db.Close()
End Sub

According to Gord Thompson 's answer I would like to provide the following information.根据Gord Thompson回答,我想提供以下信息。

The first byte is the hexadecimal representation of the metadata's length.第一个字节是元数据长度的十六进制表示。 Byte 8 (0x04) is the hexadecimal representation of the length of the extension + 1. In this example this means that we need to remove the first 20 bytes (0x14):字节 8 (0x04) 是扩展长度 + 1 的十六进制表示。在此示例中,这意味着我们需要删除前 20 个字节 (0x14):

OleDb 元数据

This can be achieved very easily with the following functions:这可以通过以下功能轻松实现:

Function SaveBinaryData(sFileName As String, ByteArray() As Byte)
    Dim stream As New ADODB.stream 'Create Stream object
    
    With stream
        .type = adTypeBinary 'Specify stream type - we want To save binary data.
        .Open 'Open the stream And write binary data To the object
        .Write ByteArray
        .SaveToFile sFileName, adSaveCreateOverWrite 'Save binary data To disk
    End With
End Function

Public Function ReadBinaryData(sFileName As String) As Byte()
    Dim stream As New ADODB.stream
    
    With stream
        .type = adTypeBinary
        .Open
        .LoadFromFile sFileName
        ReadBinaryData = .Read
    End With
End Function

Public Function ShiftL(arrBytes() As Byte, iShift As Integer) As Byte()
    Dim i As Integer
    Dim arrReturn() As Byte
    
    For i = 0 To iShift - 1
        ReDim Preserve arrReturn(i)
        arrReturn(i) = Shift(arrBytes)
    Next
    ShiftL = arrReturn
End Function

Public Function Shift(arrBytes() As Byte) As Byte
    Dim b As Long
    
    If Not IsArray(arrBytes) Then
      Err.Raise 13, , "Type Mismatch"
      Exit Function
    End If
    
    Shift = arrBytes(0)
    For b = 1 To UBound(arrBytes)
        arrBytes(b - 1) = arrBytes(b)
    Next b
    ReDim Preserve arrBytes(UBound(arrBytes) - 1)
End Function

When you are accessing the value of the attachment field, just left shift the byte array by CDec(.Fields("FileData")(0)) .当您访问附件字段的value时,只需将字节数组CDec(.Fields("FileData")(0)) After the shift you can process the file data as desired, eg:轮班后,您可以根据需要处理文件数据,例如:

Dim fldAttachment As DAO.Field2
Dim arrBytes() As Byte
Set fldAttachment = .Fields("FileData")
With fldAttachment
    arrBytes = fldAttachment.value
    ShiftL arrBytes, CDec(arrBytes(0))
    SaveBinaryData .Fields("FileName").value, ByteArray
End With

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM