
Extracting files from an Attachment field in an Access database

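We are working on a project where we need to migrate data stored in an Access database to a Cache database. The Access database contains columns of data type Attachment, and some records contain multiple attachments. I can get the file names of these files with .FileName, but I'm not sure how to determine where one file ends and the next one begins in .FileData.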

I am using the following to get this data:

System.Data.OleDb.OleDbCommand command= new System.Data.OleDb.OleDbCommand();
command.CommandText = "select [Sheet1].[pdf].FileData,* from [Sheet1]";
command.Connection = conn;
System.Data.OleDb.OleDbDataReader rdr = command.ExecuteReader();
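A hedged note on the "where does one file end" part: as far as I know, when an Access SQL query references a subfield of a multi-valued Attachment field (such as [pdf].FileData), the engine returns one row per attached file instead of concatenating the files. Each FileData value therefore holds exactly one file (plus the metadata header discussed in the answers below). The sketch below only illustrates grouping such rows back by record; the tuple layout (record_id, file_name, file_data) and the function name group_attachments are hypothetical, assuming the query also selects the record key and .FileName.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: one row per attachment, shaped as (record_id, file_name, file_data),
# e.g. from "SELECT ID, [pdf].FileName, [pdf].FileData FROM [Sheet1]".
Row = Tuple[int, str, bytes]


def group_attachments(rows: Iterable[Row]) -> Dict[int, List[Tuple[str, bytes]]]:
    """Collect the attachments of each record; every row is one complete file."""
    grouped: Dict[int, List[Tuple[str, bytes]]] = defaultdict(list)
    for record_id, file_name, file_data in rows:
        grouped[record_id].append((file_name, file_data))
    return dict(grouped)


rows = [
    (1, "a.pdf", b"<header+pdf bytes>"),
    (1, "b.txt", b"<header+txt bytes>"),
    (2, "c.png", b"<header+png bytes>"),
]
print(group_attachments(rows))
```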

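(My original answer to this question was misleading. It worked for PDF files that were subsequently opened with Adobe Reader, but it did not always work for other types of files. What follows is the corrected version.)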

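Unfortunately, we cannot use OleDb to retrieve the contents of a file in an Access Attachment field directly. The Access Database Engine adds some metadata to the binary contents of the file, and that metadata is included if we retrieve .FileData via OleDb.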

To illustrate, a document named "Document1.pdf" was saved to an Attachment field using the Access UI. The beginning of that PDF file looks like this:

[Image (Original.png): the beginning of the original PDF file]

If we use the following code to try to extract the PDF file to disk

using (OleDbCommand cmd = new OleDbCommand())
{
    cmd.Connection = con;
    cmd.CommandText = 
            "SELECT Attachments.FileData " +
            "FROM AttachTest " +
            "WHERE Attachments.FileName='Document1.pdf'";
    using (OleDbDataReader rdr = cmd.ExecuteReader())
    {
        rdr.Read();
        byte[] fileData = (byte[])rdr[0];
        using (var fs = new FileStream(
                @"C:\Users\Gord\Desktop\FromFileData.pdf", 
                FileMode.Create, FileAccess.Write))
        {
            // the using block closes the stream, so no explicit Close() is needed
            fs.Write(fileData, 0, fileData.Length);
        }
    }
}

then the resulting file will contain the metadata at the beginning of the file (20 bytes in this case):

[Image (FromFileData.png): the extracted file, with 20 bytes of metadata before the PDF signature]

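Adobe Reader is able to open this file because it is robust enough to ignore any "junk" that may appear before the '%PDF-1.4' signature. Unfortunately, not all file formats and applications are so forgiving of extraneous bytes at the beginning of a file.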
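A quick way to see the difference between a tolerant and a strict reader is to ask where the format signature actually starts. The sketch below is a hypothetical helper (leading_junk_length is not part of any library) that returns the number of bytes preceding a signature such as b"%PDF-"; a clean file gives 0, while a FileData value read through OleDb would give the header length. The 20-byte prefix in the example is synthetic, standing in for the metadata described above.

```python
def leading_junk_length(data: bytes, signature: bytes = b"%PDF-") -> int:
    """Return how many bytes precede `signature`, or -1 if it is absent."""
    return data.find(signature)


# Synthetic example: 20 placeholder bytes standing in for the OleDb metadata.
blob = bytes(20) + b"%PDF-1.4\n..."
print(leading_junk_length(blob))  # 20
```

A strict consumer effectively requires this value to be 0; Adobe Reader accepts any non-negative value, which is why the first version of the answer appeared to work.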

The only Official™ way to extract files from an Attachment field in Access is to use the .SaveToFile method of an ACE DAO Field2 object, like so:

// required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
//
// using Microsoft.Office.Interop.Access.Dao; ...
var dbe = new DBEngine();
Database db = dbe.OpenDatabase(@"C:\Users\Public\Database1.accdb");
Recordset rstMain = db.OpenRecordset(
        "SELECT Attachments FROM AttachTest WHERE ID=1",
        RecordsetTypeEnum.dbOpenSnapshot);
Recordset2 rstAttach = rstMain.Fields["Attachments"].Value;
// test EOF first: reading Fields after the last record raises "No current record"
while (!rstAttach.EOF && !"Document1.pdf".Equals(rstAttach.Fields["FileName"].Value))
{
    rstAttach.MoveNext();
}
if (rstAttach.EOF)
{
    Console.WriteLine("Not found.");
}
else
{
    Field2 fld = (Field2)rstAttach.Fields["FileData"];
    fld.SaveToFile(@"C:\Users\Gord\Desktop\FromSaveToFile.pdf");
}
db.Close();

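Note that if you try to use the .Value of the Field2 object, you will still get the metadata at the beginning of the byte sequence; it is the .SaveToFile method that strips it out.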

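It took me a while to piece together the information needed to retrieve a file stored in an Attachment field, so I thought I would share it.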

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.OleDb;
using System.IO;
using System.Diagnostics;

namespace AttachCheck
{
    public partial class Form1 : Form
    {
        DataSet Set1 = new DataSet();
        int ColId;

        public Form1()
        {
            InitializeComponent();

            OleDbConnection connect = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='db/Adb.accdb'"); //set up connection
            //CL_ID is a fk so attachments can be linked to users
            OleDbCommand sql = new OleDbCommand("SELECT at_ID, [at_Name].[FileData], [at_Name].[FileName], [at_Name].[FileType] FROM Attachments WHERE at_ID =1;", connect);
            // add the SQL command to the adapter so it can be run

            OleDbDataAdapter OleDA = new OleDbDataAdapter(sql);
            //attempting to open connection
            try { connect.Open(); }
            catch (Exception err) { System.Console.WriteLine(err); }

            
            OleDA.Fill(Set1); // create and fill the dataset
            connect.Close();

            for (int i = 0; i < Set1.Tables[0].Rows.Count; i++)
            {
                System.Console.WriteLine(Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString() + " This is the file name");

                // a DataGridView lets you display the attachments and choose which one to open; the "Open" cell should be a button
                dataGridView1.Rows.Add(new object[] { Set1.Tables[0].Rows[i]["at_ID"].ToString(), Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString(), "Open" });
            }
        }

        private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
        {

            // ignore clicks on the column header row
            if (e.RowIndex < 0)
                return;

            // the event arguments already carry the clicked row and column;
            // parsing the text of DataGridViewRow.ToString() only worked for single-digit indexes
            int RowId = e.RowIndex;
            int ColId = e.ColumnIndex;

            System.Console.WriteLine(RowId + " This is Row");
            System.Console.WriteLine(ColId + " This is Column");

            
            if (ColId == 2)
            {
                string fileName = Set1.Tables[0].Rows[RowId]["at_Name.FileName"].ToString(); //assign the file to variable

                //retrieving the file contents from the database as an array of bytes
                byte[] fileContents = (byte[])Set1.Tables[0].Rows[RowId]["at_Name.FileData"];


                fileContents = GetFileContents(fileContents); // strip the Access metadata header from the byte array

                string fileType = Set1.Tables[0].Rows[RowId]["at_Name.FileType"].ToString();


                DisplayTempFile(fileName, fileContents, fileType); //forward the file type to display file contents   
            }
        }

        // header offsets used to decode the files stored within the Access database
        private const int CONTENT_START_INDEX_DATA_OFFSET = 0; // Int32: index where the file content starts
        private const int UNKNOWN_DATA_OFFSET = 4;              // Int32: meaning unknown
        private const int EXTENSION_LENGTH_DATA_OFFSET = 8;     // Int32: extension length in characters
        private const int EXTENSION_DATA_OFFSET = 12;           // UTF-16 file extension


        private byte[] GetFileContents(byte[] fileContents)
        {

            // The first four bytes give the index at which the actual file content starts.
            int contentStartIndex = BitConverter.ToInt32(fileContents, CONTENT_START_INDEX_DATA_OFFSET);

            // The next four bytes hold a value whose meaning is unknown at this stage; it may be a Boolean flag indicating whether the data is compressed.
            int unknown = BitConverter.ToInt32(fileContents, UNKNOWN_DATA_OFFSET);

            // The next four bytes contain the length, in characters, of the file extension.
            int extensionLength = BitConverter.ToInt32(fileContents, EXTENSION_LENGTH_DATA_OFFSET);

            // The next field in the header is the file extension, without a dot but including a null terminator.
            // Characters are Unicode (UTF-16), so double the character count to get the byte count.
            string extension = Encoding.Unicode.GetString(fileContents, EXTENSION_DATA_OFFSET, extensionLength * 2);

            // Skip the header and return only the file content (unknown and extension are decoded for reference only).
            return fileContents.Skip(contentStartIndex).ToArray();


        }


        private void DisplayTempFile(string fileName, byte[] fileContents, string fileType)
        {

            // System.Console.WriteLine(fileName + "File Name");
            // System.Console.WriteLine(fileType + "File Type");
            // System.Console.WriteLine(fileContents + "File Contents");
            
            string tempFolderPath = Path.GetTempPath(); // temporary folder the file will be opened from

            // Path.GetFileName drops any directory components from the stored name
            string tempFilePath = Path.Combine(tempFolderPath, Path.GetFileName(fileName));

            //System.Console.WriteLine(tempFolderPath + " tempFolderPath");
            //System.Console.WriteLine(tempFilePath + " tempFilePath");

            // Save the file: creates a new file, writes the byte array to it, then closes it.
            File.WriteAllBytes(tempFilePath, fileContents);

            // Open the file with whichever program is associated with its extension on this computer.
            System.Diagnostics.Process attachmentProcess = Process.Start(tempFilePath);

        }
    }
}

Hope this helps someone.
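The GetFileContents method above reads a header of three 32-bit integers followed by a UTF-16 extension. For readers who want to experiment with that layout outside .NET, here is a minimal Python sketch of the same parsing. It assumes little-endian integers (what BitConverter.ToInt32 produces on x86/x64 Windows) and takes the layout from the comments in the C# code, which were themselves worked out by inspection. split_attachment and build_test_blob are hypothetical names; build_test_blob fabricates a header purely for testing. If the second integer really is a "compressed" flag, stripping the header alone would not recover the original file, and this sketch makes no attempt to handle that case.

```python
import struct
from typing import Tuple

# Layout as described in the C# GetFileContents method:
#   offset 0:  int32  index where the real file content starts
#   offset 4:  int32  unknown (possibly a "compressed" flag)
#   offset 8:  int32  extension length in characters, including the null terminator
#   offset 12: UTF-16LE extension, followed by the file content
EXTENSION_DATA_OFFSET = 12


def split_attachment(file_data: bytes) -> Tuple[str, int, bytes]:
    """Split a FileData value read via OleDb into (extension, unknown, content)."""
    content_start, unknown, ext_len = struct.unpack_from("<iii", file_data, 0)
    ext_end = EXTENSION_DATA_OFFSET + ext_len * 2  # two bytes per UTF-16 character
    extension = file_data[EXTENSION_DATA_OFFSET:ext_end].decode("utf-16-le").rstrip("\x00")
    return extension, unknown, file_data[content_start:]


def build_test_blob(extension: str, content: bytes, unknown: int = 0) -> bytes:
    """Hypothetical helper: fabricate a header in the layout above (testing only)."""
    ext_bytes = (extension + "\x00").encode("utf-16-le")
    start = EXTENSION_DATA_OFFSET + len(ext_bytes)
    return struct.pack("<iii", start, unknown, len(extension) + 1) + ext_bytes + content


blob = build_test_blob("pdf", b"%PDF-1.4")
print(split_attachment(blob))  # ('pdf', 0, b'%PDF-1.4')
```

For a ".pdf" attachment the fabricated header is 20 bytes long, matching the 20 bytes of metadata observed in the first answer.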

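The code below loops through all records of a table in a Microsoft Access database, assigning each row to a recordset, and then loops through all attachments stored in the "Docs" field of that row. Each file is extracted and saved to disk. This code extends the code presented by Gord Thompson above; the only thing I did was write it in Visual Basic.NET.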

Imports Microsoft.Office.Interop.Access.Dao

Use the line above to import the DAO namespace.

'Visual Basic.NET
Private Sub ReadAttachmentFiles()
    'required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
    'define a new database engine and a new database
    Dim dbe = New DBEngine
    Dim db As Database = dbe.OpenDatabase("C:\Users\Meisam\Documents\Databases\myDatabase.accdb")
    'define the main recordset object for each row
    Dim rstMain As Recordset = db.OpenRecordset( _
            "SELECT * FROM Companies", _
            RecordsetTypeEnum.dbOpenSnapshot)
    'check whether the recordset contains any records
    If Not (rstMain.BOF And rstMain.EOF) Then
        'if not empty, move to the first record
        rstMain.MoveFirst()
        'loop until the end of the recordset is reached
        Do Until rstMain.EOF
            Dim myID As Integer = -1
            ' ID is the name of the primary key field (unique values)
            myID = CInt(rstMain.Fields("ID").Value)
            'define the secondary recordset object for the attachment field "Docs"
            Dim rstAttach As Recordset2 = rstMain.Fields("Docs").Value
            'check whether the attachment recordset contains any records
            If Not (rstAttach.BOF And rstAttach.EOF) Then
                'if not empty, move to the first attachment
                rstAttach.MoveFirst()
                'loop until the end of the attachment recordset is reached
                Do Until rstAttach.EOF
                    'get the filename for each attachment in the field "Docs"
                    Dim fileName As String = rstAttach.Fields("FileName").Value
                    Dim fld As Field2 = rstAttach.Fields("FileData")
                    fld.SaveToFile("C:\Users\Meisam\Documents\test\" & myID & "_" & fileName)
                    rstAttach.MoveNext()
                Loop
            End If
            rstMain.MoveNext()
        Loop
    End If
    'close the database
    db.Close()
End Sub
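The nested loop above boils down to "for each record, for each attachment, write <ID>_<FileName>". Prefixing the record ID keeps attachments with the same file name on different records from overwriting each other. Below is a minimal Python sketch of that naming scheme only; it assumes the attachment contents are already header-free (which is what SaveToFile gives you in the VB.NET code), and save_all and the dictionary shape of records are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def save_all(records: Dict[int, List[Tuple[str, bytes]]], out_dir: Path) -> List[Path]:
    """Write each attachment as '<ID>_<FileName>' in out_dir, like the VB.NET loop."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for record_id, attachments in records.items():  # outer loop: rows of rstMain
        for file_name, content in attachments:      # inner loop: rows of rstAttach
            target = out_dir / f"{record_id}_{file_name}"
            target.write_bytes(content)             # content assumed to be header-free
            written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_all({1: [("a.pdf", b"x")], 2: [("b.txt", b"yz")]}, Path(tmp))
    print([p.name for p in paths])  # ['1_a.pdf', '2_b.txt']
```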

Building on Gord Thompson's answer, I would like to add the following information.

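The first byte holds the length of the metadata (strictly, the first four bytes form a little-endian 32-bit integer). Byte 8 (0x04 here) holds the length of the file extension plus one for its null terminator. The extension is stored as UTF-16, two bytes per character, so in this example the header is 12 + 4 × 2 = 20 bytes (0x14), and those are the bytes we need to remove: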

[Image: OleDb metadata]
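The arithmetic behind the 20-byte (0x14) figure can be written out as a tiny function. This is a sketch under the layout stated above (three 4-byte integers, then the UTF-16 extension with its null terminator); it has only been checked against the ".pdf" example in this thread, and header_length is a hypothetical name.

```python
def header_length(extension: str) -> int:
    """Bytes of OleDb metadata that precede the file content."""
    fixed_part = 12                    # three 4-byte integers at offsets 0, 4 and 8
    ext_chars = len(extension) + 1     # the value stored at byte 8: extension + null terminator
    return fixed_part + ext_chars * 2  # UTF-16 uses two bytes per character


n = header_length("pdf")
print(n, hex(n))  # 20 0x14
```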

This can be done easily with the following functions:

Function SaveBinaryData(sFileName As String, ByteArray() As Byte)
    Dim stream As New ADODB.stream 'Create Stream object
    
    With stream
        .type = adTypeBinary 'Specify the stream type: we want to save binary data
        .Open 'Open the stream and write the binary data to the object
        .Write ByteArray
        .SaveToFile sFileName, adSaveCreateOverWrite 'Save the binary data to disk
    End With
End Function

Public Function ReadBinaryData(sFileName As String) As Byte()
    Dim stream As New ADODB.stream
    
    With stream
        .type = adTypeBinary
        .Open
        .LoadFromFile sFileName
        ReadBinaryData = .Read
    End With
End Function

Public Function ShiftL(arrBytes() As Byte, iShift As Integer) As Byte()
    Dim i As Integer
    Dim arrReturn() As Byte
    
    For i = 0 To iShift - 1
        ReDim Preserve arrReturn(i)
        arrReturn(i) = Shift(arrBytes)
    Next
    ShiftL = arrReturn
End Function

Public Function Shift(arrBytes() As Byte) As Byte
    Dim b As Long
    
    If Not IsArray(arrBytes) Then
      Err.Raise 13, , "Type Mismatch"
      Exit Function
    End If
    
    Shift = arrBytes(0)
    For b = 1 To UBound(arrBytes)
        arrBytes(b - 1) = arrBytes(b)
    Next b
    ReDim Preserve arrBytes(UBound(arrBytes) - 1)
End Function

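When you access the value of the attachment field, simply shift the byte array left by CDec(.Fields("FileData")(0)), i.e. by the value of its first byte. After the shift you can process the file data as needed, for example: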

' assumes this runs inside a With block on the attachment Recordset2
Dim fldAttachment As DAO.Field2
Dim arrBytes() As Byte
Set fldAttachment = .Fields("FileData")
arrBytes = fldAttachment.Value
' drop the metadata header; its length is stored in the first byte
ShiftL arrBytes, CInt(arrBytes(0))
SaveBinaryData .Fields("FileName").Value, arrBytes
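For comparison, ShiftL returns the removed bytes and shortens the array passed to it; the VBA Shift routine moves every remaining byte once per removed byte, so the work grows with (array length × shift count). The Python sketch below mimics the same contract by returning both parts as a tuple, using slicing so each byte is copied once. Reading only the first byte, as the VBA does, assumes the header is shorter than 256 bytes, since the full offset is a 4-byte integer. shift_left is a hypothetical name and the input is synthetic.

```python
from typing import Tuple


def shift_left(arr: bytes, count: int) -> Tuple[bytes, bytes]:
    """Mimic ShiftL: return (removed_prefix, remaining_bytes)."""
    if count < 0 or count > len(arr):
        raise ValueError("count out of range")
    return arr[:count], arr[count:]


# Synthetic FileData: first byte 0x14 (20) = header length, then padding, then the file.
data = bytes([0x14]) + bytes(19) + b"%PDF-1.4"
header, content = shift_left(data, data[0])
print(len(header), content)  # 20 b'%PDF-1.4'
```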


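Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.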

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM