
Split PDF file on large dataset - Create multiple PDF files for every certain number of rows/pages C#

I have a large dataset of 89k rows that I need to export to a PDF file. With my current code I can export 30k rows without problems, but when I increase the .Take() argument beyond 30k I get a "Document has no pages" error. What I am trying to achieve is to create one PDF document for every 30k rows in the dataset, such that:

From these 89k rows:
file a -> 30k rows
file b -> 30k rows
file c -> 29k rows

That is, as long as there are rows/records left, a new file should be created for every 30k rows. This is my current code:

var list = conStrings.GetReport().Take(30000); //get rows from DB/table

WebGrid grid = new WebGrid(source: list, canPage: false, canSort: false);
string gridHtml = grid.GetHtml(
    tableStyle: "webgrid-table",
    headerStyle: "webgrid-header",
    columns: grid.Columns(
        grid.Column("q_barcode", "Barcode"),
        grid.Column("q_description", "Description"),
        grid.Column("q_sellprice", "Price", format: (item) => new HtmlString("€" + Convert.ToString(item.q_sellprice))),
        grid.Column("unitCost", "Unit Cost", format: (item) => new HtmlString("€" + Convert.ToString(item.unitCost))),
        grid.Column("VatRate", "Vat Rate %", format: (item) => new HtmlString(Convert.ToString(item.VatRate + "%"))),
        grid.Column("grossProfit", "GP %", format: (item) => new HtmlString(Convert.ToString(item.grossProfit + "%")))
    )
).ToString();


using (var ms = new MemoryStream())
{
    // iTextSharp Document, which is an abstraction of a PDF but NOT a PDF
    using (var doc = new Document())
    {
        // writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // open the document for writing
            doc.Open();

            // read the HTML and CSS into streams for the parser
            //using (var srHtml = new StringReader(gridHtml))
            using (var msCss = new MemoryStream(Encoding.UTF8.GetBytes(webgridstyle)))
            {
                using (var srHtml = new MemoryStream(Encoding.UTF8.GetBytes(gridHtml)))
                {
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance()
                        .ParseXHtml(writer, doc, srHtml, msCss);
                }
            }

            doc.Close();
        }
    }

    myBytes = ms.ToArray();
}

var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "PDF_Report_"+timestamp+".pdf");

System.IO.File.WriteAllBytes(testFile, myBytes);

So from this code I would like to be able to just say

var list = conStrings.GetReport(); //get data from DB regardless of size

and create a document every 30k rows or every 500 pages (for example). What's the best way to achieve this?

You can run the ParseXHtml method multiple times on the same iText document with different HTML snippets.

I'm not familiar with WebGrid, but I assume you should be able to call Take() multiple times (using Skip() to pass over the rows already taken) and store the resulting HTML snippets in a List. You can then loop over this list and call ParseXHtml() once per HTML snippet.

This will leave tables that do not fill the page wherever a paginated result ends. You can also merge the HTML snippets using XML parsing.
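The repeated Take() calls described above could be sketched as follows. This is only an illustration under assumptions: PagingSketch and Pages are hypothetical names, not part of WebGrid or iTextSharp, and the row source is assumed to be an ordinary IEnumerable that can be enumerated more than once. Each page it yields would still have to be turned into its own HTML snippet and passed to ParseXHtml.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: not part of WebGrid or iTextSharp.
public static class PagingSketch
{
    // Yields consecutive pages of at most pageSize rows by combining Skip() and Take().
    // Note: the source is re-enumerated once per page, so this suits in-memory data
    // or a query that is cheap to run again.
    public static IEnumerable<List<T>> Pages<T>(IEnumerable<T> source, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));

        for (int skip = 0; ; skip += pageSize)
        {
            // Pass over the rows already handled, then take the next page.
            List<T> page = source.Skip(skip).Take(pageSize).ToList();

            // An empty page means the source is exhausted.
            if (page.Count == 0)
                yield break;

            yield return page;
        }
    }
}
```

Because Take() on its own always returns the first rows of the sequence, the Skip() call is what moves each page forward.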

Try it like this:

var batchedList = conStrings.GetReport()
    .Select((data, index) => new { data, index })
    .GroupBy(item => item.index / 30000)
    .Select(grp => grp.Select(x => x.data));

foreach (var list in batchedList)
{
    // insert the rest of your method here
}

This should batch the results of conStrings.GetReport() into groups of 30k and then iterate over those groups with foreach.
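To see how the index-based grouping splits the rows, here is a minimal self-contained sketch of the same query wrapped in a generic method. BatchingSketch and Batch are hypothetical names introduced only for illustration, and the sketch assumes the rows are already in memory (LINQ to Objects). If GetReport() returns a database-backed IQueryable, the indexed Select overload may not be translatable, in which case calling AsEnumerable() first would be one possible workaround.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that wraps the grouping query in a reusable method.
public static class BatchingSketch
{
    // Splits a sequence into consecutive batches of at most batchSize items.
    public static List<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        return source
            .Select((data, index) => new { data, index })   // pair each row with its position
            .GroupBy(item => item.index / batchSize)        // integer division gives keys 0, 1, 2, ...
            .Select(grp => grp.Select(x => x.data).ToList())
            .ToList();
    }
}
```

For 89,000 rows and a batch size of 30,000 the keys are 0, 1 and 2, so the method returns three batches of 30,000, 30,000 and 29,000 rows. GroupBy buffers the whole sequence before yielding the first group, so all rows are held in memory at once.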

Obviously this line:

var list = conStrings.GetReport().Take(30000); //get rows from DB/table

won't be needed inside the foreach loop.
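One detail to watch when moving the method body into the loop: the original file name depends only on the timestamp, so every batch would overwrite the previous file unless the name also varies per batch. The sketch below shows one way this could be handled. BatchExportSketch, WriteBatches and the "_part" suffix are hypothetical, and renderPdf is an assumed delegate standing in for the existing WebGrid and iTextSharp code that turns one batch of rows into PDF bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes one PDF file per batch with a distinct file name.
public static class BatchExportSketch
{
    public static List<string> WriteBatches<T>(
        IEnumerable<IEnumerable<T>> batches,
        Func<IEnumerable<T>, byte[]> renderPdf,   // assumed wrapper around the WebGrid + iTextSharp code
        string folder,
        string timestamp)
    {
        var writtenFiles = new List<string>();
        int part = 1;

        foreach (IEnumerable<T> batch in batches)
        {
            // Render this batch to PDF bytes using the caller's existing code.
            byte[] bytes = renderPdf(batch);

            // Include the part number so that files do not overwrite each other.
            string path = Path.Combine(folder, "PDF_Report_" + timestamp + "_part" + part + ".pdf");
            File.WriteAllBytes(path, bytes);

            writtenFiles.Add(path);
            part++;
        }

        return writtenFiles;
    }
}
```

Passing the rendering step in as a delegate keeps the file-naming logic separate from the PDF generation, so the existing export code can be reused without changes.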
