C# Convert PDF to txt

Hello, I have a question. I'm trying to convert PDF to txt, but I can't save the txt file. Can someone please help me?

using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.IO;

namespace ZestawienieFaktur
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();

        }



        private void button1_Click(object sender, EventArgs e)
        {

            string[] filePaths = Directory.GetFiles(@"D:\\faktury\\", "*.pdf");

           foreach (string fp in filePaths)
            {
                ExtractTextFromPdf(fp);
            }

        }

        public static string ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();

                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
                }

                string lines = text.ToString();
                using (var file = new StreamWriter(@"D:\faktury\test1.txt"))
                {
                    file.WriteLine(lines);
                    file.Close();
                }


            }




        }

    }
}

The folder contains several PDF files with different names, and I want all of them converted to txt. Thanks a lot for any answer.

You should remove the return keyword and have the method return void. The file is not written because the method stops executing the rest of its code after return. Change it to this:

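public static void ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();

        // Collect the text of every page (iTextSharp page numbers start at 1).
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }

        string lines = text.ToString();

        // Assumption: path2 is the output file, placed next to the PDF with a .txt extension.
        string path2 = Path.ChangeExtension(path, ".txt");
        using (var file = new StreamWriter(path2))
        {
            file.WriteLine(lines);
        } // leaving the using block closes the writer
    }
}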

Hope it helps!

OK, it works. Thanks, friends...

using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.IO;

namespace ZestawienieFaktur
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();

        }



        private void button1_Click(object sender, EventArgs e)
        {

            string[] filePaths = Directory.GetFiles(@"D:\faktury\", "*.pdf");

           foreach (string fp in filePaths)
            {
                ExtractTextFromPdf(fp);
            }

        }

        public static string ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();

                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
                }

                string lines = text.ToString();
                using (var file = new StreamWriter(@"D:\faktury\test1.txt"))
                {
                    file.WriteLine(lines);
                    file.Close();
                }
                return lines;
            }
        }
    }
}
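
One thing to watch in the code above: every PDF is written to the same D:\faktury\test1.txt, and new StreamWriter(path) overwrites an existing file, so only the text of the last PDF survives. Since the goal was to convert all files in the folder, here is a minimal sketch of writing one txt per PDF. The names PdfBatch, GetOutputPath and ConvertFolder are made up for illustration, and the extract parameter stands in for a method like ExtractTextFromPdf that returns the text of one PDF; it is not part of iTextSharp.

```csharp
using System;
using System.IO;

// Hypothetical helper class, named here only for illustration.
public static class PdfBatch
{
    // Maps e.g. "D:\faktury\a.pdf" to "D:\faktury\a.txt".
    public static string GetOutputPath(string pdfPath)
    {
        return Path.ChangeExtension(pdfPath, ".txt");
    }

    // Converts every *.pdf in 'folder', writing one .txt file next to each PDF.
    // 'extract' is assumed to return the text of a single PDF
    // (for example ExtractTextFromPdf from the code above).
    // Returns the number of files processed.
    public static int ConvertFolder(string folder, Func<string, string> extract)
    {
        int count = 0;
        foreach (string pdfPath in Directory.GetFiles(folder, "*.pdf"))
        {
            // File.WriteAllText creates the file or overwrites an existing one.
            File.WriteAllText(GetOutputPath(pdfPath), extract(pdfPath));
            count++;
        }
        return count;
    }
}
```

In button1_Click this could be called as PdfBatch.ConvertFolder(@"D:\faktury\", ExtractTextFromPdf); in that case the StreamWriter block inside ExtractTextFromPdf is no longer needed, because the helper does the writing.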
