
Extract only bold text from PDF documents

I am facing a challenge extracting text from PDFs, and it is a very specific use case (not just plain text extraction).

I have a lot of PDF documents and want to extract only the bold text from them. I have gone through multiple posts on Stack Overflow and other websites, but none of the approaches worked for me. Any help with this would be appreciated. My PDFs are not complicated; they contain only text (not even tables).

This is a sample PDF, and the output should be a list: ["Shreya Singhal", "Sahara India Real Estate Corporation Limited Vs. Securities and Exchange Board of India", "Reliance Petrochemicals Ltd. Vs. Proprietors of Indian Express Newspaper"]
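To make the goal concrete, here is a minimal sketch of the kind of extraction I am after. It assumes the third-party pdfplumber library, whose `page.chars` gives one dict per character with `text` and `fontname` keys, and it assumes that bold fonts carry "Bold" in their font name (for example `ABCDEF+Arial-BoldMT`), which is a heuristic that will not hold for every PDF. The helper names `is_bold`, `bold_runs`, and `extract_bold` are made up for illustration.

```python
from itertools import groupby


def is_bold(fontname):
    """Heuristic (assumption): a font is bold if its name contains 'bold'."""
    return "bold" in (fontname or "").lower()


def bold_runs(chars):
    """Join consecutive bold characters into strings.

    `chars` is a list of dicts with "text" and "fontname" keys,
    the shape pdfplumber exposes as `page.chars`.
    """
    runs = []
    for bold, group in groupby(chars, key=lambda c: is_bold(c.get("fontname"))):
        if not bold:
            continue
        text = "".join(c["text"] for c in group)
        text = " ".join(text.split())  # collapse runs of whitespace
        if text:
            runs.append(text)
    return runs


def extract_bold(pdf_path):
    """Collect bold runs from every page of the PDF at `pdf_path`."""
    import pdfplumber  # third-party: pip install pdfplumber

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            results.extend(bold_runs(page.chars))
    return results
```

Calling `extract_bold("sample.pdf")` (a placeholder file name) would return a list of bold strings in reading order. One caveat with this sketch: if the space between two bold words is set in a non-bold font, the phrase is split into separate list entries, so some merging of adjacent runs may still be needed.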

Also, this is my second question, so please let me know if it needs to be edited. I will be more than happy to add whatever additional details are needed. My first question was closed for lack of clarity.

Thank you all for your help.
