Is it possible to extract text from video programmatically?

I know we can extract text from an image using OCR. But I need to extract the text that appears in a video, such as the text shown in video lectures. In other words, is it possible to transcribe a video to text? If so, please suggest how to do it in Java or any other language.

My naive, Linux-driven approach would be:

  • Check: does the OCR work on my operating system?
  • Extract some sample frames from the video using an ordinary player. Every player (for example VLC) has such a function.
  • Check: how well does the OCR extract text from image files?
  • Check: how well does the OCR extract text from image files with the background the video provides?
  • Get software to extract frames from videos in batch. Various tools can create contact sheets, and these should also be able to extract full-resolution images at arbitrary points in time from the video. Full resolution might be necessary for the OCR to work. If you know that the text sits in fixed rectangles, you can crop the images first.
  • In the worst case, let the OCR analyse every frame of the movie.
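The steps above can be sketched as a small script that drives two separate command-line programs. This is only an illustration under assumptions: it assumes `ffmpeg` is the frame extractor and `tesseract` is the OCR engine (the answer names neither), that both are on the `PATH`, and the function names, file names and crop rectangle are hypothetical. Python is used here instead of a shell script, but it only calls the external programs.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_command(video, out_dir, fps=1.0, crop=None):
    """Build an ffmpeg command that writes `fps` frames per second as PNG files.

    `crop` is an optional (width, height, x, y) rectangle, useful when the
    text is known to sit in a fixed region of the picture.
    """
    filters = [f"fps={fps}"]
    if crop is not None:
        width, height, x, y = crop
        filters.append(f"crop={width}:{height}:{x}:{y}")
    return [
        "ffmpeg", "-i", str(video),
        "-vf", ",".join(filters),
        str(Path(out_dir) / "frame_%05d.png"),
    ]


def tesseract_command(image, lang="eng"):
    """Build a tesseract command that prints the recognised text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def extract_text(video, out_dir, fps=1.0, crop=None):
    """Extract frames, then OCR each one; returns (file name, text) pairs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_extract_command(video, out_dir, fps, crop), check=True)
    texts = []
    for image in sorted(Path(out_dir).glob("frame_*.png")):
        result = subprocess.run(
            tesseract_command(image), check=True, capture_output=True, text=True
        )
        texts.append((image.name, result.stdout.strip()))
    return texts
```

Calling `extract_text("lecture.mp4", "frames", fps=0.5)` would OCR one frame every two seconds. Checking a handful of the extracted PNG files by hand first corresponds to the "check" steps in the list.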

How well this works mostly depends on how good and how fast your OCR is. Everything else is, in my experience, well-proven software. The language could be a Bash shell script, since the components will probably be separate Linux programs. As mentioned, it depends on the quality, performance and runtime environment of your OCR.
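To see why OCR speed matters so much, here is a rough back-of-the-envelope estimate. The numbers are made up for illustration (a one-hour video at 25 fps and an OCR taking 0.5 seconds per image); measure your own OCR before relying on any of them.

```python
import math


def frames_to_ocr(duration_s, video_fps, sample_every_s=None):
    """Number of images the OCR has to process.

    With sample_every_s=None every frame is analysed (the worst case);
    otherwise one frame is taken every `sample_every_s` seconds.
    """
    if sample_every_s is None:
        return math.ceil(duration_s * video_fps)
    return math.ceil(duration_s / sample_every_s)


def estimated_ocr_seconds(n_frames, seconds_per_frame):
    """Total OCR time, assuming a constant cost per image."""
    return n_frames * seconds_per_frame


# Hypothetical figures: 1 hour of video, 25 fps, 0.5 s of OCR per image.
every_frame = frames_to_ocr(3600, 25)                 # 90000 images
sampled = frames_to_ocr(3600, 25, sample_every_s=5)   # 720 images
print(estimated_ocr_seconds(every_frame, 0.5) / 3600, "hours")  # 12.5 hours
print(estimated_ocr_seconds(sampled, 0.5) / 60, "minutes")      # 6.0 minutes
```

Under these assumed figures, analysing every frame takes hours, while sampling one frame every few seconds takes minutes.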

Yes, you can do that, and there are 3 ways to achieve it.

  1. Split, classify and train on your own.
    Get a high-performance server, then: A. Extract images from the video. B. Develop and train your machine learning model. You can use TensorFlow for this. Note: if you prefer to train models on your own, make sure you have enough time, as development and training can sometimes take a few months, and you need data to train on.

  2. Use an OCR framework.

  3. Use an API (freemium model). Many are available on the market; a web search will turn up plenty.
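As an illustration of option 2, here is a minimal sketch that assumes Tesseract (through the `pytesseract` wrapper) as the OCR framework and OpenCV (`opencv-python`) for reading frames. Neither library is named in the answer, so treat both as one possible choice. The function names, the sampling interval and the similarity threshold are hypothetical. Because consecutive frames of a lecture usually show the same slide, the sketch drops text that is nearly identical to the previously kept text.

```python
from difflib import SequenceMatcher


def sample_frames(video_path, every_s=2.0):
    """Yield (timestamp in seconds, frame) pairs, about one every `every_s` seconds."""
    import cv2  # assumption: opencv-python is installed

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if the file reports 0
    step = max(1, round(fps * every_s))
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                yield index / fps, frame
            index += 1
    finally:
        cap.release()


def tesseract_ocr(frame):
    """OCR one BGR frame; assumes pytesseract and the Tesseract binary are installed."""
    import cv2
    import pytesseract

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)


def transcribe(frames, ocr=tesseract_ocr, similarity=0.9):
    """Run `ocr` on each (timestamp, frame) pair, skipping empty and near-duplicate text."""
    results = []
    last = ""
    for timestamp, frame in frames:
        text = " ".join(ocr(frame).split())  # collapse whitespace and newlines
        if not text:
            continue
        if SequenceMatcher(None, last, text).ratio() >= similarity:
            continue  # same slide as before, nothing new to record
        results.append((timestamp, text))
        last = text
    return results
```

A call such as `transcribe(sample_frames("lecture.mp4"))` would return a list of timestamped text snippets. The `ocr` parameter is a plain callable, so a hosted OCR API (option 3) or a self-trained model (option 1) could be swapped in without changing the loop.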


 