Yes, it is inspired by the post about text in videos. It reminded me that I have been looking for such a tool for PDFs. Yandex advertises this; however, it has not been very helpful in my experience.

  • loathsome dongeaterA
    2 years ago

    If you want OCR, you can use tesseract-ocr. If you want to extract the actual text from a PDF, you can use something like pdftotext from the Poppler tools, but you will have to fix the formatting a lot.
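
    Roughly, the two routes could be combined like this. It is only a sketch, assuming the pdftotext, pdftoppm and tesseract command-line tools are installed and on the PATH; the function names, the "out" working directory and the 300 dpi setting are my own choices for illustration. It tries the embedded text first and falls back to OCR when the PDF turns out to be scanned images.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf: str, out_txt: str) -> list[str]:
    # -layout asks pdftotext to keep the physical page layout where it can
    return ["pdftotext", "-layout", pdf, out_txt]


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page, named <prefix>-<page number>.png
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def tesseract_cmd(image: str, out_base: str, lang: str = "eng") -> list[str]:
    # tesseract writes its result to <out_base>.txt
    return ["tesseract", image, out_base, "-l", lang]


def extract_text(pdf: str, workdir: str = "out") -> str:
    """Return embedded text if the PDF has any, otherwise OCR each page."""
    for tool in ("pdftotext", "pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    work = Path(workdir)
    work.mkdir(exist_ok=True)

    # Route 1: pull out the text layer that is already in the PDF.
    txt = work / "embedded.txt"
    subprocess.run(pdftotext_cmd(pdf, str(txt)), check=True)
    text = txt.read_text(errors="replace")
    if text.strip():
        return text

    # Route 2: no text layer (scanned PDF), so render pages and OCR them.
    subprocess.run(rasterize_cmd(pdf, str(work / "page")), check=True)
    pages = []
    for png in sorted(work.glob("page-*.png")):
        base = png.with_suffix("")  # e.g. out/page-1
        subprocess.run(tesseract_cmd(str(png), str(base)), check=True)
        pages.append(base.with_suffix(".txt").read_text(errors="replace"))
    return "\n".join(pages)
```

    Calling extract_text("scan.pdf") (the file name is a placeholder) returns the text as one string. The -layout output usually still needs manual cleanup, and OCR quality depends a lot on the scan resolution and the language pack passed to -l.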

    • Makan
      2 years ago

      Anything with good formatting is fine in my book.

      Or at least one that gets the words right.