• What’s the accuracy like on DeepSeek-OCR? Archive.org has a copy of Blackshirts and Reds with some really awful OCR text in it, and I’ve wanted to get a better OCR of it for some time. DeepSeek-OCR is probably not the right tool for that, though.

    I have wondered how good something like Crush would be at building EPUBs out of raw text and an EPUB template/style guide.

    • ☆ Yσɠƚԋσʂ ☆OP
      1 month ago

      The accuracy depends on the quality of the source image, but it tends to do pretty well even with compressed ones. Doing OCR on a whole book might be a bit slow, but it could be worth running a few pages through to see what the output looks like. You could definitely use Crush to write a script that feeds a PDF through deepseek-ocr and outputs formatted text. You’d probably have to stream it through a few pages at a time.
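
For what it's worth, the "few pages at a time" part could look roughly like the sketch below. It only handles the batching. `ocr_pages` is a hypothetical callable you would write yourself to render the given page range and send it to whatever OCR model you run; it is not part of deepseek-ocr or any real library, and the names and the batch size of 4 are just assumptions for illustration.

```python
from typing import Callable, Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first, last) 1-based inclusive page ranges covering the book."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_book(
    total_pages: int,
    ocr_pages: Callable[[int, int], str],  # hypothetical: you supply this
    batch_size: int = 4,
) -> Iterator[str]:
    """Stream OCR text one batch at a time instead of loading the whole book."""
    for first, last in page_batches(total_pages, batch_size):
        yield ocr_pages(first, last)


if __name__ == "__main__":
    # Stand-in for a real OCR call, so the sketch runs on its own.
    def fake_ocr(first: int, last: int) -> str:
        return f"[text of pages {first}-{last}]"

    for chunk in ocr_book(10, fake_ocr, batch_size=4):
        print(chunk)
```

Because `ocr_book` is a generator, you can write each chunk to disk as it arrives and try it on the first few pages before committing to the whole book.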