This code converts a scanned Farsi PDF document into page images and then extracts their text into a text file using Tesseract, the open-source OCR engine originally developed at HP and later by Google. You can use it for any other language that Tesseract supports by changing the `lang='fas'` parameter of the `pytesseract.image_to_string` function.
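The PDF-to-images-to-text pipeline described above might look roughly like the sketch below. It is not the repository's own code. It assumes the `pdf2image` package (which needs Poppler installed) handles the PDF-to-image step, that `pytesseract` and the Tesseract binary are installed, and that Tesseract has the `fas` language data. The function names `ocr_pages` and `pdf_to_text` are hypothetical. The `ocr` argument lets you swap in a different OCR callable, for example to test the page-joining logic without Tesseract.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def ocr_pages(images: Iterable, lang: str = "fas",
              ocr: Optional[Callable[..., str]] = None) -> str:
    """OCR each page image and join the per-page text with newlines."""
    if ocr is None:
        # Imported lazily so this module loads even if pytesseract is absent.
        import pytesseract
        ocr = pytesseract.image_to_string
    # Change lang (e.g. "eng") to recognize a different Tesseract language.
    return "\n".join(ocr(image, lang=lang) for image in images)


def pdf_to_text(pdf_path: str, out_path: str,
                lang: str = "fas", dpi: int = 300) -> str:
    """Rasterize a scanned PDF, OCR every page, and save the text as UTF-8."""
    # Assumption: pdf2image (a Poppler wrapper) performs the PDF-to-image step.
    from pdf2image import convert_from_path
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    text = ocr_pages(pages, lang=lang)
    # Farsi text is non-ASCII, so write the file explicitly as UTF-8.
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

For example, `pdf_to_text("scanned.pdf", "output.txt")` would write the recognized Farsi text to `output.txt`, and passing `lang="eng"` would switch to English. A higher `dpi` usually helps recognition on small print at the cost of speed and memory.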
shahmohamadi/PDF_TEXT_OCR