Saturday, July 04, 2015

OpenCV OCR Tutorial: Build Tesseract OCR Library 3.02.02 with Qt 5.4 MinGW on Windows


Build Tesseract OCR library 3.02.02 with Qt 5.4: 

Steps:

1. Download the Tesseract OCR 3.02.02 source code.
2. Download the Leptonica 1.71 source code.
3. Leptonica is quite tedious to build for MinGW because of all its dependencies, but zdenop has done this work for us. Here is the link to his repository: https://github.com/zdenop/tesseract-mingw . Thanks to zdenop for making this possible.
 
4. Download the CMake GUI (I am using 3.1.0 rc3).
5. Download tesseract-ocr-Qt5.4-master.zip from my repository.
6. Extract (3) "tesseract-mingw-master.zip" and copy the following files from its bin folder:
     libgif-4.dll
     libjbig-1.dll
     libjpeg-8.dll
     liblept-3.dll : the Leptonica library.
     libpng15-15.dll
     libtiff-3.dll
     libtiffxx-3.dll
     libwebp-2.dll
     zlib1.dll

7. Create a folder named "bin" and paste the copied files into it.
8. Extract (2) "leptonica-1.71.tar.gz" and copy the (7) "bin" folder into the extracted Leptonica folder.
9. Now create a folder "c:/ocrQt" and paste the (8) folder into it.



10. Extract (5) "tesseract-ocr-Qt5.4-master.zip" and edit the "CMakeLists.txt" as follows:

          set(OCR_DIR c:/ocrQt)
          set(MINGW_DIR C:/Qt/Qt5.4.0/Tools/mingw491_32/i686-w64-mingw32)
          set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
          set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.71)

11. Save the "CMakeLists.txt".
12. Now extract (1) "tesseract-ocr-3.02.02.tar.gz" to the "c:" drive so that it looks like "C:\tesseract-ocr".
13. Now paste the (11) "CMakeLists.txt" into the (12) "C:\tesseract-ocr" folder.

14. Now open the CMake 3.1.0 rc3 GUI.
15. Click "Browse Source" and select the (13) "C:\tesseract-ocr" folder.
16. Click "Browse Build" and select (9) "c:/ocrQt".
17. Now click "Configure" and choose "MinGW Makefiles".
18. Select "Specify native compilers".
19. Enter the compiler paths:
    C compiler:
                  C:\Qt\Qt5.4.0\Tools\mingw491_32\bin\gcc
    C++ compiler:
                  C:\Qt\Qt5.4.0\Tools\mingw491_32\bin\g++




20. Click OK. Then click "Configure"; it will show "Configuring done".
21. Click "Generate"; it will show "Generating done".

22. Now open cmd in the "c:\ocrQt" folder with administrative privileges.
23. Now enter "mingw32-make", so the prompt looks like "c:\ocrQt>mingw32-make".





24. Wait until the build reaches 100%.

 
25. You will get the following files in "c:\tesseract_output":

    libtesseract3.02.02.dll
    svpaint.exe
    tesseract.exe

27. Copy these files to the "bin" directory of the (6) extracted folder.
28. Copy the (27) folder to "c:" and rename it "new" so that it looks like "c:\new".
29. Add the bin directory "c:\new\bin" to the system Path environment variable.

30. Open your Qt project and enter the include path and libs in the .pro file:

        LIBS += -LC:/new/lib \
            -ltesseract3.02.02.dll \
            -llept.dll

        INCLUDEPATH += D:/Opencv/opencvins/install/include



31. That's all.

32. Find "platform.h" in the include folder of the MinGW Tesseract package, then edit the following block so that it reads:

                               typedef struct _BLOB1 {
                                     unsigned int cbSize;
                                     char *pBlobData;
                                     } BLOB1, *LPBLOB1;

    Then save it. Also comment out the tblob definition lines. Now run and enjoy.
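
Before running anything, it can help to confirm that the DLLs from step 6 really ended up in the bin folder and that the folder from step 29 is on the Path. The following is only a rough sketch, assuming a C++17 compiler with std::filesystem (newer than the MinGW 4.9.1 used for the build above); the helper names missingFiles, normalizePath and pathContains are made up for this illustration and are not part of Tesseract or Qt.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: returns the names from `required` that are not
// regular files inside `dir`.
std::vector<std::string> missingFiles(const std::filesystem::path& dir,
                                      const std::vector<std::string>& required)
{
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        std::error_code ec;  // non-throwing overload: any error counts as "missing"
        if (!std::filesystem::is_regular_file(dir / name, ec))
            missing.push_back(name);
    }
    return missing;
}

// Hypothetical helper: lower-cases a path, turns '\' into '/', and drops
// trailing slashes so that Windows-style spellings compare equal.
std::string normalizePath(std::string p)
{
    std::transform(p.begin(), p.end(), p.begin(), [](unsigned char c) {
        return c == '\\' ? '/' : static_cast<char>(std::tolower(c));
    });
    while (p.size() > 1 && p.back() == '/')
        p.pop_back();
    return p;
}

// Hypothetical helper: true if `dir` is one of the ';'-separated entries
// of `pathValue` (the contents of the Windows Path variable).
bool pathContains(const std::string& pathValue, const std::string& dir)
{
    std::istringstream entries(pathValue);
    std::string entry;
    const std::string wanted = normalizePath(dir);
    while (std::getline(entries, entry, ';')) {
        if (!entry.empty() && normalizePath(entry) == wanted)
            return true;
    }
    return false;
}
```

For example, calling missingFiles with "c:/new/bin" and the file names listed in step 6 should return an empty list once the copying steps are done, and pathContains with the value of the Path variable and "c:/new/bin" should return true after step 29.
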
Example Code:


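#include <tesseract/baseapi.h>
#include <tesseract/strngs.h>
#include <cstdio>
#include <iostream>

int main(int argc, char** argv)
{
    const char* lang = "eng";
    const char* filename = "c:/open/nn.jpg";

    // Check that the input image can be opened before starting the engine.
    FILE* fin = fopen(filename, "rb");
    if (fin == NULL)
    {
        std::cout << "Cannot open " << filename << std::endl;
        return -1;
    }
    fclose(fin);

    // Init() returns 0 on success; a NULL data path makes Tesseract use its default location.
    tesseract::TessBaseAPI tess;
    if (tess.Init(NULL, lang, tesseract::OEM_DEFAULT) != 0)
    {
        std::cout << "Cannot initialize Tesseract for language " << lang << std::endl;
        return -1;
    }
    tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);

    // Run OCR on the whole file and collect the recognized text.
    STRING text;
    if (!tess.ProcessPages(filename, NULL, 0, &text))
    {
        std::cout << "Error during processing." << std::endl;
        return -1;
    }

    std::cout << text.string() << std::endl;
    return 0;
}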
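
The example passes NULL as the data path to Init, so Tesseract has to find the language data by itself. As far as I know, the 3.0x releases look at the TESSDATA_PREFIX environment variable and expect it to name the parent of the "tessdata" folder; treat that as an assumption and check it against your own setup. Under that assumption, this small sketch builds the path where "eng.traineddata" would be expected, which is handy for printing when Init fails. The function name trainedDataPath and the "c:/new" prefix are only illustrations.

```cpp
#include <string>

// Hypothetical helper: builds "<prefix>/tessdata/<lang>.traineddata",
// assuming TESSDATA_PREFIX points at the parent of the tessdata folder.
std::string trainedDataPath(std::string prefix, const std::string& lang)
{
    // Add a separator only if the prefix does not already end with one.
    if (!prefix.empty() && prefix.back() != '/' && prefix.back() != '\\')
        prefix += '/';
    return prefix + "tessdata/" + lang + ".traineddata";
}
```

With a prefix of "c:/new" and the language "eng", the function gives "c:/new/tessdata/eng.traineddata".
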

Result:
Input Image
Output Image