Saturday, July 04, 2015

OpenCV OCR Tutorial: Build Tesseract OCR Library 3.02.02 with Qt 5.4 MinGW on Windows

Posted by Md.Hanif Ali on Saturday, July 04, 2015

Build Tesseract OCR library 3.02.02 with Qt 5.4: 

Steps:

1. Download the Tesseract OCR 3.02.02 source code ("tesseract-ocr-3.02.02.tar.gz").
2. Download the Leptonica 1.71 source code ("leptonica-1.71.tar.gz").
3. Leptonica is quite tedious to build for MinGW because of all its dependencies, but zdenop has done this work for us. Here is the link to his repository: https://github.com/zdenop/tesseract-mingw . Thanks to zdenop for making this possible.

4. Download CMake with its GUI (I am using 3.1.0 rc3).
5. Download "tesseract-ocr-Qt5.4-master.zip" from my repository.
6. Extract "tesseract-mingw-master.zip" (from step 3) and copy the following files from its "bin" folder:
     libgif-4.dll
     libjbig-1.dll
     libjpeg-8.dll
     liblept-3.dll : the Leptonica library.
     libpng15-15.dll
     libtiff-3.dll
     libtiffxx-3.dll
     libwebp-2.dll
     zlib1.dll

7. Create a folder named "bin" and paste the copied files into it.
8. Extract "leptonica-1.71.tar.gz" (from step 2) and copy the "bin" folder from step 7 into the extracted Leptonica folder.
9. Create a folder "c:/ocrQt" and paste the Leptonica folder from step 8 into it.
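
Before moving on, it can help to confirm that the "bin" folder really holds every DLL listed in step 6. The following is only a minimal C++ sketch and not part of the original build; the helper names `missingFiles` and `requiredDlls` are made up for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the entries of `names` that cannot be
// opened for reading inside the directory `dir`.
std::vector<std::string> missingFiles(const std::string& dir,
                                      const std::vector<std::string>& names)
{
    assert(!dir.empty());
    std::vector<std::string> missing;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const std::string path = dir + "/" + names[i];
        std::ifstream in(path.c_str(), std::ios::binary);
        if (!in.good())
            missing.push_back(names[i]);
    }
    return missing;
}

// The nine DLLs listed in step 6.
std::vector<std::string> requiredDlls()
{
    const char* names[] = {
        "libgif-4.dll", "libjbig-1.dll", "libjpeg-8.dll", "liblept-3.dll",
        "libpng15-15.dll", "libtiff-3.dll", "libtiffxx-3.dll",
        "libwebp-2.dll", "zlib1.dll"
    };
    return std::vector<std::string>(names,
                                    names + sizeof(names) / sizeof(names[0]));
}
```

If steps 6 to 9 went well, calling `missingFiles("c:/ocrQt/leptonica-1.71/bin", requiredDlls())` should return an empty list; any name it does return is a DLL that still needs to be copied.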



10. Extract "tesseract-ocr-Qt5.4-master.zip" (from step 5) and edit its "CMakeLists.txt" as follows:

          set(OCR_DIR c:/ocrQt)
          set(MINGW_DIR C:/Qt/Qt5.4.0/Tools/mingw491_32/i686-w64-mingw32)
          set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
          set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.71)

11. Save "CMakeLists.txt".
12. Extract "tesseract-ocr-3.02.02.tar.gz" (from step 1) to the "c:" drive, so that the result is "C:\tesseract-ocr".
13. Copy the "CMakeLists.txt" from step 11 into the "C:\tesseract-ocr" folder from step 12.

14. Open the CMake 3.1.0 rc3 GUI.
15. Click "Browse Source" and select the "C:\tesseract-ocr" folder from step 13.
16. Click "Browse Build" and select "c:/ocrQt" from step 9.
17. Click "Configure" and choose "MinGW Makefiles".
18. Select "Specify native compilers".
19. Enter the compilers:
    C:   C:\Qt\Qt5.4.0\Tools\mingw491_32\bin\gcc
    C++: C:\Qt\Qt5.4.0\Tools\mingw491_32\bin\g++

20. Click "OK", then click "Configure"; it will show "Configuring done".
21. Click "Generate"; it will show "Generating done".

22. Open cmd in the "c:\ocrQt" folder with administrative privileges.
23. Enter "mingw32-make", so that the prompt reads "c:\ocrQt>mingw32-make".

24. Wait until the build reaches 100%.

 
25. You will get the following files in "c:\tesseract_output":

    libtesseract3.02.02.dll
    svpaint.exe
    tesseract.exe

26. Copy these files to the "bin" directory of the folder extracted in step 6.
27. Copy the folder from step 26 to "c:" and rename it "new", so that it becomes "c:\new".
28. The bin directory is now "c:\new\bin".

29. Add "c:\new\bin" to the system Path environment variable.
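
To double-check step 29, the Path value can be split on ";" and compared entry by entry with the bin directory. This is just a sketch under my own assumptions: the helper names `normalizeDir` and `pathListContains` are hypothetical, and the comparison ignores case and slash direction because Windows treats both spellings as the same folder.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases a path, turns '/' into '\\' and drops
// trailing separators so that two spellings of one folder compare equal.
std::string normalizeDir(std::string dir)
{
    for (std::size_t i = 0; i < dir.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(dir[i]);
        dir[i] = (c == '/') ? '\\' : static_cast<char>(std::tolower(c));
    }
    // Keep a bare drive root such as "c:\" intact.
    while (dir.size() > 3 && dir[dir.size() - 1] == '\\')
        dir.erase(dir.size() - 1);
    return dir;
}

// Returns true if `dir` is one of the ';'-separated entries of `pathValue`.
bool pathListContains(const std::string& pathValue, const std::string& dir)
{
    assert(!dir.empty());
    const std::string wanted = normalizeDir(dir);
    std::size_t start = 0;
    while (start <= pathValue.size()) {
        std::size_t end = pathValue.find(';', start);
        if (end == std::string::npos)
            end = pathValue.size();
        if (normalizeDir(pathValue.substr(start, end - start)) == wanted)
            return true;
        start = end + 1;
    }
    return false;
}
```

In a real program the first argument would come from `std::getenv("PATH")` (after checking the result for NULL) and the second would be "c:\\new\\bin". Remember that a cmd window opened before the variable was changed still sees the old value.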

30. Open the Qt project and enter the library and include paths in the .pro file:

        LIBS += -LC:/new/lib \
            -ltesseract3.02.02.dll \
            -llept.dll

        INCLUDEPATH += D:/Opencv/opencvins/install/include



31. That's All

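32. If compilation fails with conflicting declarations of BLOB and LPBLOB, find "platform.h" in the include folder of the MinGW Tesseract build and rename the types in the following block so that it reads:

        typedef struct _BLOB1 {
            unsigned int cbSize;
            char *pBlobData;
        } BLOB1, *LPBLOB1;

    Note the space between "struct" and "_BLOB1"; without it the compiler reports that 'struct_BLOB1' does not name a type. Then save the file and comment out the tblob definition lines. Now run and enjoy.
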
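
The reason for step 32 is that the MinGW Windows header wtypesbase.h already declares BLOB and LPBLOB, so a second typedef with the same names but a different struct is rejected. The sketch below imitates that situation in one file. The first struct is only a simplified stand-in for the Windows declaration (its member types are my assumption), and `blobSize` is a hypothetical helper added to show the renamed type in use.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the declaration in the Windows header wtypesbase.h.
// The member types are simplified; this is an illustration only.
typedef struct tagBLOB {
    unsigned long cbSize;
    unsigned char* pBlobData;
} BLOB, *LPBLOB;

// Renamed Tesseract-side declaration from step 32. Because it no longer
// reuses the names BLOB and LPBLOB, both declarations can coexist.
typedef struct _BLOB1 {
    unsigned int cbSize;
    char* pBlobData;
} BLOB1, *LPBLOB1;

// Hypothetical helper that reads the size field through the renamed pointer type.
unsigned int blobSize(LPBLOB1 blob)
{
    assert(blob != NULL);
    return blob->cbSize;
}
```

If the second typedef were still named BLOB and LPBLOB, this file would stop compiling with a "conflicting declaration" error, which is the kind of message the rename is meant to avoid.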
Example Code:


#include <tesseract/baseapi.h>
#include <tesseract/strngs.h>
#include <cstdio>
#include <iostream>

int main(int argc, char** argv)
{
    const char* lang = "eng";
    const char* filename = "c:/open/nn.jpg";

    // Init() returns 0 on success; NULL selects the default tessdata location.
    tesseract::TessBaseAPI tess;
    if (tess.Init(NULL, lang, tesseract::OEM_DEFAULT) != 0)
    {
        std::cout << "Cannot initialize Tesseract." << std::endl;
        return -1;
    }
    tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);

    // Check that the image can be opened before handing it to Tesseract.
    FILE* fin = fopen(filename, "rb");
    if (fin == NULL)
    {
        std::cout << "Cannot open " << filename << std::endl;
        return -1;
    }
    fclose(fin);

    // Run OCR on the file and collect the recognized text.
    STRING text;
    if (!tess.ProcessPages(filename, NULL, 0, &text))
    {
        std::cout << "Error during processing." << std::endl;
        return -1;
    }

    std::cout << text.string() << std::endl;
    return 0;
}
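
If Init() fails, the usual suspect is that the language data file cannot be found. As far as I know, Tesseract 3.x with a NULL data path takes the folder from the TESSDATA_PREFIX environment variable, appends "tessdata/" and then looks for "<lang>.traineddata" there. The sketch below only rebuilds that file name so it can be checked by hand; the helper name `trainedDataPath` is hypothetical and the lookup rule itself is my assumption about this version, not something this build was verified against.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the file name Tesseract 3.x is assumed to
// look for, given the value of TESSDATA_PREFIX and a language code.
std::string trainedDataPath(std::string prefix, const std::string& lang)
{
    assert(!lang.empty());
    if (prefix.empty())
        prefix = "./";                       // fall back to the current folder
    const char last = prefix[prefix.size() - 1];
    if (last != '/' && last != '\\')
        prefix += "/";                       // add the missing separator
    return prefix + "tessdata/" + lang + ".traineddata";
}
```

For example, with a made-up prefix of "d:/ocrdata" and the language "eng", the helper returns "d:/ocrdata/tessdata/eng.traineddata", which is the file to look for on disk.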

Result:
Input Image
Output Image

9 comments:

  1. This comment has been removed by the author.

  2. Hey Ali, thanks for the detailed tutorial. I am trying to use Tesseract in an OpenCV 3 project that is being developed in C++ with Visual Studio 2013 on a 64-bit Windows 8.1 machine. Any ideas on how I could adapt your steps? Thanks in advance for your efforts.

    1. Tesseract was originally built for Visual Studio, so little configuration is needed there. I will post the details soon.

  3. Hello Ali,
    Thanks for your tutorial, but I have some problems when I compile it.
    1- C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:385: error: conflicting declaration 'typedef struct tagBLOB BLOB'
    } BLOB;
    ^
    2- C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:386: error: conflicting declaration 'typedef struct tagBLOB* LPBLOB'
    typedef struct tagBLOB *LPBLOB;

    and a lot of warnings:
    C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\combaseapi.h:153: In file included from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/combaseapi.h:153:0,
    C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\objbase.h:14: from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/objbase.h:14,
    .......

  4. Step 32.
    Find "platform.h" in the include folder of mingw tesseract
    Then edit the following---

    typedef struct_BLOB1 {
    unsigned int cbSize;
    char *pBlobData;
    }BLOB1, *LPBLOB1;

    Then save it and comment out the tblob definition lines. Now run and enjoy.

  5. Hello, I'm Miguel from Barcelona.

    I'm trying to install Tesseract with your tutorial.

    I finished the steps, but I have some problems.

    Could you please contact me? I will compensate you; it's very important to me. My email is meu.bdn@gmail.com

  6. Can the complete source code of the finished project be downloaded?

  7. When I compile in the last step, it says that 'struct_BLOB1' does not name a type.
