Friday, August 28, 2009

Second Interaction with Tesseract OCR

I had quite some fun during my first interaction with Tesseract. While some of the recognizer's outputs were good, others were not reasonable at all. I want to debug this, and the best way seems to be their new viewer, written in Java. To make it run, I followed the instructions given on the above link (the only problem was that they refer to piccolo-1.2.jar/piccolox-1.2.jar, while the downloaded files were named piccolo.jar/piccolox.jar).

After this I tried running
tesseract phototest.tif test1 segdemo inter
as per their instructions, but it would not work. I realized this was because of missing files in /usr/local/share/tessdata. Therefore, I did:

sudo cp -R /tessdata/* /usr/local/share/tessdata/

But this overwrote the change I had made previously (in the first installment) for English recognition. Therefore I had to copy the eng data files again:

sudo cp -R /tessdata/eng.* /usr/local/share/tessdata/
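The overwrite described above could be avoided by copying only the files that are not already present in the destination. The following is a minimal sketch, not something from the Tesseract instructions: copy_missing is a hypothetical helper name, and it assumes a POSIX shell and that every entry to be copied sits at the top level of the source directory.

```shell
# copy_missing SRC DEST (hypothetical helper, not part of Tesseract):
# copy each top-level entry of SRC into DEST unless DEST already has
# an entry with the same name, so existing files are left untouched.
copy_missing() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*; do
        # If SRC is empty the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$dest/$name" ]; then
            echo "skip $name"
        else
            cp -R "$f" "$dest/$name"
            echo "copy $name"
        fi
    done
}

# Example call (assumption: run from a root shell, since the target
# directory is normally not writable by a regular user):
#   copy_missing /tessdata /usr/local/share/tessdata
```

With this approach the previously installed eng.* files would stay as they were, and only the missing data files would be added.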

After this I ran the viewer on my own image. It started, but it kept on saying

ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...
ScrollView: Waiting for server...

There was some problem with starting the server. I retried with the image they provide, as instructed on their page, and it worked. If it does not work for you (it did fail for me once or twice), make the change described under "Java problems" on the link above, run make again, and copy the new tesseract binary to /usr/local/bin.

Sometimes I had to manually kill the previous Java GUI, as it would otherwise not allow the GUI to start the next time.

I still have not been able to run the viewer with my own images. There is some catch there.

3 comments:

arunjoshi said...

hi amit,
are you still continuing on tesseract?

and where are you located as of now? India?

best regards,
arun

Unknown said...

Hi there,
I tried this but didn't get very far.

Doing "make" says...

make[1]: Nothing to be done for `all-am'.

Any ideas?

Thanks!
Max

Unknown said...

@Max, sorry I'm now out of touch with Tesseract. I'm afraid I won't be able to help you there.