Using OCR: How to convert an image into text


T0SHEA

Silverquill suggested that we post this on the forum.

 

What is OCR: Optical character recognition (software)

 

Transcribing text, especially if quite long, is time-consuming and tedious. Several categories require you to include the text from signs, monuments, plaques, etc. These instructions may seem overwhelming at first; however, when weighed against the time it takes to transcribe an image manually, a little prep work makes it worthwhile, for us at least.

 

Be aware that not all images can be transcribed: stone, shiny backgrounds, and photos with lots of glare usually cause trouble, to name a few. Take a clear photo of the item to be transcribed; if you cannot take a picture of the entire item, take it in sections. Plan ahead and remember that these images are going to be used for the transcription. Taking a high-resolution pic can be to your advantage here - you can reduce the resolution somewhat after processing, but remember that your final B & W pic will be much smaller (in kilobytes) than the original.

 

There are several free OCR websites available online; we found them by searching Google for "OCR". The one we use is a free online program that does not require you to sign in or register, and we use it with great success - Link: http://www.onlineocr.net/

 

We have compared the results of this OCR software against others and this one consistently did a better, more complete job.

 

Sometimes we use a combination of three programs to accomplish the transcription:

 

Photoshop or similar program: to remove background color and adjust color and/or contrast.

 

FastStone Image Viewer (more about this later): to convert to a negative (when necessary), straighten the image, crop, and adjust color and contrast. Actually, most of the time I use FastStone alone.

 

OCR: to convert image to text.

 

1. Select the image that needs transcription. It must be in a format that the OCR will accept; we use jpg. Check your OCR before getting started to see which image formats it accepts.

 

2. Whether you use FastStone or another program, make a duplicate copy of the image to modify. You want to keep the original if you plan to post it on your waymark.

 

3. Look carefully at your image. If the lettering is white or light-colored, you will want to convert it to a negative. (FastStone) (DO NOT CONVERT to BLACK and WHITE yet.)

 

4. We would suggest that you straighten your image, and crop fairly close to the text, eliminating as much non-text as possible. (Fast Stone)

 

5. Your image is ready to modify. Use a program like Photoshop to remove as much of the background color as you can and convert the background color to white. (Your text should be dark, hopefully fully black, at this point.) You may have to experiment with the tolerance level so as not to remove the text. Use the paintbrush feature or clone and heal to remove any images, noise, or anything else that is not text.

 

Or try using the histogram function to reduce background colors; this sometimes works well.

 

6. Convert your text image to black and white and increase the contrast. Increasing the contrast several times in a row works well in certain circumstances. Adjust the image size so that you have good-sized letters in a full-sized view. Your file should be somewhere between 200 and 1,000 KB in size by this time.

Save.

 

7. Follow instructions with the OCR. Your text should be transcribed.

 

8. Sometimes the conversion is not perfect, depending on how noisy or dirty your original pic was before processing. If there is a large amount of text, we make a paper copy of the image text to proofread against and make any corrections that are necessary. We copy the transcription into Word or another text processor. I like Word; it has a spell checker, which helps with some of the corrections.

 

9. Be patient; this does take a little practice. It is well worth the effort, as we have saved countless hours of transcription time using this method.

 

10. Good Luck. Questions: Just ask.
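
For anyone who would rather script steps 3 and 6 than click through them, here is a minimal Python sketch of the idea. It is not what FastStone or Photoshop do internally. It works on a plain grid of grayscale values (0 = black, 255 = white) so it needs no extra libraries, and the function names, the 128 cutoff, and the "mostly dark image means light lettering" guess are all assumptions made for illustration. Loading a real jpg into such a grid would need an imaging library, which is left out here.

```python
def needs_negative(pixels):
    """Guess (assumption): a mostly dark image has light lettering on it."""
    flat = [p for row in pixels for p in row]
    return sum(flat) / len(flat) < 128


def invert(pixels):
    """Step 3: make a negative so light lettering becomes dark."""
    return [[255 - p for p in row] for row in pixels]


def stretch_contrast(pixels):
    """Step 6: spread the values so the darkest is 0 and the lightest is 255."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in pixels]


def to_black_and_white(pixels, threshold=128):
    """Step 6: everything darker than the threshold becomes black, the rest white."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


def prepare_for_ocr(pixels):
    """Negative if needed, then contrast, then black and white, in that order."""
    if needs_negative(pixels):
        pixels = invert(pixels)
    return to_black_and_white(stretch_contrast(pixels))


# A tiny "plaque": dark background (40) with two light lettering pixels.
plaque = [
    [40, 40, 40, 40],
    [40, 200, 210, 40],
    [40, 40, 40, 40],
]
for row in prepare_for_ocr(plaque):
    print(row)
```

The order matters for the same reason it does in the steps above: the negative is made first, and the hard black and white conversion comes last, after the contrast has been pushed apart.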

 

FastStone Image Viewer:

If you do much post-processing of your pix you really should try this out. FastStone is MUCH MUCH MORE than an image viewer. I use it for all my post-processing work except for warps. It's free - just search on "FastStone Image Viewer", download it and try it out.

Edited by BK-Hunters
Link to comment

Thanks so much for posting this!

 

I make it a practice always to take pictures of signs, plaques and inscriptions for my waymarks when they are present, even if they are not required. This gives the waymark more depth, and it may be the only place on the web such text appears. That makes it a real contribution to the general body of knowledge that is available to the world.

 

Obviously for short things, transcription is still easier and quicker, but there are those times when a longer piece of text needs conversion. I've used several OCR programs when scanning printed material, but I don't know why it never occurred to me that OCR could work from .jpg and other graphic files. Why didn't they teach me this stuff in grade school?

 

I would be interested in hearing of others' experiences with this, particularly if someone is using it in Waymarking.

 

Thanks again!

Link to comment

If you do use OCR, please double-check to make sure the result is accurate. It's almost guaranteed that some artifacts in the image or marks on the plaque/sign will cause conversion errors, so you need to read through to make sure the result doesn't have any errors. Here's an example I just saw that was clearly done via OCR and has a number of obvious conversion errors. They can be really distracting and detract from the overall quality of the Waymark.
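
As a rough aid to that read-through (not a replacement for it), a few lines of Python can point at words worth a second look. This is only a sketch: the function name is made up, and the rule it uses, flagging any word that mixes letters and digits, is an assumption that catches common swaps like 1 for l or 0 for o but will also flag legitimate text such as "1st".

```python
import re


def suspicious_tokens(text):
    """Return words that contain both a letter and a digit."""
    flagged = []
    for raw in text.split():
        # Drop surrounding punctuation so "Vict0ria." is reported as "Vict0ria".
        token = raw.strip(".,;:!?\"'()")
        has_letter = re.search(r"[A-Za-z]", token) is not None
        has_digit = re.search(r"\d", token) is not None
        if has_letter and has_digit:
            flagged.append(token)
    return flagged


print(suspicious_tokens("Erected in 1887 by the c1ty of Vict0ria."))
```

Plain numbers such as a year are left alone, since they contain no letters. Anything the check does not flag still needs to be read against the photo.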

Link to comment


Thank you, A-Team, for pointing this out again. No OCR is perfect, and proofreading the results is necessary.

 

I have even run a plaque in Russian through the OCR, Russian to Russian, and the results were perfect. I was so excited with the results and could not wait to post them in the new waymark. However, once posted, the results were less than perfect: what appeared was a series of ?????????.
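
We do not know what actually happened on the site, but one common way to end up with a row of question marks is text being saved in a character set that has no Cyrillic letters. The Python sketch below only illustrates that general effect; the function name and the choice of "ascii" as the limited character set are assumptions, not a description of how the waymark form works.

```python
def save_as_legacy(text, charset="ascii"):
    """Encode text in a limited character set, replacing what does not fit."""
    # errors="replace" swaps each unencodable character for a question mark.
    return text.encode(charset, errors="replace").decode(charset)


russian = "Привет"

# Every Cyrillic letter is lost when the character set cannot hold it.
print(save_as_legacy(russian))

# A Unicode encoding such as UTF-8 keeps the text intact on a round trip.
print(russian.encode("utf-8").decode("utf-8") == russian)
```

If the cause is something like this, the OCR output itself was fine and the loss happened at the posting stage.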

Edited by BK-Hunters
Link to comment
