A collecting forum. CollectingBanter

Old stamp magazines - scan - OCR - web images - any ideas?



 
 
#11 - March 5th 05, 11:09 PM (external usenet poster)

Here are the results of my tests:

1. Scan 150 dpi greyscale - this yields decent OCR but fails on anything
resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r), especially since
the source material uses a single character for "1/2", "1/3"

2. Adjusting the contrast in the scanner TWAIN program yields much
better results than scanning and then adjusting in a graphics editing
program

3. Scanning directly to black and white gives excellent results at 300
dpi or higher. Anything below 300 dpi has hard-to-read characters.

4. "Descreen" or "Reduce Moire" considerably improves scans.

5. 1200 dpi scan directly in black and white, with descreen, yields
almost perfect results and only fails when there is a smudge, ink blot,
etc., in the source material (quite frequent in printed newspapers).

6. Given that the source material is 10.5 inches by 12.5 inches, a 300
dpi scan ends up at
   11 megabytes uncompressed greyscale bitmap - no degradation due to
   compression
   1.5 megabytes compressed black and white bitmap - no degradation
   due to compression
   400 kilobytes compressed black and white gif file - gif has no loss
   of quality

7. Reducing the 300 dpi image down to 150 dpi makes many of the
characters unreadable, and the result is worse than scanning at 150 dpi
directly

8. The "Unsharp mask" graphics filter with a large radius does an
excellent job of removing and smoothing the background noise from the
paper
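
As a rough cross-check of the sizes in item 6, here is a minimal sketch of the raw (uncompressed) pixel-data size for a page of that size. The helper name `raw_size_bytes` is made up for illustration, and file headers and any compression are ignored, so real files will differ somewhat.

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


PAGE_W_IN, PAGE_H_IN = 10.5, 12.5  # page size quoted in item 6

for dpi in (150, 300, 1200):
    grey = raw_size_bytes(PAGE_W_IN, PAGE_H_IN, dpi, 8)  # 256 shades
    mono = raw_size_bytes(PAGE_W_IN, PAGE_H_IN, dpi, 1)  # black and white
    print(f"{dpi:>5} dpi: greyscale {grey / 2**20:7.1f} MiB, "
          f"1-bit {mono / 2**20:6.2f} MiB")
```

At 300 dpi this gives 3150 x 3750 pixels, about 11.8 million bytes in greyscale and about 1.48 million bytes at 1 bit per pixel, which is in line with the 11 megabyte and 1.5 megabyte figures above.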

I can follow this up with an example walk-through of the whole chain:
newspaper - scanner - web image.

I want to know if there are any recommendations of where to host the
images for each issue (given 400 kilobytes per page and 8 pages per
issue).
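
For anyone sizing a host, a small sketch of the storage arithmetic follows. The 400 kilobytes per page and 8 pages per issue come from the post; the figure of 52 issues per year is only an assumption based on the paper being a weekly, and the function name is hypothetical.

```python
PAGE_KB = 400          # from the post: one black and white gif page
PAGES_PER_ISSUE = 8    # from the post
ISSUES_PER_YEAR = 52   # assumption: one issue per week


def storage_kb(issues, page_kb=PAGE_KB, pages=PAGES_PER_ISSUE):
    """Total storage in kilobytes for the given number of issues."""
    return issues * pages * page_kb


print(f"one issue: {storage_kb(1) / 1000:.1f} MB")
print(f"one year:  {storage_kb(ISSUES_PER_YEAR) / 1000:.1f} MB")
```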

wrote:
Does anyone have any experience and suggestions on how to better scan
in and process old stamp collecting newspapers for the web?

Questions:

a. What dpi resolution is needed for decent scans of 6 point text? (Old
newspapers use smaller type)

b. What is the best way to process the grayscale image down from 256
shades to monochrome?

c. Is decent quality OCR possible on a gray background newsprint 6
point text at anything below 300 dpi?

d. What is the best way to scan in pages larger than your scanner
(e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?

e. Is there a better way to get straight scans? I line up the edge of
the paper with the edge of the scanner glass and close the lid but keep
getting scans tilted. This is troublesome because old newspapers are
usually not printed exactly straight horizontally or vertically due to
bending of the paper during printing.
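
For questions a and c, it can help to see how many pixels a 6 point character actually gets at each resolution. This is a minimal sketch: it assumes 72 points per inch (the 1911 printer's point differs by well under 1 percent), and the x-height ratio of 0.45 is an assumed typical value, not something measured from the newspaper.

```python
POINTS_PER_INCH = 72  # assumption: modern desktop-publishing point


def em_height_px(point_size, dpi):
    """Height in pixels of the full type body at a given scan resolution."""
    return point_size / POINTS_PER_INCH * dpi


def x_height_px(point_size, dpi, x_height_ratio=0.45):
    """Approximate height of a lowercase 'x'; the ratio is an assumed value."""
    return em_height_px(point_size, dpi) * x_height_ratio


for dpi in (150, 300, 600, 1200):
    print(f"{dpi:>4} dpi: body {em_height_px(6, dpi):5.1f} px, "
          f"x-height ~{x_height_px(6, dpi):4.1f} px")
```

At 150 dpi the whole type body is only 12.5 pixels tall, so a lowercase letter gets roughly 5 or 6 pixels; at 300 dpi those numbers double.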

For example, I processed a public domain 1911 issue of Mekeel's Weekly
Stamp News as follows:

a. Scan top half of a page in 300 dpi grayscale with de-screen turned
on
b. Scan bottom half of a page - same settings
c. Rotate both scans using GIMP (GNU Image Manipulation Program)
d. Join images together by hand:
d1. Open top half image
d2. Double its height
d3. Open bottom half image
d4. Copy bottom half image into blank area below top half image
d5. (troublesome) Join the two halves by trimming off top part of
bottom image. I tried a photo stitch program but it failed unless each
image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file
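
Steps h and i can be sketched in a few lines. This is only an illustration, not what GIMP does internally: it treats any value below the cutoff as black ink, and it packs the result as a binary PBM file because that format is simple enough to write by hand. The function names are hypothetical.

```python
def threshold(grey_rows, cutoff=160):
    """Map 0..255 grey values to 1-bit pixels: 1 = black ink, 0 = paper."""
    return [[1 if value < cutoff else 0 for value in row] for row in grey_rows]


def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, most significant bit first."""
    padded = list(bits) + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def to_pbm(mono_rows):
    """Serialize 1-bit rows as binary PBM (P4), where 1 means black."""
    header = f"P4\n{len(mono_rows[0])} {len(mono_rows)}\n".encode("ascii")
    return header + b"".join(pack_row(row) for row in mono_rows)


# Tiny made-up greyscale sample: light paper with a darker stroke in it.
sample = [
    [250, 240, 90, 30, 95, 235, 245, 250, 200, 159],
    [245, 170, 40, 20, 60, 165, 240, 248, 160, 161],
]
mono = threshold(sample)
print(mono)
print(len(to_pbm(mono)), "bytes")
```

Whether a pixel exactly at the cutoff counts as black or white is a choice; here 160 itself stays white.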

Thanks


#12 - March 6th 05, 01:43 PM - Al (external usenet poster)

Can you send me a sample of your work?

Al




#13 - March 6th 05, 02:39 PM - amesh (Mette) (external usenet poster)

wrote in a message
oups.com...

- big snip -

I want to know if there are any recommendations of where to host the
images for each issue (given 400 kilobytes per page and 8 pages per
issue).


You are welcome to have them hosted on my server. I have plenty of space,
and they can stay as long as I am alive :-) Contact me if interested
--
Best regards
Ann Mette Heindorff (Mette)
reply to heindorffhus at heindorffhus dot dk









#14 - March 25th 05, 07:46 AM (external usenet poster)

In order to set a benchmark, I went back and reworked
from scratch the 8 page March 18, 1911 Mekeel's Weekly
Stamp News newspaper using the following method:

The page is about 13 inches by 11 inches, so it requires two scans per
page on an 8.5 x 11 inch flatbed scanner

Process followed

1. Test scan at 150 dpi black and white
2. Adjust brightness based on test scan:
2.1 Brightness set to 150 on a 255 scale (makes scan brighter)
3. Scan top half of a page - 300 dpi black and white - de-screen turned
on
4. Scan bottom half of a page - same settings
5. Rotate each scan in scanner software package
6. Join images together by hand:
6.1 Open top half image
6.2 Double its height
6.3 Flatten top half image (makes newly added area all white)
6.4 Open bottom half image
6.5 Copy bottom half image into blank area below top half image
6.6 Join the two halves by trimming off top part of
bottom image. This is significantly easier when working
with black and white images (takes 4 minutes instead of
30 minutes per page)
7. Balance out margins by cropping image
8. Remove any stray dots from margins
9. Save as black and white bitmap (1.2 megabytes) and black and
white gif file (400 kilobytes). There is no loss of quality
when saving to bitmap and gif format.
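
The manual join in step 6 comes down to finding how many rows the two scans share and trimming that many rows off the top of the bottom scan. Below is a minimal sketch of that idea, not the method I actually used (I did it by hand). It assumes both scans are already straight, have the same width, and have no sideways offset; all names are hypothetical.

```python
def mismatch_fraction(top_rows, bottom_rows, overlap):
    """Fraction of differing pixels when the last `overlap` rows of the top
    scan are laid over the first `overlap` rows of the bottom scan."""
    differing = 0
    total = 0
    for row_a, row_b in zip(top_rows[-overlap:], bottom_rows[:overlap]):
        differing += sum(a != b for a, b in zip(row_a, row_b))
        total += len(row_a)
    return differing / total


def find_overlap(top_rows, bottom_rows, min_overlap=1):
    """Return the overlap (in rows) with the fewest mismatching pixels."""
    max_overlap = min(len(top_rows), len(bottom_rows))
    return min(range(min_overlap, max_overlap + 1),
               key=lambda n: mismatch_fraction(top_rows, bottom_rows, n))


def join_halves(top_rows, bottom_rows, min_overlap=1):
    """Trim the shared rows off the bottom scan and append the rest."""
    overlap = find_overlap(top_rows, bottom_rows, min_overlap)
    return top_rows + bottom_rows[overlap:]


# Made-up 10-row "page" where every row is different, scanned as two
# halves that share rows 4..6.
page = [[(r >> c) & 1 for c in range(8)] for r in range(1, 11)]
top_half, bottom_half = page[:7], page[4:]
print(find_overlap(top_half, bottom_half))
print(join_halves(top_half, bottom_half) == page)
```

Real scans never match pixel for pixel, which is why the sketch scores candidates by mismatch fraction instead of testing for equality. On a real page, a larger `min_overlap` would be needed so that a few blank margin rows do not produce a false match.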

Observations:

- This comes out to about 2 hours per issue (15 minutes per page
times 8 pages per issue).
- The scanner is much more efficient at producing a high quality
black and white image than I am at converting a greyscale scan
in an image editing program.

- Working from greyscale scans is significantly more difficult and
gives only slightly better results, at a cost of about 1 hour
per page.

- Unadjusted 300 dpi black and white is significantly better
than 150dpi greyscale. The source material is a black and
white printed newspaper with almost no pictures.

- OCR on 300 dpi black and white scans is significantly easier and
reaches about 95 percent accuracy, because the black and white
thresholding is done by the scanner instead of the OCR program.
When the OCR program (TextBridge) converts a greyscale or color
image to black and white (to allow OCR), it introduces significant
noise, either through its algorithm or just by attempting to
dither the input image into 2 colors.

- OCR accuracy has been significantly better since I started
adding unknown words to the OCR package dictionary. Actually,
the words were already in the dictionary, but the font face
recognition associated with each word was not. My guess is that
the OCR program was trained with laser printed serif and
sans-serif fonts and not with newspaper/movable type fonts.
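
To illustrate why dithering could add noise where a plain threshold does not, here is a small sketch that compares the two on a flat light-grey patch with no ink on it. I do not know what TextBridge actually does internally; Floyd-Steinberg error diffusion is used here only as a common example of dithering, and the patch value of 200 is made up.

```python
def fixed_threshold(grey_rows, cutoff=160):
    """Plain cutoff: values below it become black (0), the rest white (255)."""
    return [[0 if v < cutoff else 255 for v in row] for row in grey_rows]


def floyd_steinberg(grey_rows):
    """Error-diffusion dither to pure black (0) and white (255)."""
    height, width = len(grey_rows), len(grey_rows[0])
    work = [[float(v) for v in row] for row in grey_rows]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            work[y][x] = new
            err = old - new
            # Push the rounding error onto neighbours not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return [[int(v) for v in row] for row in work]


def count_black(rows):
    return sum(v == 0 for row in rows for v in row)


# Flat light-grey "newsprint" background with no text on it at all.
background = [[200] * 32 for _ in range(32)]
print("threshold black pixels:", count_black(fixed_threshold(background)))
print("dithered black pixels: ", count_black(floyd_steinberg(background)))
```

The threshold leaves the empty background completely white, while the dither scatters black pixels across it to imitate the grey tone. Speckle of that kind around small type is the sort of noise an OCR engine would then have to cope with.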

I pasted the two original messages below to keep this all together
in a single message.

************************
*** Original message ***

Old stamp magazines - scan - OCR - web images - any ideas?

Newsgroups: rec.collecting.stamps.discuss
From:
Date: 27 Feb 2005 21:10:51 -0800

Does anyone have any experience and suggestions on how to better scan
in and process old stamp collecting newspapers for the web?

Questions:

a. What dpi resolution is needed for decent scans of 6 point text? (Old
newspapers use smaller type)

b. What is the best way to process the grayscale image down from 256
shades to monochrome?

c. Is decent quality OCR possible on a gray background newsprint 6
point text at anything below 300 dpi?

d. What is the best way to scan in pages larger than your scanner
(e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?

e. Is there a better way to get straight scans? I line up the edge of
the paper with the edge of the scanner glass and close the lid but keep
getting scans tilted. This is troublesome because old newspapers are
usually not printed exactly straight horizontally or vertically due to
bending of the paper during printing.

For example, I processed a public domain 1911 issue of Mekeel's Weekly
Stamp News as follows:

a. Scan top half of a page - 300 dpi grayscale - de-screen turned on
b. Scan bottom half of a page - same settings
c. Rotate both scans using GIMP (GNU Image Manipulation Program)
d. Join images together by hand:
d1. Open top half image
d2. Double its height
d3. Open bottom half image
d4. Copy bottom half image into blank area below top half image
d5. (troublesome) Join the two halves by trimming off top part of
bottom image. I tried a photo stitch program but it failed
unless each image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file

*** Message 2 ***

From:
Newsgroups: rec.collecting.stamps.discuss
Date: 5 Mar 2005 15:09:08 -0800
Subject: Old stamp magazines - scan - OCR - web images - any ideas?

Here are the results of my tests:

1. Scan 150 dpi greyscale - this yields decent OCR but fails on anything
resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r), especially since
the source material uses a single character for "1/2", "1/3"

2. Adjusting the contrast in the scanner TWAIN program yields much
better results than scanning and then adjusting in a graphics editing
program

3. Scanning directly to black and white gives excellent results at 300
dpi or higher. Anything below 300 dpi has hard-to-read characters.

4. "Descreen" or "Reduce Moire" considerably improves scans.

5. 1200 dpi scan directly in black and white, with descreen, yields
almost perfect results and only fails when there is a smudge, ink blot,
etc., in the source material (quite frequent in printed newspapers).

6. Given that the source material is 10.5 inches by 12.5 inches,
a 300 dpi scan ends up at

11 megabytes uncompressed greyscale bitmap - no degradation
due to compression

1.5 megabytes compressed black and white bitmap - no
degradation due to compression

400 kilobytes compressed black and white gif file - gif has
no loss of quality

7. Reducing the 300 dpi image down to 150 dpi makes many of the
characters unreadable, and the result is worse than scanning at 150 dpi
directly

8. The "Unsharp mask" graphics filter with a large radius does an
excellent job of removing and smoothing the background noise from the
paper

I can follow this up with an example walk through from newspaper -
scanner - web image.

 



