A collecting forum. CollectingBanter

Page size question



 
 
  #1  
Old March 31st 05, 04:57 AM

Thanks to Mette's kindness in putting the page images
online, you can now see the entire March 18, 1911 issue
of Mekeel's Weekly Stamp News at:

http://pdstamps.heindorffhus.dk/


There are two images per page: a lower resolution and
higher resolution one.

Please post here which image size you prefer (larger or
smaller).

You may need to expand the image in your browser - click
on the lower right hand corner box with the arrows.

Thank you

-------------
Newsgroups: rec.collecting.stamps.discuss
From:
Date: 24 Mar 2005 23:46:24 -0800

Subject: Old stamp magazines - scan - OCR - web images - any ideas?

In order to set a benchmark, I went back and reworked
from scratch the 8 page March 18, 1911 Mekeel's Weekly
Stamp News newspaper using the following method:

The page is about 13 inches by 11 inches - requires two scans per
page on 8.5 x 11 inch flatbed scanner
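
The two-scan figure can be checked with a little arithmetic. The sketch below assumes the page's 11 inch side is laid along the scanner's 11 inch side, so only the 13 inch side has to be covered in 8.5 inch strips; the function name is made up for illustration.

```python
import math

def scans_needed(page_long_in, bed_short_in):
    """Return (number of passes, total overlap in inches).

    Assumes the page's short side fits along the scanner's long side, so
    only the page's long side has to be covered in strips of bed_short_in.
    """
    passes = math.ceil(page_long_in / bed_short_in)
    overlap = passes * bed_short_in - page_long_in
    return passes, overlap

passes, overlap = scans_needed(13.0, 8.5)
print(passes, overlap)
```

Under that assumption the two scans share a few inches of the page, and that shared strip is what gets trimmed off when the halves are joined.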

Process followed

1. Test scan at 150 dpi black and white
2. Adjust brightness based on test scan:
2.1 Brightness set to 150 on a 255 scale (makes scan brighter)
3. Scan top half of a page - 300 dpi black and white - de-screen turned
on
4. Scan bottom half of a page - same settings
5. Rotate each scan in scanner software package
6. Join images together by hand:
6.1 Open top half image
6.2 Double its height
6.3 Flatten top half image (makes newly added area all white)
6.4 Open bottom half image
6.5 Copy bottom half image into blank area below top half image
6.6 Join the two halves by trimming off top part of
bottom image. This is significantly easier when working
with black and white images (takes 4 minutes instead of
30 minutes per page)
7. Balance out margins by cropping image
8. Remove any stray dots from margins
9. Save as black and white bitmap (1.2 megabytes) and black and
white gif file (400 kilobytes). There is no loss of quality
when saving to bitmap and gif format.
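
The joining in step 6 was done by hand in an image editor. As a rough illustration of the idea only, here is a toy Python sketch that stacks two black and white "scans" stored as lists of pixel rows and drops the rows that appear in both. It assumes both halves are already straight and match pixel for pixel in the overlap, which real scans will not, and the function names are hypothetical.

```python
def find_overlap(top, bottom, max_overlap):
    """Return the largest k <= max_overlap where the last k rows of `top`
    equal the first k rows of `bottom` (0 if nothing matches)."""
    limit = min(max_overlap, len(top), len(bottom))
    for k in range(limit, 0, -1):
        if top[-k:] == bottom[:k]:
            return k
    return 0

def join_halves(top, bottom, max_overlap=50):
    """Stack two scans vertically, dropping the duplicated rows from `bottom`."""
    k = find_overlap(top, bottom, max_overlap)
    return top + bottom[k:]

# Tiny 1-bit "page": each row is a list of 0 (white) / 1 (black) pixels.
page = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 1], [1, 0, 0]]
top_half = page[:4]      # rows 0-3
bottom_half = page[2:]   # rows 2-5, so rows 2-3 appear in both scans
joined = join_halves(top_half, bottom_half)
print(joined == page)
```

With real scans the match would have to tolerate noise and a slight tilt, which is probably why the manual trim is still needed.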

Observations:

- This comes out to a total time of just under 2 hours per
issue (about 15 minutes per page times 8 pages per issue).
- The scanner is much more efficient at converting to a
high quality black and white image than I am at converting
a greyscale scan using an image editing program

- The method of working with greyscale scans is significantly
more difficult and gives only slightly better results, but
costs 1 hour per page.

- Unadjusted 300 dpi black and white is significantly better
than 150dpi greyscale. The source material is a black and
white printed newspaper with almost no pictures.

- OCR at 300dpi black and white is significantly easier
with a 95 percent accuracy given that the black and
white thresholding is done by the scanner instead of
the OCR program. When the OCR program (TextBridge)
converts a greyscale or color image to black and
white (to allow OCR), it introduces significant noise
either in its algorithm or just by attempting to
dither the input image into 2 colors.

- OCR accuracy is significantly better since I started
adding unknown words to the OCR package dictionary. Actually,
the words were already in the dictionary but the font face
recognition associated with a word was not. My guess is that
the OCR program was trained with laser printed serif and
sans-serif fonts and not with newspaper/movable type fonts.
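
To illustrate why dithering a grey background can add noise where a fixed threshold does not, here is a small Python sketch. It is not TextBridge's algorithm, which is unknown; it uses a simple one-dimensional error diffusion as a stand-in, and the grey value and cutoffs are illustrative.

```python
def fixed_threshold(row, cutoff=160):
    """Map each grey value (0-255) to 0 (black) or 255 (white) at a fixed cutoff."""
    return [255 if v >= cutoff else 0 for v in row]

def error_diffusion(row, cutoff=128):
    """1-D error diffusion: push each pixel's rounding error onto the next pixel."""
    out = []
    error = 0
    for v in row:
        adjusted = v + error
        pixel = 255 if adjusted >= cutoff else 0
        error = adjusted - pixel
        out.append(pixel)
    return out

background = [200] * 20   # a strip of uniformly light-grey newsprint, no ink
print(fixed_threshold(background).count(0))   # black pixels after thresholding
print(error_diffusion(background).count(0))   # black pixels after dithering
```

The thresholded strip stays clean white, while the dithered strip picks up scattered black pixels as it tries to reproduce the grey level. Specks like these between the letters are the kind of noise that would hurt OCR.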

I pasted the two original messages below to keep this all together
in a single message.

************************
*** Original message ***

Old stamp magazines - scan - OCR - web images - any ideas?

Newsgroups: rec.collecting.stamps.discuss
From:
Date: 27 Feb 2005 21:10:51 -0800

Does anyone have any experience and suggestions on how to better scan
in and process old stamp collecting newspapers for the web?
Questions:
a. What dpi resolution is needed for decent scans of 6 point text? (Old
newspapers use smaller type)
b. What is the best way to process the grayscale image down from 256
shades to monochrome?
c. Is decent quality OCR possible on a gray background newsprint 6
point text at anything below 300 dpi?
d. What is the best way to scan in pages larger than your scanner
(e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?
e. Is there a better way to get straight scans? I line up the edge of
the paper with the edge of the scanner glass and close the lid but keep
getting scans tilted. This is troublesome because old newspapers are
usually not printed exactly straight horizontally or vertically due to
bending of the paper during printing.
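
For question a, it helps to work out how many pixels a line of 6 point type actually gets at each resolution. A minimal sketch, using the standard 72 points per inch; the function name is just for illustration:

```python
POINTS_PER_INCH = 72  # typographic points per inch

def pixels_per_line(point_size, dpi):
    """Approximate height in pixels of one line of type at a given scan resolution."""
    return point_size / POINTS_PER_INCH * dpi

for dpi in (150, 300, 1200):
    print(dpi, pixels_per_line(6, dpi))
```

Note that the point size covers the full height of the type body, so the lowercase letters themselves get fewer pixels than this.
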
For example, I processed a public domain 1911 issue of Mekeel's Weekly
Stamp News as follows:
a. Scan top half of a page - 300 dpi grayscale - de-screen turned on
b. Scan bottom half of a page - same settings
c. Rotate both scans using Gimp (GNU Imaging Program)
d. Join images together by hand:
d1. Open top half image
d2. Double its height
d3. Open bottom half image
d4. Copy bottom half image into blank area below top half image
d5. (troublesome) Join the two halves by trimming off top part of
bottom image. I tried a photo stitch program but it failed
unless each image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file
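
Steps f and h can be sketched in plain Python on a single row of grey values. This is only an illustration of the idea, not what Gimp does internally: the "adjust histogram" step is assumed here to be a simple linear levels stretch, and the black point, white point and pixel values are made up.

```python
def stretch_levels(pixels, black_point, white_point):
    """Linearly rescale grey values so black_point -> 0 and white_point -> 255."""
    span = white_point - black_point
    out = []
    for v in pixels:
        scaled = (v - black_point) * 255 / span
        out.append(max(0, min(255, round(scaled))))
    return out

def to_monochrome(pixels, cutoff=160):
    """Values below the cutoff become black (0); the rest become white (255)."""
    return [0 if v < cutoff else 255 for v in pixels]

# One hypothetical scan line: dull grey paper (~150) with dark strokes (~60).
scan_line = [150, 145, 60, 65, 152, 148, 58, 151]

print(to_monochrome(scan_line))   # threshold alone: paper is below 160 too
stretched = stretch_levels(scan_line, black_point=50, white_point=160)
print(to_monochrome(stretched))   # after the stretch: paper white, ink black
```

On this made-up line, thresholding at 160 without the stretch turns the whole row black, because the paper is darker than the cutoff. After the stretch the paper and the ink fall on opposite sides of the cutoff.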

*** Message 2 ***

From:
Newsgroups: rec.collecting.stamps.discuss
Date: 5 Mar 2005 15:09:08 -0800
Subject: Old stamp magazines - scan - OCR - web images - any ideas?

Here are the results of my tests:
1. Scan 150dpi greyscale - this yields decent OCR but fails on anything
resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r) especially since
the source material uses a single character for "1/2", "1/3"
2. Adjusting the contrast in the scanner TWAIN program yields much
better results than scanning and then adjusting in a graphics editing
program
3. Scanning directly to black and white gives excellent results at 300
dpi or higher. Anything below 300 dpi has hard to read characters.
4. "Descreen" or "Reduce Moire" considerably improves scans.
5. 1200 dpi scan directly in black and white, with descreen yields
almost perfect results and only fails when there is a smudge, ink blot,
etc., in the source material (quite frequent in printed newspapers).
6. Given that the source material is 10.5 inches by 12.5 inches,
a 300 dpi scan ends up at:
- 11 megabytes as an uncompressed greyscale bitmap (no degradation
due to compression)
- 1.5 megabytes as a compressed black and white bitmap (no
degradation due to compression)
- 400 kilobytes as a compressed black and white gif file (gif has
no loss of quality)
7. Reducing the 300 dpi image down to 150 dpi makes many of the
characters unreadable, and is worse than scanning at 150 dpi directly
8. The "Unsharp mask" graphics filter with a large radius does an
excellent job of removing and smoothing the background noise from the
paper
I can follow this up with an example walk through from newspaper -
scanner - web image
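
The file sizes in point 6 can be checked against the raw pixel counts. A minimal sketch, assuming 8 bits per pixel for greyscale and 1 bit per pixel for black and white, and ignoring file headers; the function name is hypothetical.

```python
import math

def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes, with each pixel row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px

grey = uncompressed_bytes(10.5, 12.5, 300, 8)   # 8-bit greyscale
mono = uncompressed_bytes(10.5, 12.5, 300, 1)   # 1-bit black and white
print(grey, mono)
```

This gives roughly 11.8 million bytes for greyscale and roughly 1.5 million bytes for 1-bit black and white, in line with the figures reported above. The 1.5 megabyte bitmap is about what raw 1-bit data would occupy, so most of the saving appears to come from the bit depth rather than from compression.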

  #2  
Old March 31st 05, 05:49 AM
Pierre Courtiade

wrote :

| Thanks to Mette's kindness in putting the page images
| online, you can now see the entire March 18, 1911 issue
| of Mekeel's Weekly Stamp News at:
|
| http://pdstamps.heindorffhus.dk/
|
| There are two images per page: a lower resolution and
| higher resolution one.
|
| Please post here which image size you prefer (larger or
| smaller).
................




I definitely prefer the larger images. The smaller ones are quite
difficult to read even after expanding the image size.


--
All the best,
Pierre Courtiade

to answer me, please replace NOSPAM by my family name


  #3  
Old March 31st 05, 09:45 AM
Roger Smith


wrote in message
oups.com...
| Thanks to Mette's kindness in putting the page images
| online, you can now see the entire March 18, 1911 issue
| of Mekeel's Weekly Stamp News at:
|
| http://pdstamps.heindorffhus.dk/
|
| There are two images per page: a lower resolution and
| higher resolution one.
|
| Please post here which image size you prefer (larger or
| smaller).


Like Pierre, I much prefer the larger version, although something perhaps
50% of the size would probably suffice equally. I should mention that I am
on high speed broadband and I wonder if the large file sizes would present a
problem with readers on a 56K line.

Regards, Roger


  #4  
Old March 31st 05, 09:57 AM
Jan-Martin Hertzsch

wrote:

| Thanks to Mette's kindness in putting the page images
| online, you can now see the entire March 18, 1911 issue
| of Mekeel's Weekly Stamp News at:
|
| http://pdstamps.heindorffhus.dk/
|
| There are two images per page: a lower resolution and
| higher resolution one.
|
| Please post here which image size you prefer (larger or
| smaller).


The larger size images are definitely of the higher quality.
However, reading such a large scan involves a great deal of
"mousing around" because it is several times larger than the
ordinary browser window. Maybe you could scan each article by
itself at a high resolution and underlay a low-resolution scan
of the whole page with an image map so that one can sort of
"zoom in" on the individual articles / pictures? Having said that,
I must confess that I wouldn't know how to do this ...

Jan-Martin
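
One possible way to build the image map suggested above is to generate the HTML from a list of article rectangles. The sketch below uses only standard HTML `<map>` and `<area>` elements; the file names, coordinates and function name are hypothetical and would have to be measured on the real low-resolution page image.

```python
from html import escape

def build_image_map(page_image, regions, map_name="page1"):
    """Return HTML for a low-resolution page image whose regions link to
    high-resolution article scans.

    regions: list of (left, top, right, bottom, href, label) in pixel
    coordinates of the low-resolution page image.
    """
    lines = [
        f'<img src="{escape(page_image)}" usemap="#{escape(map_name)}" alt="Full page">',
        f'<map name="{escape(map_name)}">',
    ]
    for left, top, right, bottom, href, label in regions:
        lines.append(
            f'  <area shape="rect" coords="{left},{top},{right},{bottom}" '
            f'href="{escape(href)}" alt="{escape(label)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)

# Hypothetical file names and coordinates, for illustration only.
markup = build_image_map(
    "page1_small.gif",
    [(0, 0, 300, 200, "page1_article1.gif", "Article 1"),
     (0, 200, 300, 450, "page1_article2.gif", "Article 2")],
)
print(markup)
```

Clicking a rectangle on the small page image would then open the high-resolution scan of that article.
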
  #5  
Old March 31st 05, 02:23 PM
Rodney


1 minute 38 seconds to download a 466Kb scan

| I should mention that I am on high speed broadband

face colour=green

| and I wonder if the large file sizes would present a
| problem with readers on a 56K line.
| Regards, Roger
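
The download times can be estimated with simple arithmetic. This sketch assumes 1 KB = 1024 bytes and ignores protocol overhead and modem compression, so real times will differ; the 400 KB figure is the gif size reported in the first post, and the function names are illustrative.

```python
def download_seconds(size_kb, line_bits_per_second):
    """Ideal transfer time for a file of size_kb kilobytes (1 KB = 1024 bytes)."""
    return size_kb * 1024 * 8 / line_bits_per_second

def effective_bits_per_second(size_kb, seconds):
    """Throughput implied by an observed download time."""
    return size_kb * 1024 * 8 / seconds

# A 400 KB gif over an ideal 56K line.
print(round(download_seconds(400, 56_000)), "seconds")

# The 466 KB scan that took 1 minute 38 seconds (98 seconds).
print(round(effective_bits_per_second(466, 98)), "bits per second")
```

So each 400 KB page would take about a minute on a 56K line even in the ideal case. The 98 second download works out to a bit under 40,000 bits per second, which is dial-up territory.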



 




All times are GMT +1.

