#1
Page size question
Thanks to Mette's kindness in putting the page images online, you can now see the entire March 18, 1911 issue of Mekeel's Weekly Stamp News at:
http://pdstamps.heindorffhus.dk/

There are two images per page: a lower resolution one and a higher resolution one. Please post here which image size you prefer (larger or smaller). You may need to expand the image in your browser: click on the box with the arrows in the lower right-hand corner.

Thank you

-------------
Newsgroups: rec.collecting.stamps.discuss
Date: 24 Mar 2005 23:46:24 -0800
Subject: Old stamp magazines - scan - OCR - web images - any ideas?

In order to set a benchmark, I went back and reworked from scratch the 8-page March 18, 1911 Mekeel's Weekly Stamp News newspaper using the following method.

The page is about 13 inches by 11 inches, so it requires two scans per page on an 8.5 x 11 inch flatbed scanner.

Process followed:
1. Test scan at 150 dpi black and white
2. Adjust brightness based on test scan:
   2.1 Brightness set to 150 on a 255 scale (makes scan brighter)
3. Scan top half of a page - 300 dpi black and white - de-screen turned on
4. Scan bottom half of a page - same settings
5. Rotate each scan in scanner software package
6. Join images together by hand:
   6.1 Open top half image
   6.2 Double its height
   6.3 Flatten top half image (makes newly added area all white)
   6.4 Open bottom half image
   6.5 Copy bottom half image into blank area below top half image
   6.6 Join the two halves by trimming off top part of bottom image. This is significantly easier when working with black and white images (takes 4 minutes instead of 30 minutes per page)
7. Balance out margins by cropping image
8. Remove any stray dots from margins
9. Save as black and white bitmap (1.2 megabytes) and black and white gif file (400 kilobytes). There is no loss of quality when saving to bitmap and gif format.

Observations:
- This comes out to a total time of about 2 hours per issue (15 minutes per page times 8 pages per issue).
- The scanner is much more efficient at converting to a high quality black and white image than I am at converting a greyscale scan using an image editing program.
- Working with greyscale scans is significantly more difficult and gives only slightly better results, at a cost of 1 hour per page.
- Unadjusted 300 dpi black and white is significantly better than 150 dpi greyscale. The source material is a black and white printed newspaper with almost no pictures.
- OCR at 300 dpi black and white is significantly easier, with 95 percent accuracy, because the black and white thresholding is done by the scanner instead of the OCR program. When the OCR program (TextBridge) converts a greyscale or color image to black and white (to allow OCR), it introduces significant noise, either in its algorithm or just by attempting to dither the input image into 2 colors.
- OCR accuracy is significantly better since I started adding unknown words to the OCR package dictionary. Actually, the words were already in the dictionary but the font face recognition associated with a word was not. My guess is that the OCR program was trained with laser printed serif and sans-serif fonts and not with newspaper/movable type fonts.

I pasted the two original messages below to keep this all together in a single message.

************************
*** Original message ***
Old stamp magazines - scan - OCR - web images - any ideas?
Newsgroups: rec.collecting.stamps.discuss
Date: 27 Feb 2005 21:10:51 -0800

Does anyone have any experience and suggestions on how to better scan in and process old stamp collecting newspapers for the web?

Questions:
a. What dpi resolution is needed for decent scans of 6 point text? (Old newspapers use smaller type)
b. What is the best way to process the grayscale image down from 256 shades to monochrome?
c. Is decent quality OCR possible on gray background newsprint with 6 point text at anything below 300 dpi?
d. What is the best way to scan in pages larger than your scanner (e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?
e. Is there a better way to get straight scans? I line up the edge of the paper with the edge of the scanner glass and close the lid but keep getting tilted scans. This is troublesome because old newspapers are usually not printed exactly straight horizontally or vertically due to bending of the paper during printing.

For example, I processed a public domain 1911 issue of Mekeel's Weekly Stamp News as follows:
a. Scan top half of a page - 300 dpi grayscale - de-screen turned on
b. Scan bottom half of a page - same settings
c. Rotate both scans using GIMP (GNU Image Manipulation Program)
d. Join images together by hand:
   d1. Open top half image
   d2. Double its height
   d3. Open bottom half image
   d4. Copy bottom half image into blank area below top half image
   d5. (troublesome) Join the two halves by trimming off top part of bottom image. I tried a photo stitch program but it failed unless each image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file

*** Message 2 ***
Newsgroups: rec.collecting.stamps.discuss
Date: 5 Mar 2005 15:09:08 -0800
Subject: Old stamp magazines - scan - OCR - web images - any ideas?

Here are the results of my tests:
1. Scan 150 dpi greyscale - this yields decent OCR but fails on anything resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r), especially since the source material uses a single character for "1/2", "1/3"
2. Adjusting the contrast in the scanner TWAIN program yields much better results than scanning and then adjusting in a graphics editing program
3. Scanning directly to black and white gives excellent results at 300 dpi or higher. Anything below 300 dpi has hard to read characters.
4. "Descreen" or "Reduce Moire" considerably improves scans.
5. A 1200 dpi scan directly in black and white, with descreen, yields almost perfect results and only fails when there is a smudge, ink blot, etc., in the source material (quite frequent in printed newspapers).
6. Given that the source material is 10.5 inches by 12.5 inches, a 300 dpi scan ends up at:
   11 megabytes uncompressed greyscale bitmap - no degradation from compression
   1.5 megabytes compressed black and white bitmap - no degradation from compression
   400 kilobytes compressed black and white gif file - gif has no loss of quality
7. Reducing the 300 dpi image down to 150 dpi makes many of the characters unreadable and is worse than scanning at 150 dpi directly
8. The "Unsharp mask" graphics filter with a large radius does an excellent job of removing and smoothing the background noise from the paper

I can follow this up with an example walk-through from newspaper to scanner to web image.
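
The hand-joining steps described in the post above (open the top half, extend it downward, paste in the bottom half, trim the overlapping part off the bottom scan) could be sketched in code roughly as follows. This is only a minimal illustration in plain Python, not the poster's actual procedure, which was done by hand in an image editor. It assumes each black and white scan is held as a list of pixel rows (0 = black, 1 = white), that both halves are already straightened and have the same width, and that the overlapping rows match exactly, which real scans rarely do. The names `Bitmap`, `find_overlap` and `join_halves` are hypothetical.

```python
from typing import List

# Assumption: a bilevel scan is a list of rows, each row a list of 0 (black) / 1 (white).
Bitmap = List[List[int]]


def find_overlap(top: Bitmap, bottom: Bitmap, max_rows: int) -> int:
    """Return the largest n <= max_rows such that the last n rows of the top
    scan are identical to the first n rows of the bottom scan (0 if none).

    Idealized: real scans would need a tolerant comparison, not exact equality.
    """
    limit = min(max_rows, len(top), len(bottom))
    for n in range(limit, 0, -1):
        if top[-n:] == bottom[:n]:
            return n
    return 0


def join_halves(top: Bitmap, bottom: Bitmap, overlap_rows: int) -> Bitmap:
    """Stack the two half-page scans, trimming the overlap off the bottom scan."""
    if not top or not bottom:
        raise ValueError("both scans must contain at least one row")
    if len(top[0]) != len(bottom[0]):
        raise ValueError("both scans must have the same width")
    if not 0 <= overlap_rows <= len(bottom):
        raise ValueError("overlap_rows must be between 0 and the bottom scan height")
    # Copy the rows so the caller's images are left unmodified.
    joined = [row[:] for row in top]
    joined.extend(row[:] for row in bottom[overlap_rows:])
    return joined


if __name__ == "__main__":
    # Tiny made-up example: the last two rows of `top` repeat at the start of `bottom`.
    top = [[1, 1], [0, 1], [0, 0]]
    bottom = [[0, 1], [0, 0], [1, 0]]
    n = find_overlap(top, bottom, max_rows=3)
    for row in join_halves(top, bottom, n):
        print(row)
```

When the overlap is picked by eye, as in the post, `join_halves` can be called directly with the number of rows to trim and `find_overlap` skipped.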
#3
wrote in message oups.com...
| Thanks to Mette's kindness in putting the page images online, you can now see the entire March 18, 1911 issue of Mekeel's Weekly Stamp News at:
| http://pdstamps.heindorffhus.dk/
| There are two images per page: a lower resolution and higher resolution one. Please post here which image size you prefer (larger or smaller).

Like Pierre, I much prefer the larger version, although something at perhaps 50% of the size would probably suffice equally. I should mention that I am on high speed broadband, and I wonder if the large file sizes would present a problem for readers on a 56K line.

Regards,
Roger
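
The question of file size versus a 56K line can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: it takes the page size (10.5 by 12.5 inches) and resolution (300 dpi) from the test results earlier in the thread, and it assumes the full nominal 56,000 bits per second with no protocol overhead, so real downloads would be slower. File headers are ignored, and each pixel row is assumed to be padded to a whole number of bytes. The function names `raw_size_bytes` and `download_seconds` are made up for this illustration.

```python
def raw_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data size of a scan, ignoring any file header."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Assumption: each row is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


def download_seconds(size_bytes: float, line_bits_per_second: int = 56_000) -> float:
    """Idealized transfer time: nominal line speed, no overhead or compression."""
    return size_bytes * 8 / line_bits_per_second


if __name__ == "__main__":
    grey = raw_size_bytes(10.5, 12.5, 300, bits_per_pixel=8)
    mono = raw_size_bytes(10.5, 12.5, 300, bits_per_pixel=1)
    gif = 400_000  # the roughly 400 kilobyte gif mentioned in the thread
    for label, size in [("greyscale", grey), ("black and white", mono), ("gif", gif)]:
        print(f"{label}: {size} bytes, {download_seconds(size):.0f} s at 56K")
```

Under these assumptions the greyscale scan comes to about 11.8 million bytes and the black and white scan to about 1.5 million bytes, in line with the sizes reported in the test results, and the 400 kilobyte gif would take a little under a minute on an ideal 56K line.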
#4
wrote:
| Thanks to Mette's kindness in putting the page images online, you can now see the entire March 18, 1911 issue of Mekeel's Weekly Stamp News at:
| http://pdstamps.heindorffhus.dk/
| There are two images per page: a lower resolution and higher resolution one. Please post here which image size you prefer (larger or smaller).

The larger size images are definitely of higher quality. However, reading such a large scan involves a great deal of "mousing around" because it is several times larger than the ordinary browser window.

Maybe you could scan each article by itself at a high resolution and underlay a low-resolution scan of the whole page with an image map, so that one can sort of "zoom in" on the individual articles / pictures?

Having said that, I must confess that I wouldn't know how to do this ...

Jan-Martin
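
The image map idea suggested above can be done with the standard HTML `<map>` and `<area>` elements: the low-resolution page image carries a `usemap` attribute, and each rectangular region links to a high-resolution scan of one article. The following Python sketch only generates such markup as a string. All file names, the map name, and the pixel coordinates are hypothetical placeholders; real values would have to be measured on the actual low-resolution page image. `build_image_map` is a made-up helper name.

```python
from html import escape
from typing import List, Tuple

# left, top, right, bottom (pixels in the low-resolution image), link target, description
Region = Tuple[int, int, int, int, str, str]


def build_image_map(page_image: str, map_name: str, regions: List[Region]) -> str:
    """Return HTML showing the overview image with one clickable rectangle per article."""
    name = escape(map_name, quote=True)
    lines = [
        f'<img src="{escape(page_image, quote=True)}" usemap="#{name}" alt="Full page overview">',
        f'<map name="{name}">',
    ]
    for left, top, right, bottom, href, label in regions:
        coords = f"{left},{top},{right},{bottom}"
        lines.append(
            f'  <area shape="rect" coords="{coords}" '
            f'href="{escape(href, quote=True)}" alt="{escape(label, quote=True)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: two articles on page 1.
    print(build_image_map(
        "page1_small.gif",
        "page1",
        [
            (0, 0, 300, 200, "page1_article1.gif", "First article"),
            (0, 200, 300, 450, "page1_article2.gif", "Second article"),
        ],
    ))
```

The trade-off is extra manual work per page, since each article's rectangle has to be marked out and each article saved as its own high-resolution image.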
#5
1 minute 38 seconds to download a 466Kb scan.

| I should mention that I am on high speed broadband and I wonder if the large file sizes would present a
| problem with readers on a 56K line.
| Regards, Roger