#12
Can you send me a sample of your work?
Al wrote:

Here are the results of my tests:

1. Scan at 150 dpi greyscale: this yields decent OCR but fails on anything resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r), especially since the source material uses a single character for "1/2" and "1/3".
2. Adjusting the contrast in the scanner TWAIN program yields much better results than scanning and then adjusting in a graphics editing program.
3. Scanning directly to black and white gives excellent results at 300 dpi or higher. Anything below 300 dpi has hard-to-read characters.
4. "Descreen" or "Reduce Moire" considerably improves scans.
5. A 1200 dpi scan directly in black and white, with descreen, yields almost perfect results and only fails when there is a smudge, ink blot, etc. in the source material (quite frequent in printed newspapers).
6. Given that the source material is 10.5 inches by 12.5 inches, a 300 dpi scan ends up at:
   - 11 megabytes as an uncompressed greyscale bitmap (no degradation from compression)
   - 1.5 megabytes as a compressed black and white bitmap (no degradation from compression)
   - 400 kilobytes as a compressed black and white gif file (gif has no loss of quality)
7. Reducing the 300 dpi image down to 150 dpi makes many of the characters unreadable, and the result is worse than scanning at 150 dpi directly.
8. The "Unsharp mask" graphics filter with a large radius does an excellent job of removing and smoothing the background noise from the paper.

I can follow this up with an example walk-through from newspaper - scanner - web image.

I want to know if there are any recommendations of where to host the images for each issue (given 400 kilobytes per page and 8 pages per issue).

wrote:

Does anyone have any experience and suggestions on how to better scan in and process old stamp collecting newspapers for the web?

Questions:

a. What dpi resolution is needed for decent scans of 6 point text? (Old newspapers use smaller type.)
b. What is the best way to process the grayscale image down from 256 shades to monochrome?
c. Is decent quality OCR possible on gray background newsprint with 6 point text at anything below 300 dpi?
d. What is the best way to scan in pages larger than your scanner (e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?
e. Is there a better way to get straight scans? I line up the edge of the paper with the edge of the scanner glass and close the lid but keep getting tilted scans. This is troublesome because old newspapers are usually not printed exactly straight horizontally or vertically due to bending of the paper during printing.

For example, I processed a public domain 1911 issue of Mekeel's Weekly Stamp News as follows:

a. Scan top half of a page in 300 dpi grayscale with de-screen turned on
b. Scan bottom half of a page - same settings
c. Rotate both scans using GIMP (GNU Image Manipulation Program)
d. Join images together by hand:
   d1. Open top half image
   d2. Double its height
   d3. Open bottom half image
   d4. Copy bottom half image into blank area below top half image
   d5. (troublesome) Join the two halves by trimming off top part of bottom image. I tried a photo stitch program but it failed unless each image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file

Thanks
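Steps h and d5 above (threshold at about 160, then trim the repeated rows where the two half-page scans overlap) might be sketched in code roughly as follows. This is only an illustration in plain Python on small lists of grey values, not the GIMP workflow described in the post. The names `threshold`, `find_overlap` and `join_halves` are made up for this example, and the overlap search assumes both halves are already straight and match pixel for pixel in the shared rows, which real scans will not do without some tolerance.

```python
from typing import List

# An image is a list of rows; each row is a list of pixel values.
Image = List[List[int]]


def threshold(image: Image, cutoff: int = 160) -> Image:
    """Convert grey values (0 = black, 255 = white) to 1 (ink) or 0 (paper).

    Pixels darker than the cutoff count as ink. The default of 160 is the
    value mentioned in step h; it is a starting point, not a rule.
    """
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def find_overlap(top: Image, bottom: Image, max_rows: int = 50) -> int:
    """Return the largest n (up to max_rows) for which the last n rows of
    `top` are identical to the first n rows of `bottom`, or 0 if none match.

    Assumption: exact row equality. Real scans would need a fuzzy match.
    """
    limit = min(max_rows, len(top), len(bottom))
    for n in range(limit, 0, -1):
        if top[-n:] == bottom[:n]:
            return n
    return 0


def join_halves(top: Image, bottom: Image, max_rows: int = 50) -> Image:
    """Stack `bottom` under `top`, trimming the rows they have in common."""
    n = find_overlap(top, bottom, max_rows)
    return top + bottom[n:]


if __name__ == "__main__":
    # Two tiny fake greyscale "scans" that share two rows.
    top_grey = [[255, 255], [0, 255], [255, 0]]
    bottom_grey = [[0, 255], [255, 0], [0, 0]]
    page = join_halves(threshold(top_grey), threshold(bottom_grey))
    for row in page:
        print("".join("#" if px else "." for px in row))
```

Thresholding before joining mirrors the observation elsewhere in the thread that lining up the halves is much easier on black and white images than on greyscale ones.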
#13
wrote in message oups.com...

- big snip -

I want to know if there are any recommendations of where to host the images for each issue (given 400 kilobytes per page and 8 pages per issue).

You are welcome to have them hosted on my server. I have plenty of space, and they can stay as long as I am alive :-) Contact me if interested.

--
Best regards
Ann Mette Heindorff (Mette)
reply to heindorffhus at heindorffhus dot dk
#14
In order to set a benchmark, I went back and reworked from scratch the 8-page March 18, 1911 Mekeel's Weekly Stamp News newspaper using the following method.

The page is about 13 inches by 11 inches, so it requires two scans per page on an 8.5 x 11 inch flatbed scanner.

Process followed:

1. Test scan at 150 dpi black and white
2. Adjust brightness based on test scan:
   2.1 Brightness set to 150 on a 255 scale (makes scan brighter)
3. Scan top half of a page - 300 dpi black and white - de-screen turned on
4. Scan bottom half of a page - same settings
5. Rotate each scan in scanner software package
6. Join images together by hand:
   6.1 Open top half image
   6.2 Double its height
   6.3 Flatten top half image (makes newly added area all white)
   6.4 Open bottom half image
   6.5 Copy bottom half image into blank area below top half image
   6.6 Join the two halves by trimming off top part of bottom image. This is significantly easier when working with black and white images (takes 4 minutes instead of 30 minutes per page)
7. Balance out margins by cropping image
8. Remove any stray dots from margins
9. Save as black and white bitmap (1.2 megabytes) and black and white gif file (400 kilobytes). There is no loss of quality when saving to bitmap and gif format.

Observations:

- This comes out to a total of just under 2 hours per issue (15 minutes per page times 8 pages per issue).
- The scanner is much more efficient at producing a high quality black and white image than I am at converting a greyscale scan using an image editing program.
- Working from greyscale scans is significantly more difficult and gives only slightly better results, at a cost of 1 hour per page.
- Unadjusted 300 dpi black and white is significantly better than 150 dpi greyscale. The source material is a black and white printed newspaper with almost no pictures.
- OCR at 300 dpi black and white is significantly easier, with 95 percent accuracy, given that the black and white thresholding is done by the scanner instead of the OCR program. When the OCR program (TextBridge) converts a greyscale or color image to black and white (to allow OCR), it introduces significant noise, either in its algorithm or just by attempting to dither the input image into 2 colors.
- OCR accuracy is significantly better since I started adding unknown words to the OCR package dictionary. Actually, the words were already in the dictionary but the font face recognition associated with a word was not. My guess is that the OCR program was trained with laser printed serif and sans-serif fonts and not with newspaper/movable type fonts.

I pasted the two original messages below to keep this all together in a single message.

************************
*** Original message ***

Old stamp magazines - scan - OCR - web images - any ideas?
Newsgroups: rec.collecting.stamps.discuss
Date: 27 Feb 2005 21:10:51 -0800

Does anyone have any experience and suggestions on how to better scan in and process old stamp collecting newspapers for the web?

Questions:

a. What dpi resolution is needed for decent scans of 6 point text? (Old newspapers use smaller type.)
b. What is the best way to process the grayscale image down from 256 shades to monochrome?
c. Is decent quality OCR possible on gray background newsprint with 6 point text at anything below 300 dpi?
d. What is the best way to scan in pages larger than your scanner (e.g., 11x13 inches on an 8.5x11 inch flatbed scanner)?
e. Is there a better way to get straight scans? I line up the edge of the paper with the edge of the scanner glass and close the lid but keep getting tilted scans. This is troublesome because old newspapers are usually not printed exactly straight horizontally or vertically due to bending of the paper during printing.

For example, I processed a public domain 1911 issue of Mekeel's Weekly Stamp News as follows:

a. Scan top half of a page - 300 dpi grayscale - de-screen turned on
b. Scan bottom half of a page - same settings
c. Rotate both scans using GIMP (GNU Image Manipulation Program)
d. Join images together by hand:
   d1. Open top half image
   d2. Double its height
   d3. Open bottom half image
   d4. Copy bottom half image into blank area below top half image
   d5. (troublesome) Join the two halves by trimming off top part of bottom image. I tried a photo stitch program but it failed unless each image was exactly straight.
e. Smooth background colors (grays)
f. Adjust histogram
g. Adjust curve
h. Threshold at about 160 out of 255 to get most of the black colors
i. Save as a monochrome bitmap file and a monochrome gif file

*** Message 2 ***

Newsgroups: rec.collecting.stamps.discuss
Date: 5 Mar 2005 15:09:08 -0800
Subject: Old stamp magazines - scan - OCR - web images - any ideas?

Here are the results of my tests:

1. Scan at 150 dpi greyscale: this yields decent OCR but fails on anything resembling a stamp denomination (1b, 1r, 1/2s, 2 1/2r), especially since the source material uses a single character for "1/2" and "1/3".
2. Adjusting the contrast in the scanner TWAIN program yields much better results than scanning and then adjusting in a graphics editing program.
3. Scanning directly to black and white gives excellent results at 300 dpi or higher. Anything below 300 dpi has hard-to-read characters.
4. "Descreen" or "Reduce Moire" considerably improves scans.
5. A 1200 dpi scan directly in black and white, with descreen, yields almost perfect results and only fails when there is a smudge, ink blot, etc. in the source material (quite frequent in printed newspapers).
6. Given that the source material is 10.5 inches by 12.5 inches, a 300 dpi scan ends up at:
   - 11 megabytes as an uncompressed greyscale bitmap (no degradation from compression)
   - 1.5 megabytes as a compressed black and white bitmap (no degradation from compression)
   - 400 kilobytes as a compressed black and white gif file (gif has no loss of quality)
7. Reducing the 300 dpi image down to 150 dpi makes many of the characters unreadable, and the result is worse than scanning at 150 dpi directly.
8. The "Unsharp mask" graphics filter with a large radius does an excellent job of removing and smoothing the background noise from the paper.

I can follow this up with an example walk-through from newspaper - scanner - web image.
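The raw sizes quoted in item 6 can be sanity-checked with a little arithmetic. The sketch below is a rough estimate only: it assumes 8 bits per pixel for greyscale, 1 bit per pixel for black and white, rows rounded up to whole bytes, and no file header or compression. The function name `scan_size_bytes` is invented for this example.

```python
import math


def scan_size_bytes(width_in: float, height_in: float, dpi: int,
                    bits_per_pixel: int) -> int:
    """Estimate the uncompressed pixel data size of a scan in bytes.

    Assumptions: each row is rounded up to a whole number of bytes, and
    file headers, palettes and any row padding a format adds are ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


if __name__ == "__main__":
    # Page size and resolution taken from item 6 of the test results.
    grey = scan_size_bytes(10.5, 12.5, 300, 8)   # 11,812,500 bytes
    mono = scan_size_bytes(10.5, 12.5, 300, 1)   # 1,477,500 bytes
    print(f"greyscale:       {grey / 1_000_000:.1f} MB")
    print(f"black and white: {mono / 1_000_000:.2f} MB")

    # Hosting estimate from the thread: 400 KB per page, 8 pages per issue.
    print(f"per issue as gif: {400 * 8} KB")
```

The estimates come out close to the 11 megabyte and 1.5 megabyte figures reported above, and an 8-page issue saved as gif files needs roughly 3.2 megabytes of hosting space.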