Paper to Digital Conversion guide → Jan 26th 2008
Converting paper to digital files can be a daunting task. Here is our quick list to help you get started with a little less pain!
When converting documents from paper to digital, first determine what to scan, so that you don’t replicate documents that are no longer required.
Determine scanner capabilities like speed, document handling capacity, resolution in DPI, and OCR
You have 3 options:
- Outsource the entire task
- Lease/rent a high-speed scanner
- Purchase a midrange scanner for long-term/repeated use
Personally, I choose the third option: purchasing a scanner. If I want to migrate from paper to digital, I can start converting the backlog of paper right away. Since I will continue to encounter paper on a regular basis, investing in a decent scanner is ideal. For myself, I want to be able to scan receipts for work as they come in, so they don’t accumulate.
Look for a scanner that does at least 15-20 pages per minute (PPM) at 300 DPI. Also, look for a scanner with a feed tray that can hold at least 50 pages at one time.
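To see what those numbers mean for a backlog, here is a rough sketch in Python that estimates total scanning time from page count, scanner speed, and tray capacity. The function name and the 30 seconds allowed for each tray reload are assumptions for illustration, not measured values.

```python
import math

def estimate_scan_minutes(total_pages, pages_per_minute, tray_capacity,
                          reload_seconds=30):
    """Rough time to scan a backlog, including tray reloads.

    reload_seconds is an assumed figure; time your own reloads.
    """
    if total_pages <= 0:
        return 0.0
    # Time the scanner spends actually feeding paper
    feed_minutes = total_pages / pages_per_minute
    # Number of times the feed tray has to be loaded
    loads = math.ceil(total_pages / tray_capacity)
    reload_minutes = loads * reload_seconds / 60
    return feed_minutes + reload_minutes

# Example: 5,000 pages at 20 PPM with a 50-page tray
print(estimate_scan_minutes(5000, 20, 50))  # 300.0 minutes, or 5 hours
```

Document prep such as removing staples is not included, so treat the result as a lower bound.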
Determine output file format(s)
Everyone has different requirements for digital file storage. TIFF is the gold standard because it is a portable, proven format whose metadata can be used to index files.
PDF is a relatively new standard, but it’s popular because many desktop applications create and read PDFs, and PDFs can be searched as well.
Some people run OCR on certain types of documents and store the results as Word files or plain text files. Plain text is generally preferred because it’s readable without any proprietary tools or encodings.
Decide on a Naming Convention
Before starting to scan, devise a file naming system.
A bad naming convention will make it very hard to organize your data or find anything once it has been digitized.
A good rule of thumb is "name each file so that you would know what it is even if every file you owned were in one folder."
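As one possible illustration of that rule of thumb, the Python sketch below builds a self-describing name from a date, a category, and a short description. The date_category_description pattern and the build_filename helper are assumptions made for this example; use whatever fields matter for your own records.

```python
import re
from datetime import date

def build_filename(doc_date, category, description, extension="pdf"):
    """Build a name like 2008-01-26_receipt_office-supplies.pdf."""
    def slug(text):
        # Lowercase, then turn runs of other characters into one hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted by name
    return f"{doc_date.isoformat()}_{slug(category)}_{slug(description)}.{extension}"

print(build_filename(date(2008, 1, 26), "Receipt", "Office Supplies"))
# 2008-01-26_receipt_office-supplies.pdf
```

Putting the date first means a plain alphabetical listing of one big folder is also a chronological one.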
Establish access control / audit trails
Make some decisions regarding the fate of the paper which you are digitizing. Establish a sort of death certificate for each file you create digitally so you will know 10 years from now when you can delete it.
Try to determine the following criteria as best as possible:
- How long is this file going to be active?
- How long after it stops getting used must it be retained?
- Does it ever have to be destroyed?
- Until what date must you keep it?
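The answers to these questions can be recorded alongside each file. Here is a minimal Python sketch of such a "death certificate"; the RetentionRecord class and its field names are hypothetical, and measuring the periods in whole years is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RetentionRecord:
    filename: str
    created: date
    active_years: int      # how long the file is expected to be in use
    retention_years: int   # how long to keep it after it goes inactive
    must_destroy: bool     # whether it has to be destroyed at the end

    def keep_until(self) -> date:
        """Earliest date on which the file may be deleted."""
        year = self.created.year + self.active_years + self.retention_years
        try:
            return self.created.replace(year=year)
        except ValueError:
            # Created on Feb 29 and the target year is not a leap year
            return self.created.replace(year=year, day=28)

record = RetentionRecord("2008-01-26_receipt_office-supplies.pdf",
                         date(2008, 1, 26), active_years=3,
                         retention_years=7, must_destroy=True)
print(record.keep_until())  # 2018-01-26
```

Actual retention periods depend on your own legal and business requirements; the numbers here are placeholders.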
Ready to Scan?
Get test pages ready (different sizes & detail levels, color & gray scale)
Establish some loose limits on the paper types you will be scanning:
- Poster paper?
- Text-only documents (such as pages of meeting minutes)
- Rich content (such as picture-heavy or highly detailed documents)
- Odd-size documents?
- Does your paper come with staples or other bindings?
Then collect a sample set of the paper types you expect to scan in the future.
Test Runs
Spend a bit of time scanning test images and becoming efficient with the scanning software.
Now is the time to put your sample set of expected paper types to use. Experiment with different settings so you can balance average file size, quality of the finished digital file, DPI, and speed.
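To get a feel for how DPI and color depth drive file size before you run the tests, the Python sketch below computes the raw, uncompressed size of one scanned page. The function name is made up for this example, and real TIFF or PDF files are usually much smaller once compressed, so treat these figures as upper bounds.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size of one page, before any compression."""
    # Pixels across times pixels down
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# A letter-size page (8.5 x 11 inches) at 300 DPI
for label, bits in [("black & white", 1), ("gray scale", 8), ("color", 24)]:
    size = uncompressed_page_bytes(8.5, 11, 300, bits)
    print(f"{label}: {size:,} bytes")
```

Doubling the DPI quadruples the pixel count, which is why it pays to test the lowest resolution that still gives acceptable quality.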
OCR - (Optional)
If using OCR, decide whether it will run in-process (as part of scanning) or as a post-process step afterwards.
Run OCR on the same test data you were using, trying it in different ways and comparing the results.
Scan data sets
Get a coffee, load the boxes into the room with the scanner, and scan!
Datawitness it!
Now that you have digitized your records, and you have established their individual lifetimes, take any files that are crucial or needed and load them into your Datawitness account!