Organizing documents: Hard but achievable
By Aaron Barbee
Contributor
Published June 21, 2009
Question: Hey Guru, I have a tough question for you, one even my IT staff can’t solve. How can I inexpensively organize and sort tens of thousands of documents? I may have upwards of hundreds of thousands of them to go through. My staff’s solution is to hire a huge firm, but even at that great expense, the job would take too long to complete.
Answer: The most common solution for a job of that magnitude is optical character recognition (OCR) software, which reads the text in scanned documents so that a program can try to categorize them. This works well when the documents are forms and the information you want to sort on appears in roughly the same spot on every page.
Of course, you know that computers are not intelligent at all. They’re high-speed idiots that do exactly what they’re told and nothing more. If a name or identifying word appears at the top left of every page, in exactly the same location, the software can capture it and sort or file each document based on it. The rub is that OCR is hardly perfect and will undoubtedly make mistakes, but you may reach an accuracy level you can live with.
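To make the “same exact location” idea concrete, here is a minimal Python sketch. It assumes an OCR program has already turned each scanned page into a list of words with their positions and confidence scores; the `OcrWord` type, the zone coordinates and the 0.90 confidence cutoff are hypothetical values you would tune for your own forms. Pages where nothing turns up in the zone, or where the OCR isn’t confident, are flagged for a person to check rather than guessed at.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    x: int             # left edge of the word, in pixels
    y: int             # top edge of the word, in pixels
    confidence: float  # 0.0-1.0, as reported by the (assumed) OCR step


# Hypothetical zone where the identifying word sits on every form:
# the top-left 600 x 200 pixel box of the page.
ZONE = (0, 0, 600, 200)  # x_min, y_min, x_max, y_max
MIN_CONFIDENCE = 0.90    # hypothetical cutoff; below this, a person decides


def classify_page(words: list[OcrWord]) -> str:
    """Return a category name, or 'manual-review' if the OCR can't be trusted."""
    x_min, y_min, x_max, y_max = ZONE
    in_zone = [w for w in words
               if x_min <= w.x < x_max and y_min <= w.y < y_max]
    if not in_zone:
        return "manual-review"  # nothing found where the label should be
    # Take the first word in reading order: top to bottom, then left to right.
    first = min(in_zone, key=lambda w: (w.y, w.x))
    if first.confidence < MIN_CONFIDENCE:
        return "manual-review"  # OCR is unsure, so don't guess
    return first.text.strip().upper()


page = [OcrWord("Invoice", 40, 30, 0.97), OcrWord("Total", 900, 1500, 0.99)]
print(classify_page(page))  # INVOICE
```

The category it returns could then serve as a folder name, so every page labeled “Invoice” ends up in the same place.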
Because people are intelligent and capable of making decisions and corrections, your solution may not be technical but operational: the answer to your problem may well be to have people do the work. However, as you said in your question, that can get very costly.
Depending on the information that needs to be sifted through, you may be able to use volunteers. There are online communities whose members are happy to donate their time and effort. This concept is known as “crowdsourcing”: outsourcing a very large task to a very large group of people. If you can split the job so that each person handles only a tiny fraction of it, the task may get done for free, or at least extremely cheaply.
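The arithmetic behind crowdsourcing is simple division. The Python sketch below uses made-up numbers (200,000 documents and 2,000 volunteers) to show how small each person’s share becomes, and splits a list of document IDs into batches that could be handed out one at a time. The function names are illustrative only, not part of any crowdsourcing service.

```python
def docs_per_volunteer(total_docs: int, volunteers: int) -> int:
    """Documents each volunteer must handle, rounded up so none are left over."""
    return (total_docs + volunteers - 1) // volunteers  # integer ceiling division


def make_batches(doc_ids: list[str], batch_size: int) -> list[list[str]]:
    """Split a list of document IDs into consecutive batches of batch_size."""
    return [doc_ids[i:i + batch_size] for i in range(0, len(doc_ids), batch_size)]


# Hypothetical numbers: 200,000 documents shared among 2,000 volunteers.
per_person = docs_per_volunteer(200_000, 2_000)
print(per_person)  # 100

doc_ids = [f"doc-{n:06d}" for n in range(200_000)]
batches = make_batches(doc_ids, per_person)
print(len(batches))  # 2000
```

A pile of 200,000 documents sounds impossible; 100 documents apiece is something a volunteer can finish in a sitting.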
Organizing such an effort will undoubtedly be hard, but searching online for crowdsourcing ideas may lead you to the perfect solution.
In computing, the 80-20 rule certainly applies: once you reach 80 percent of your goal, the last 20 percent gets disproportionately harder and more expensive as you approach 100 percent. I suggest aiming to get 80 percent of your results and cleaning up the last 20 percent at whatever cost is acceptable to you.
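One way to put the 80-20 advice into practice is to let the software file only the documents it is confident about and route the rest to people. This Python sketch assumes each result is a (document ID, category, confidence) tuple from an earlier OCR step; the 0.90 threshold, the sample results and the 50-cent review cost are hypothetical numbers, not recommendations.

```python
def triage(results, threshold=0.90):
    """Split OCR results into an automatic pile and a manual-review pile.

    Each result is assumed to be a (doc_id, category, confidence) tuple.
    """
    automatic = [r for r in results if r[2] >= threshold]
    manual = [r for r in results if r[2] < threshold]
    return automatic, manual


def review_cost(manual, cost_per_doc):
    """Estimated cost of having people handle the manual-review pile."""
    return len(manual) * cost_per_doc


# Hypothetical sample: five documents, one of which the OCR is unsure about.
results = [
    ("doc-001", "INVOICE", 0.98),
    ("doc-002", "INVOICE", 0.95),
    ("doc-003", "CONTRACT", 0.93),
    ("doc-004", "RECEIPT", 0.91),
    ("doc-005", "INVOICE", 0.62),
]
automatic, manual = triage(results)
share = len(automatic) / len(results)
print(f"{share:.0%} sorted automatically, {len(manual)} left for people")
# 80% sorted automatically, 1 left for people
print(f"Estimated review cost: ${review_cost(manual, 0.50):.2f}")
# Estimated review cost: $0.50
```

Raising the threshold sends more documents to people (higher cost, fewer mistakes); lowering it does the opposite, so you can slide it until the cleanup bill is one you can accept.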
Check out my web site www.TexasComputerGuru.com for supplemental information and previous articles as well.
Aaron Barbee owns Texas Computer Guru, a local computer services company for on-site business and residential needs. He can be contacted at 281-628-5099. E-mail questions for Aaron to sunnews(at)baytownsun.com.