It is done, and wow! This is the first general-purpose Extraction Tool for HTML files. It was designed to help you with two tasks. First, it provides a systematic and very efficient way to extract raw data from HTML filings stored on the directEDGAR drives. Second, it helps you conform the column headings easily. The result is a CSV file that carries identifiers for the company and the documents the data were extracted from, along with the column headings you choose.
The Extraction Engine should help you collect data from any table that you can describe (using keywords and Boolean operators) in a set of filings you specify using the ISYS-Runtime search tool. I have successfully extracted and conformed more than 20,000 lines of data in under four hours.
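To make the idea concrete, here is a minimal Python sketch of the general technique: pick out the tables in an HTML document that match a keyword description, rename their column headings to a common set, and write the rows to CSV with identifier columns. This is not the Extraction Engine's code and it does not use the ISYS-Runtime search tool. Every name in it (TableCollector, matches, extract, to_csv, HEADING_MAP, the CIK and FILING columns) and the sample filing data are made up for illustration, and the keyword test is a plain AND/NOT substring check rather than a real Boolean query language.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows of cell text (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table being read
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows, self._row, self._cell = None, None, None


def matches(table, all_of=(), none_of=()):
    """True if the table text contains every all_of term and no none_of term."""
    text = " ".join(cell for row in table for cell in row).lower()
    return (all(term.lower() in text for term in all_of)
            and not any(term.lower() in text for term in none_of))


def extract(html, identifiers, heading_map, all_of=(), none_of=()):
    """Return one dict per data row of each matching table, with conformed headings."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    records = []
    for table in parser.tables:
        if not table or not matches(table, all_of, none_of):
            continue
        # Assumes the first row holds the headings; unmapped headings are kept as-is.
        header = [heading_map.get(h.lower(), h) for h in table[0]]
        for row in table[1:]:
            records.append({**identifiers, **dict(zip(header, row))})
    return records


def to_csv(records, fieldnames):
    """Serialize records to CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up filing fragment: only the second table matches the description.
SAMPLE = """
<table><tr><th>Item</th><th>Amount</th></tr><tr><td>Revenue</td><td>10</td></tr></table>
<table>
<tr><th>Name of Director</th><th>Fees Earned or Paid in Cash ($)</th><th>Total ($)</th></tr>
<tr><td>Jane Doe</td><td>50,000</td><td>120,000</td></tr>
<tr><td>John Roe</td><td>45,000</td><td>110,000</td></tr>
</table>
"""

# Hypothetical mapping from headings as filed to the headings you choose.
HEADING_MAP = {
    "name of director": "NAME",
    "fees earned or paid in cash ($)": "FEES_CASH",
    "total ($)": "TOTAL",
}

rows = extract(SAMPLE, {"CIK": "0000000000", "FILING": "example-proxy.htm"},
               HEADING_MAP, all_of=("director", "fees earned"))
csv_text = to_csv(rows, ["CIK", "FILING", "NAME", "FEES_CASH", "TOTAL"])
print(csv_text)
```

Real filings are far messier than this sample: headings span several rows, cells are merged, and the same item is labeled differently from one company to the next, which is why conforming the headings is a separate step from extraction.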
Imagine that one day you pull the director compensation and beneficial ownership tables from three years of proxy statements for 1,500 companies. The next day, you want the details reported in the deferred tax footnote for a different set of 3,000 registrants. Tasks like these can be completed in hours instead of weeks or months.
I hope to finish adding the ability to extract and manage specific blocks of text next.