Over 2 million EDGAR filing artifacts!

That's right: our pre-parsing and normalization tools have now made more than two million observations of EDGAR data items available. This is a huge number. It honestly blew me away when I received a message this morning from a team member who is provisioning new storage space. He was running a test on the 10-K artifacts alone and had logged more than 1.7 million items. Once we add the artifacts extracted from PROXY filings, the total jumps to more than two million.

We have to provision new space and build a new delivery architecture because our existing system has outgrown its ability both to manage the volume of incoming artifacts and to handle the number of outgoing requests. We are getting ready to add insider trading data matched to the Named Executive Officers and the Directors. That addition alone will contribute approximately another million items. We have also been parsing the older 10-Ks into item-number sections so they match the sections we already offer for the newer filings; that process should add at least another million separate files for download.

Once our new storage space is provisioned and working, we will turn back to finishing a new version of the Search, Extraction and Normalization Engine. One of the key goals of this project is to increase your download speed at least fourfold. Our existing architecture does not allow parallel access to our data repository. The new platform will let us design the application to open multiple simultaneous connections between your desktop and our repository.
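To illustrate why multiple simultaneous connections matter, here is a minimal sketch in Python. It is not our actual client: `fetch_artifact` and `download_all` are hypothetical names, and the network request is replaced by a fixed simulated delay so the example runs offline. It shows that when each download spends most of its time waiting on the connection, fetching over several connections at once cuts total wall-clock time roughly in proportion to the number of connections.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_artifact(artifact_id: str) -> bytes:
    """Hypothetical stand-in for downloading one artifact from the repository.

    A real client would issue a network request here; this sketch only
    sleeps to simulate per-request latency.
    """
    time.sleep(0.1)  # simulated latency for a single download
    return f"contents of {artifact_id}".encode()


def download_all(artifact_ids, max_connections=4):
    """Fetch artifacts using up to `max_connections` concurrent workers.

    Results come back in the same order as `artifact_ids`.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_artifact, artifact_ids))


if __name__ == "__main__":
    ids = [f"artifact-{n}" for n in range(8)]
    for workers in (1, 4):
        start = time.perf_counter()
        results = download_all(ids, max_connections=workers)
        elapsed = time.perf_counter() - start
        print(f"{workers} connection(s): {len(results)} files in {elapsed:.2f}s")
```

With one connection the eight simulated downloads run back to back; with four, they overlap. Real-world gains depend on available bandwidth, server-side limits, and how much of each request is spent waiting rather than transferring data, so this sketch illustrates the idea rather than predicting the speedup of the new engine.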