More MetaData to improve your searches coming soon.

I have just completed a major rebuild of the programs I use to extract filings from the SEC website and process them before they are added to the filing collection. This rebuild gives me the flexibility to add more metadata to improve your searches. I plan to add SIC codes and fiscal year end. These fields will speed up filtering when you want to include or exclude filers from your search results based on one of these characteristics. I would appreciate suggestions about other characteristics you would like to see added to the filings so that you can specify a population of search results more efficiently.

Please let me know when you post a paper to SSRN

I intend to add links to papers posted on SSRN that mention the use of directEDGAR. This helps potential customers identify who is using directEDGAR and what they are using it for. If you post a paper to SSRN, I would appreciate an email letting me know you have done so. I will take care of finding the paper and making the link.

ExtractionEngine Enhancements Coming Soon

A new version of the ExtractionEngine will be released soon.  I hope to be able to deliver it with your Second Quarter update.  The new version will include an enhancement to the SmartBrowser and a specialized tool to allow you to identify filing events around specific windows.

The SmartBrowser is the html viewer that appears when you select the Review Tables item from the Pre-Process menu.  When you start the SmartBrowser in a particular directory it builds a queue of all of the htm and txt files that are in the directory.  The first one is displayed and then you cycle through the queue using the next button.  Right now there is no easy way to jump to a particular filer’s tables if you need to check a specific data value or table heading.

I have added a feature that will allow you to type in a CIK number and the SmartBrowser will load the files beginning with the CIK number that you have entered.  This should save you some time when you have a large collection of snipped tables but only need to review a small number.
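The jump described above can be pictured with a minimal Python sketch. This is not the SmartBrowser's actual code; the function name `jump_to_cik` and the file names in the example are hypothetical, and the sketch assumes only what is stated above: that the queue is an ordered list of file names and that each name begins with the filer's CIK.

```python
def jump_to_cik(queue, cik):
    """Return the position of the first queued file whose name begins
    with the given CIK, or None if no file in the queue matches.

    Assumption: file names start with the CIK (hypothetical naming)."""
    for position, file_name in enumerate(queue):
        if file_name.startswith(cik):
            return position
    return None


# Hypothetical queue of snipped table files.
queue = ["1043277-T1.htm", "1128709-T1.htm", "1128709-T2.txt"]
start = jump_to_cik(queue, "1128709")
```

Note that a plain prefix match would also stop at a longer CIK that happens to begin with the same digits, so a real implementation would likely check for a separator after the CIK.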

The EventExtraction tool will allow you to easily identify any SEC filings that were made in a window around a specific filing date.  You get to specify a unique event date for each CIK number and specify the length of the window.  The program will create a CSV file that will list all dates in the window that you specified.  Next to each date will be a list of all SEC filings made on that date.  If no filings were made on a particular date the output will indicate that no filings were made.

Below is an example of the output from checking for SEC filings in an eight-day window around 12/31/2001 for CIK 1128709. Notice that the days the SEC was not accepting filings (12/22; 12/23; 12/25; 12/29; 12/30; 01/01; 01/05; 01/06) are not included in the output. The program builds the window from available filing dates, so dates on which the EDGAR system is not accepting filings do not count toward the window span.

'1128709', '12/18/2001', No Filing Events on this Date
'1128709', '12/19/2001', No Filing Events on this Date
'1128709', '12/20/2001', No Filing Events on this Date
'1128709', '12/21/2001', No Filing Events on this Date
'1128709', '12/24/2001', No Filing Events on this Date
'1128709', '12/26/2001', No Filing Events on this Date
'1128709', '12/27/2001', No Filing Events on this Date
'1128709', '12/28/2001', No Filing Events on this Date
'1128709', '12/31/2001', '424B3', '8-K'
'1128709', '01/02/2002', No Filing Events on this Date
'1128709', '01/03/2002', No Filing Events on this Date
'1128709', '01/04/2002', No Filing Events on this Date
'1128709', '01/07/2002', No Filing Events on this Date
'1128709', '01/08/2002', No Filing Events on this Date
'1128709', '01/09/2002', No Filing Events on this Date
'1128709', '01/10/2002', No Filing Events on this Date
'1128709', '01/11/2002', No Filing Events on this Date
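To make the window logic concrete, here is a minimal Python sketch of one way to build such a window. It is an illustration under stated assumptions, not the EventExtraction tool's code: the function name `filing_day_window` is hypothetical, weekends are treated as closed, and other closed dates (holidays) must be supplied by the caller. The event date itself is always included.

```python
from datetime import date, timedelta


def filing_day_window(event_date, n_days, closed_dates=frozenset()):
    """Return n_days filing days before the event date, the event date,
    and n_days filing days after it, in chronological order.

    Assumption: EDGAR is closed on weekends and on closed_dates."""

    def is_filing_day(day):
        return day.weekday() < 5 and day not in closed_dates

    def walk(step):
        # Step one calendar day at a time, keeping only filing days.
        days, day = [], event_date
        while len(days) < n_days:
            day += timedelta(days=step)
            if is_filing_day(day):
                days.append(day)
        return days

    before = walk(-1)[::-1]  # collected backwards, so reverse
    after = walk(+1)
    return before + [event_date] + after


# Holidays assumed for this example: Christmas 2001 and New Year's Day 2002.
holidays = {date(2001, 12, 25), date(2002, 1, 1)}
window = filing_day_window(date(2001, 12, 31), 8, holidays)
```

With these inputs the window runs from 12/18/2001 through 01/11/2002 and contains 17 dates, the same dates that appear in the output above.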

The second program does something similar but is limited to 8-K filings and their reasons. It builds an output listing all 8-K filings that took place in your event window and the reasons for each filing. The output from this program using CIK 1043277, specifying an Event Date of 06/18/2002 and requiring a three-day window, was:

'1043277', '06/13/2002', 'No Filing Events on this Date'
'1043277', '06/14/2002', 'No Filing Events on this Date'
'1043277', '06/17/2002', 'No Filing Events on this Date'
'1043277', '06/18/2002', 'No Filing Events on this Date'
'1043277', '06/19/2002', 'No Filing Events on this Date'
'1043277', '06/20/2002', '06/18/2002', 'AUCHANGE', 'FINANCIALSTMTS'
'1043277', '06/21/2002', 'No Filing Events on this Date'

SEC Index Problems

I am working on a new feature (the EventExtraction tool) to add to the main ExtractionEngine program. To test the code, I used the new tool to pull the dates and filing types for GE from the database I built and compared them to the list of filings on GE's SEC filing page. The listing from my database was about 60 filings short of the SEC filings list.

As you can imagine, I started panicking because I couldn't identify why the counts differed so substantially, and a gap like this would clearly have implications for the claims I have made about the completeness of directEDGAR's main filing repository. After ruling out every coding problem I could think of, I finally decided to compare the index list to what was in my database as well as to the actual filings. That is where the problem turned out to be.

The quarterly SEC indexes are incomplete.   I have reached out to my contact at the SEC regarding the discrepancy and they are working on a fix.  The filings that seem to be missing from the indexes most often are those that have a filing date in the header file that is different from the actual date the final document was submitted to the EDGAR system.

The filing types most often missing are UPLOAD, CORRESPONDENCE and S-4 registration filings. It looks to me as though more than half of the CORRESPONDENCE and UPLOAD filings are missing. Less than half of one percent of 10-K, 10-Q and 8-K filings are missing.
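For readers who want to run a similar check on their own data, the comparison can be sketched in Python as below. This is only an illustration of the idea, not the code I used: the function name `missing_share_by_form` is hypothetical, the filing identifiers are made up, and the sketch assumes you already have a list of actual filings with their form types and a set of the identifiers that appear in the index.

```python
from collections import Counter


def missing_share_by_form(filings, indexed_ids):
    """For each form type, return the share of filings whose identifier
    does not appear in the index.

    filings: iterable of (filing_id, form_type) pairs.
    indexed_ids: set of filing ids found in the quarterly index."""
    total, missing = Counter(), Counter()
    for filing_id, form_type in filings:
        total[form_type] += 1
        if filing_id not in indexed_ids:
            missing[form_type] += 1
    return {form: missing[form] / total[form] for form in total}


# Made-up identifiers for illustration only; these are not real counts.
filings = [("a1", "UPLOAD"), ("a2", "UPLOAD"), ("b1", "10-K"), ("b2", "10-K")]
indexed_ids = {"a2", "b1", "b2"}
shares = missing_share_by_form(filings, indexed_ids)
```

In this toy example half of the UPLOAD filings and none of the 10-K filings are missing from the index.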

I intend to do a complete rebuild of directEDGAR to add more metadata.  I will use the new indexes for the next build.