One of the key drivers of our new architecture was to allow us to more easily expand the range of data values we deliver through the Search, Extraction & Normalization Engine. We are experiencing that benefit right now – we just started delivering normalized data from Form 3, 4 & 5 filings.
The SEC lays out the obligations of officers, directors and other Section 16 persons to report their holdings and transactions in their company’s securities, as well as in derivative instruments whose value is determined by some security of the company. The reporting mechanism used today is Form 3 (initial report of holdings), Form 4 (transactions and other events that affect holdings) and Form 5 (annual statement of changes in holdings). In about May 2003 these forms became machine readable when the SEC introduced and required filing in both HTML and XML formats. As an aside, before the introduction of the XML form these forms were available through EDGAR for only a limited number of companies, because most filers chose to file a paper copy.
So this morning we released a pilot test of normalized data from these forms. This has been a big undertaking (today there are more than five million ownership forms available through EDGAR). Our initial pilot is focused on the S&P 500 for the period from 2013 to 2017.
One of the big struggles with this data was deciding how to organize it for you to call. We decided to prepare the data by COMPANY-CIK/YEAR. So if you submit a request for Abbott Labs (CIK 1800) for 2017 we will deliver the data extracted from every Form 3, 4 and 5 filed for that company by any reporting person during the period from 1/1/2017 through 12/31/2017.
Each line of the results represents one reporting event for one person. So if a reporting person describes 3 non-derivative and 2 derivative transactions in one filing the result file will have 5 lines – each line reports all of the values included in the form.
To access this data, create a request file (three columns: CIK, YEAR and PF) and select ExtractionPreprocessed from the Extraction menu. Once the request file has been validated and you select the Read Input button, the Pull section will populate with the latest list of available tables. The parsed ownership data is delivered when you select SECTION_16_ANNUAL_SUMMARY.
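If you build request files programmatically, something like the following might help. This is only a sketch: it assumes the request file is comma-delimited with a header row, and the PF value shown is a placeholder, so substitute whatever your other request files use. The function name build_request_text is made up for this illustration.

```python
import csv
import io


def build_request_text(cik_years, pf_value):
    """Return request-file text with the three columns CIK, YEAR and PF.

    Assumption: a comma-delimited layout with a header row.
    `pf_value` is supplied by the caller; this sketch does not know
    what values the application expects in that column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "YEAR", "PF"])
    for cik, year in cik_years:
        writer.writerow([cik, year, pf_value])
    return buffer.getvalue()


# Abbott Labs (CIK 1800) for 2017; "PF_PLACEHOLDER" is not a real value.
request_text = build_request_text([(1800, 2017)], pf_value="PF_PLACEHOLDER")
```

Write request_text to a file and point the application at it as you would with any other request file.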
There can be as many as 150 unique column headings in the result file, depending on the number of footnotes included for the transactions. It is critical that we attach each relevant footnote to the transaction it elaborates on.
I will be honest – our labeling for the footnotes is a bit tedious – but we think it is necessary to provide clarity as to which part of the form a footnote should be considered with. For a little hint at these complications, consider this image from a partial Form 4 filed by John Klinck, an EVP for State Street (link to full form).
There are two footnotes to explain/elaborate on the value reported for the Amount of Shares Beneficially Owned Following Reported Transaction. Note that this entry describes a non-derivative transaction. These footnotes are indicated in the following manner: the text associated with the footnote marked (2) is in a column labeled transactionshares.footnote, and the text for the footnote marked (3) is in a column labeled transactionshares.footnote_1. In short, we add an index value to every footnote after the first one for a given data entry. If an entry in a form has 4 footnotes, the last one listed will be indexed with _3. Footnotes are associated with a specific data entry, so they are keyed to that data value and reported in the same row as that data value.
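To make the indexing rule concrete, here is a small sketch that generates the footnote column labels for a data entry and collects the footnote text from one result row. The helper names (footnote_column, collect_footnotes) and the sample row text are invented for illustration; only the labeling pattern comes from the description above.

```python
def footnote_column(field, index):
    """Label for the footnote at position `index` (0-based) on `field`.

    The first footnote has no suffix; later ones get _1, _2, ...
    """
    return f"{field}.footnote" if index == 0 else f"{field}.footnote_{index}"


def collect_footnotes(row, field):
    """Return the footnote texts for `field` in order, stopping at the
    first missing or empty footnote column in this row (a dict)."""
    texts = []
    index = 0
    while True:
        text = row.get(footnote_column(field, index))
        if not text:
            break
        texts.append(text)
        index += 1
    return texts


# An entry with 4 footnotes ends at the _3 label.
labels = [footnote_column("transactionshares", i) for i in range(4)]

# Hypothetical row: the footnote text here is made up.
sample_row = {
    "transactionshares.footnote": "text of footnote (2)",
    "transactionshares.footnote_1": "text of footnote (3)",
}
notes = collect_footnotes(sample_row, "transactionshares")
```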
These forms allow the respondent to include REMARKS. A remark applies to the entire form, so we report each one in a separate row. Initially we thought to include them next to each transaction, but we decided they would be more useful if they could be easily isolated. We include a column (datatype) that describes the nature of the content in each row.
This allows you to very quickly isolate and review any particular type of data. All of the identifying information for each person associated with each form is included in each row. There are indicators for the relationship between the reporting person and the issuer (isdirector/isofficer/isother). The reporting person’s CIK is included so you can match back to our compensation data.
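As a rough illustration of isolating rows by type, the sketch below groups result rows on the datatype column. The column names datatype, isdirector, isofficer and isother come from the description above, but the sample values (NONDERIVATIVE, DERIVATIVE, REMARKS, and the 0/1 flags) are assumptions made for the example, so check them against an actual result file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a result file; real files have many more columns,
# and the datatype values and 0/1 flags here are assumed, not documented.
SAMPLE = """datatype,isdirector,isofficer,isother
NONDERIVATIVE,0,1,0
DERIVATIVE,0,1,0
REMARKS,0,1,0
"""


def group_by_datatype(rows):
    """Group result rows (dicts) by the value in their datatype column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["datatype"]].append(row)
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
groups = group_by_datatype(rows)

# Pull out only the remark rows for review.
remarks = groups.get("REMARKS", [])
```

Because the identifying information is repeated on every row, each group can be reviewed on its own without joining back to the rest of the file.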
I am really excited about this update. If you work with this data and have observations that would help us improve its utility, please send me an email (burch [yada] directedgar.com).