Questionable Value of the ACCEPTANCE-DATETIME SGML Tag in SEC Filings

I have posted before about our decision to use the RDATE rather than the ACCEPTANCE-DATETIME value from the accession-number.txt or accession-number.hdr.sgml files.  (RDATE is the term we assigned to the date associated with the <SEC-DOCUMENT> tag in those same files.)  The difference is critical because academic researchers need to identify the best event date for event studies.  Selecting the wrong date could at best introduce excessive noise into the calculation of abnormal returns and at worst could bias the results.

While we have the ACCEPTANCE-DATETIME stamp for all the filings, we use the date associated with the SEC-DOCUMENT tag because it is a better measure of when the filing was actually made available through the EDGAR filing system, so that a user could read and then act on the information.
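Both date values live in the SGML header file itself.  Here is a minimal sketch of pulling each one out; the header fragment below is made up but follows the accession-number.hdr.sgml tag layout:

```python
import re

# Illustrative fragment of an EDGAR header file; the tag layout follows
# the accession-number.hdr.sgml convention but the values are invented.
header = """<SEC-DOCUMENT>0001234567-17-000001.txt : 20170410
<SEC-HEADER>0001234567-17-000001.hdr.sgml : 20170410
<ACCEPTANCE-DATETIME>20170407191200
"""

def extract_dates(text):
    """Return (rdate, acceptance_datetime) pulled from an EDGAR header.

    rdate is the date attached to the <SEC-DOCUMENT> tag; the
    acceptance stamp is the 14-digit YYYYMMDDHHMMSS value.
    """
    rdate = re.search(r"<SEC-DOCUMENT>\S+\s*:\s*(\d{8})", text).group(1)
    accepted = re.search(r"<ACCEPTANCE-DATETIME>(\d{14})", text).group(1)
    return rdate, accepted

rdate, accepted = extract_dates(header)
print(rdate)         # date the filing became visible on EDGAR
print(accepted[:8])  # date portion of the acceptance stamp
```

In this invented example the two dates differ by a weekend, which is exactly the pattern described below.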

To illustrate this I took a screenshot of the Latest Filings page with 10-K filings listed at about noon on Saturday, April 8.  Here is the screenshot:


If I had tried to collect all 10-K filings made through EDGAR at noon on 4/8/2017, the most recent filing I would have been able to access would be the 10-K filed by Plastic2Oil.  According to the SEC this was accepted at 17:30 on 4/7/2017 and has a filing date of 4/7/2017.  When this filing was added to directEDGAR at about 4:45 (CDT) on 4/7/2017 we assigned the RDATE value R20170407.

I checked in again about noon on Sunday 4/9/2017 and the same list was available.  Then I checked midday Monday 4/10/2017 and the list had been updated to reflect all the filings made after the SEC cut-off (17:30 M-F, excepting Federal Holidays), as well as filings made Monday morning.  Here is the updated list:


There are seven filings that were not visible or accessible to EDGAR users until probably the first push of the RSS index at about 5:00 AM on 4/10/2017.  I checked our logs and see that we pulled those particular filings at about 5:15 AM on 4/10.  They were not available for our final pull Friday at about 9:00 PM, and they were not accessible during our weekly clean-up run on Saturday, when we validate everything that was filed during the week.

Our competitors insist on using the ACCEPTANCE-DATETIME value as the critical event date.  That has never made sense to us because of all of the issues that can affect the length of time between when the filing is submitted (the ACCEPTANCE-DATETIME) and when it is pushed to EDGAR users (the SEC-DOCUMENT date).  In this example the lag is caused by the SEC enforcing their cut-off rule.  However, the lag can also be affected by problems with the composition or organization of the filing.  That is, a registrant can submit a filing and have it fail a validation test.  The SEC may allow the registrant to retain the initial ACCEPTANCE-DATETIME value, since that has regulatory consequences, but the filing is still not available to the public through EDGAR until sometime after the filing has been corrected.

For the 10-K filed by Robin Street Acquisition Corp in the screenshot above, the header reports that it was made at 19:12 on 4/7.  We assigned it an RDATE of R20170410.


Internally we refer to the RDATE as the Reveal Date.  We think it is a better value to use in an event study.  So the question becomes: how do you get the RDATE for a collection of filings?  Since it is automatically included in the summary result set from a search, that is not too difficult.  The following image shows the summary results file after searching for the phrase “poison pill” in 8-K filings.


There is quite a bit more meta-data about each of the results, but this shot lets you see that the RDATE is automatically parsed and ready (as is the CONFORMED DATE – CDATE).

Using directEDGAR to Enhance my Teaching

I teach Intermediate I.  It is tough to keep students focused on the issues in this class.  I believe one of the problems is that the students have little business experience, so they have a hard time connecting most of what we cover to real-life business circumstances.  For example, last week we covered the measurement of inventories.  One important Learning Objective in this topic is for students to be able to demonstrate an understanding of what items should be included in inventory at the end of the period.  Inventory (purchases or sales) in transit as well as on consignment has to be considered.  When I talk about consignment most students start tuning out because to them it doesn’t seem relevant.

I decided to find some concrete examples of disclosures relating to consignment issues in recent 10-K filings using the Search, Extraction & Normalization Engine.  The search illustrated below is for all documents that are 10-K filings and have some form of the word root consign.


We require the TI-BAII calculator, so all of the students are at least familiar with Texas Instruments.  Their disclosure reports that 55% of their revenue comes from sales of inventory that is held on consignment by their customers.  The nice thing about this is that it provides a natural segue into a discussion about how critical integrity is to successful business relationships.

Another interesting disclosure that students could relate to was made by Sirius XM Holdings.  Here is a screen shot of their disclosure:


Many students have satellite-capable radios in their cars and, like most people (including me), have probably not thought about the supply line from the auto manufacturer to Sirius and how that radio was sourced.  Again, this disclosure provides some substance for a more interesting class discussion about what students have historically seemed to think is a real non-issue.  I could tell students were paying more attention when I related this to the problem we were working on.

Finally, probably the most interesting disclosure I found about consignment sales was from Calavo Growers Inc (CIK: 1133470).  Here is their disclosure:


This was interesting for me as I would never have imagined that kind of relationship with respect to the sale of a perishable product.  Does this mean that payment to the grower for that bag of avocados I bought at Costco was dependent on my purchase?

While our platform is designed for intense data collection for research and analysis, one of the clear side benefits is the opportunity to bring timely disclosures to class to link the issues covered to real-world business problems.

Interesting Results from Director Voting

I was testing some of our preliminary results of the extraction and normalization of Director Votes.  I struggled for a bit about how to present this.  Initially I was going to identify some voting results that were strongly negative along with the names of the directors and the companies, but then I thought about it a bit.  If John Smith is having some professional issue that is causing him to receive a high percentage of negative votes, I don’t think I want his curious kids Googling his name and finding our website recounting their dad’s struggles.  Thus the analysis below leaves out the names of the individual directors.  However, when you download the actual director vote data we include their name as reported in the filing, their name as it is reported in their personal ownership filings and their personal CIK, as illustrated in the image below:


(While names are included above – they are embedded in an image so I feel like we are not imposing on anyone’s privacy).

Now that the image above illustrates many of the details of our vote data, let’s at least superficially dive into some interesting sorts on the votes.  I should note that this was an initial sample of good results from 2,959 companies in the Russell 3000 that reported voting results in an 8-K from 1/1/2015 to 12/31/2015.

Most negative votes.  I decided to sort the vote results by the proportion that were negative: the sum of AGAINST, WITHHELD and ABSTAIN (AWA) divided by the sum of AGAINST, WITHHELD and ABSTAIN plus FOR.  I was somewhat amused by this initial sort, as there were two candidates for election to the board of General Motors who received a minuscule number of affirmative votes and more than 1.2 billion votes against.  Here are the results:
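That negative-vote proportion is simple to compute.  A small sketch, using hypothetical vote counts for one director:

```python
def awa_proportion(against, withheld, abstain, for_votes):
    """Proportion of negative votes:
    (AGAINST + WITHHELD + ABSTAIN) / (AGAINST + WITHHELD + ABSTAIN + FOR).
    """
    awa = against + withheld + abstain
    return awa / (awa + for_votes)

# Hypothetical counts for one director: 310,000 negative of 1,550,000 cast
print(round(awa_proportion(250_000, 50_000, 10_000, 1_240_000), 3))  # 0.2
```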


Neither of the gentlemen listed appeared in GM’s Proxy.  It took a bit of research to sort out that these folks offer themselves as candidates at GM’s Annual Meeting each year.  My guess from looking at the votes and the BROKER_NON_VOTES (BNV) is that these gentlemen only voted for themselves, as the difference in the BNV matches the difference in their FOR votes.

Putting that anomaly aside, there were 396 companies that had one or more directors whose proportion of AWA votes was 20% or more.  182 companies had more than one director with a negative vote total greater than 20%.  News Corp led the pack, as each of the 12 directors put forth by the company received a negative vote of 20% or more.  Here is a list of companies that had 5 or more directors up for election who received a 20% or greater negative vote.
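Counting the directors per company who cross the 20% threshold is a one-line aggregation once you have per-director proportions.  A sketch with made-up rows (the column names here are illustrative assumptions, not our exact output schema):

```python
import pandas as pd

# Hypothetical per-director vote rows; values and column names invented
votes = pd.DataFrame({
    "COMPANY":  ["A Corp", "A Corp", "A Corp", "B Inc", "B Inc"],
    "DIRECTOR": ["D1", "D2", "D3", "D4", "D5"],
    "AWA_PROP": [0.25, 0.31, 0.05, 0.22, 0.12],
})

# Count, per company, the directors whose negative proportion is 20%+
high = votes[votes["AWA_PROP"] >= 0.20]
counts = high.groupby("COMPANY")["DIRECTOR"].count().sort_values(ascending=False)
print(counts)
```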


Because we can sort on Gender, I discovered that 90 of the 851 recipients of 20% or greater AWA votes were women.  This approximates their representation in the overall collection of directors (18,193 total directors for the 2,959 companies, of whom 2,576 were women candidates).



Minor Bug: Duplicate Rows in Request File

I was testing the downloading of our new Director Vote results data and discovered a new bug.  Yuck.  If your request file has 2 (or more) rows with the same CIK, YEAR and PF values, the application will exit as if everything is OK but will not consolidate the data rows and will not complete the extraction from our server.  I apologize for this; I only discovered it accidentally today.  Since we are preparing an update to ship to your IT folks in the next week or two, this should be corrected by then.
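Until the update ships, a simple workaround is to de-duplicate the request file on CIK, YEAR and PF before running the extraction.  A sketch, using made-up rows in the three-column request-file layout:

```python
import pandas as pd

# Made-up request rows with one duplicate (CIK, YEAR, PF) combination
requests = pd.DataFrame({
    "CIK":  [1110783, 1110783, 1133470],
    "YEAR": [2016, 2016, 2016],
    "PF":   ["FY", "FY", "FY"],
})

# Drop repeated CIK/YEAR/PF rows, keeping the first occurrence
deduped = requests.drop_duplicates(subset=["CIK", "YEAR", "PF"])
deduped.to_csv("request_file.csv", index=False)  # overwrite the request file
print(len(deduped))  # 2
```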

Proxy Voting Results Available Soon

This past year we have fielded a number of requests for help extracting Proxy voting results.  The results of director elections are pretty easy to access with the TableExtraction and Normalization routines built into the Search, Extraction & Normalization Engine.  The other votes are more difficult because the votes are not usually labeled in the table; rather, they are described in the preceding paragraph.


In the image above the votes relating to compensation and the approval of the auditor are described in a paragraph separate from the table itself.  Our clients who have attended our Python Bootcamps would be able to use some of the tricks we have shown them to identify the table of votes, but the logic needed to extend the Normalization component is tricky to implement in the Engine and we have not yet sorted out the complications.

Thus we have been working on adding these votes to our Preprocessed Data Feed.  Initially we will add the Director vote results but soon after we will also add the Advisory Vote on Compensation, the Approval of the Auditor and the results from the vote on the Frequency of an Advisory Vote on Executive Compensation.  Once those are addressed we will add the votes on shareholder proposals.  With the director votes we are adding their CIK as well as their Gender.  Here is an example of the output you can expect with the Director Vote results.  These results came from an 8-K filing made by Abbott Laboratories on 5/4/2016 that reported on the actions taken during their annual meeting on 4/29/2016 (these data points are included in the results we push to our clients).  The PERSON-CIK value is the CIK assigned to that individual by the SEC if they have ownership or other reporting requirements.


One of the advantages of including the director CIK as an identifier is that it simplifies tracking directors (and officers) across entities.  While I was playing with the results I decided to see how many directors in one of our test files had multiple directorships.  This is not so easy to determine when we have to sort by name.  For example, Mr. Maffei (who held directorships in six entities in our test sample) is reported as both Greg Maffei and Gregory B. Maffei.  When we can sort by PERSON-CIK, identifying multiple directorships is more straightforward.
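A small sketch of why the PERSON-CIK matters; the CIK values and column names below are invented for illustration:

```python
import pandas as pd

# Hypothetical board-seat rows: the same person appears under two
# name variants, which a name sort would treat as two people
seats = pd.DataFrame({
    "PERSON-CIK": [1000001, 1000001, 1000002],
    "NAME":       ["Greg Maffei", "Gregory B. Maffei", "Jane Doe"],
    "COMPANY":    ["Entity 1", "Entity 2", "Entity 3"],
})

# Counting distinct companies by CIK catches the multiple directorship
by_cik = seats.groupby("PERSON-CIK")["COMPANY"].nunique()
multiple = by_cik[by_cik > 1]
print(multiple)
```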


Our plan is to build the archive but also add this to our daily processing.  We fully expect that by the time the 2017 Proxy Season begins in earnest we will be able to push the vote results to our distribution server within minutes of the filing of the 8-K that reports the voting results.

The initial batch of the results of director elections will be available within the next two weeks.  While I will announce it here, you will be able to tell what data is available by creating a request file per the instructions and looking at the list of available Governance Data Tables.  The Director Voting results will be listed in that schedule of available Data Tables.




Accessing 8-K Filing Reason Codes is as easy as Pie

I apologize; this post is going to be a bit denser than many of my others, as we are going to take a deep dive into a unique feature of directEDGAR.  When a registrant has an obligation to file an 8-K, the SEC requires them to use a conformed set of reason codes to describe all of the events being reported on in the 8-K.  A detailed list of the current 8-K filing reason codes can be found here.

One of the driving forces leading to the development of directEDGAR was that I was trying to find all reported instances of auditor changes in 2001.  While many of these are reported in the business press, there was no systematic way to identify all 8-Ks filed with a particular conformed code (say, auditor change).  Thus from the very beginning we included tagging in the 8-K filings so a user could search for and find all 8-Ks filed for a particular reason code.  While I thought that was great, it was clear very soon after that many researchers needed to control for all reasons any particular 8-K was filed (there can be as many as nine reason codes attached to any one 8-K filing).  We had clients asking how to get all reason codes for all 8-K filings.  There was no way to automatically extract those in one step.  Rather, they had to do independent searches for each filing reason code and then consolidate the final results.  This was happening often enough that we started maintaining a file that we would make available when someone asked for the codes.

Thus I knew when we developed our new application that we needed to make it straightforward for a user to identify all 8-K filing reason codes for any set of 8-K filings the user wanted to extract.  I think we accomplished this in a very powerful way.  To illustrate, I want to model the data collection of the Holder, Karim, Lin and Pinsker (2016) paper Do Material Weaknesses In Information Technology-Related Internal Controls Affect Firms’ 8-K Filing Timeliness And Compliance?  I hate to say that Professor Holder had to collect his data the old-fashioned way.  I hope it helps him to know that his work was one of the inspirations for this added feature in our latest version of the Search, Extraction & Normalization Engine.

Holder et al. needed to check the filing date relative to the event date along with all filing reason codes attached to each 8-K filing.  Using the Engine, I will search for all 8-K filings made in 2016 up to early June.  Here is a screen shot of the initial search.


Our SummaryExtraction feature extracts meta-data about each document returned from the search as well as all of the document tags we add to each document.  This is done from the menu and requires that you specify the location where you want the CSV file stored.  Here is a screen shot of the SummaryExtraction from this 8-K search.


It is unfortunate that I cannot display the full width of the file.  There are 42 columns.  Eleven columns identify the filing and include the word count for the filing as well as some meta-data about the filer (SIC-CODE, FYE, CIK, Name).  The rest of the columns describe the filing.  The RDATE is the date the filing was made available to the public through EDGAR.  The CDATE is the Conformed Date; this date is required to be the date the earliest event being reported on in the 8-K took place.  The focus of the Holder et al. paper was whether the lag between the RDATE and the CDATE offered insights about material internal control weaknesses.

The remaining columns list all of the possible reason codes in the form ITEM_REASON_CODE.  A YES in the column indicates that the reason code was attached to that particular 8-K.  At the current time there are 31 possible reason codes for any particular 8-K filing.
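With the summary file in hand, the Holder et al. measures reduce to a date subtraction and a column scan.  Here is a sketch using an invented two-row frame in place of the real SummaryExtraction CSV; the exact ITEM_ column labels and CIK values are illustrative assumptions:

```python
import pandas as pd

# Invented stand-in for the SummaryExtraction output; RDATE/CDATE and
# the ITEM_REASON_CODE column pattern follow the post, values are made up
summary = pd.DataFrame({
    "CIK":       [1000001, 1000002],
    "RDATE":     ["20160606", "20160601"],
    "CDATE":     ["20160531", "20160531"],
    "ITEM_2.02": ["YES", ""],
    "ITEM_5.02": ["YES", "YES"],
})

# Filing lag in calendar days between the event date and the reveal date
summary["LAG"] = (pd.to_datetime(summary["RDATE"], format="%Y%m%d")
                  - pd.to_datetime(summary["CDATE"], format="%Y%m%d")).dt.days

# Collect the reason codes attached to each 8-K (YES columns per row)
item_cols = [c for c in summary.columns if c.startswith("ITEM_")]
summary["REASONS"] = summary[item_cols].apply(
    lambda row: [c for c in item_cols if row[c] == "YES"], axis=1)
print(summary[["CIK", "LAG", "REASONS"]])
```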

Now I want to point this out: we have already collected all of the information we need from our population of 8-K filings to replicate the Holder et al. paper.  Those two steps – search for 8-K filings and then use the SummaryExtraction feature – took a grand total of about 10 seconds.

You might not need to replicate Holder et al.  However, a number of the requests we have gotten in the past for this data have been from users who needed to control for 8-K filing events in an event study.  In that case you might want to filter on CIK instead of pulling all events.

(forgive the corny title, it is getting close to Thanksgiving).






Custom Indexing with directEDGAR

One of the common requests we get from our clients is that they want to search just specific sections of 10-K filings.  The most common sections they look to index are the Risk Factors (Item 1A) and the MD&A (Item 7).  With our release, this is now possible.  You have to download the item sections to your local computer and then run the indexing software built into the Search, Extraction & Normalization Engine.

The process to download the various Items from the 10-K to your local computer is described in the Help file for the Search, Extraction & Normalization Engine.  Briefly, you need a CSV file with a list of CIKs, the years and the period focus (PF).  I am going to cheat in this example and use the list of most recently filed 10-Ks to generate my request file.
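A request file in that three-column layout can be written with a few lines.  The CIKs below are the Monsanto and Calavo Growers CIKs mentioned in this post; the PF value "FY" is an assumption for illustration:

```python
import csv

# Rows for the request file: CIK / YEAR / PF (period focus)
rows = [
    {"CIK": 1110783, "YEAR": 2016, "PF": "FY"},  # Monsanto, per this post
    {"CIK": 1133470, "YEAR": 2016, "PF": "FY"},  # Calavo Growers
]

with open("request_file.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["CIK", "YEAR", "PF"])
    writer.writeheader()
    writer.writerows(rows)
```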


You can see my request file in the image above.  However, to show off a bit, I also included a screenshot of our internal application that tracks the handling of each filing.  The image above is a bit fuzzy – the first CIK in the list is Monsanto (1110783).  If you go to EDGAR and check the filing date and time of their 2016 10-K you will see that it was filed on EDGAR on 10/19/2016 at 3:31 CT (4:31 ET).  The details in our log above tell me that it was pushed to our server at 3:53 CT (4:53 ET).  So this Risk Factors section was available to our clients less than 30 minutes after the filing was made on EDGAR.  Pretty amazing.

Once the request file has been created, the application interface has the controls to query our server and pull the actual Risk Factors sections for the list of CIK-Year pairs.


The process runs very fast; typically we can push more than 500 files a minute to your local computer.  However, time of day, our network load and your network load all affect the download speed.  When the process is complete, all the risk factor sections for your CIK-YEAR pairs are in the directory you specified.  They are named using our standard notation: the CIK, the date the filing was made available on EDGAR and, for these particular filings, the balance sheet date.  Each of these files can be individually viewed using our SmartBrowser.


As you can see, we have controls that allow you to cycle through these quickly.  However, if you in fact want the full range of directEDGAR features to use with these files, they need to be indexed.  The indexing process requires that you select a collection of files and specify a destination for the index files.  The application will handle all of the intermediate steps.


When the indexing is complete, the application automatically adds the index to a directEDGAR_CUSTOM index library, and that library is then selected as the active library.  So it is only necessary to select the index and begin searching.


The search results will have the hit-highlighting and hit-to-hit navigation features.  You can extract context around search results, extract tables, get word counts – in short, all of the features associated with the primary indexes are available on these custom indexes.  And while I ran a simple search above, you can run as complex a search as you need because you have the full power of our search engine managing the background processes.