I apologize in advance: this post is going to be a bit denser than many of my others, because we are going to take a deep dive into a unique feature of directEDGAR. When a registrant has an obligation to file an 8-K, the SEC requires them to use a conformed set of reason codes to describe all of the events being reported in that 8-K. A detailed list of the current 8-K filing reason codes can be found here.
One of the driving forces behind the development of directEDGAR was that I was trying to find all reported instances of auditor changes in 2001. While many of these are reported in the business press, there was no systematic way to identify all 8-Ks filed with a particular conformed code (say, auditor change). Thus, from the very beginning we tagged each 8-K filing so a user could search for and find all 8-Ks filed for a particular reason code. While I thought that was great, it became clear very soon that many researchers needed to control for all of the reasons any particular 8-K was filed (there can be as many as nine reason codes attached to any one 8-K filing). We had clients asking how to get all reason codes for all 8-K filings. There was no way to extract those automatically in one step. Instead, they had to run an independent search for each filing reason code and then consolidate the results. This was happening often enough that we started maintaining a file that we would make available when someone asked for the codes.
Thus I knew that when we developed our new application, we needed to make it straightforward for a user to identify all 8-K filing reason codes for any set of 8-K filings the user wanted to extract. I think we accomplished this in a very powerful way. To illustrate, I want to model the data collection of the Holder, Karim, Lin and Pinsker (2016) paper Do Material Weaknesses In Information Technology-Related Internal Controls Affect Firms’ 8-K Filing Timeliness And Compliance? I hate to say that Professor Holder had to collect his data the old-fashioned way. I hope it helps him to know that his work was one of the inspirations for this added feature in the latest version of our Search Extraction & Normalization engine.
Holder et al. needed to check the filing date relative to the event date and all filing reason codes attached to each 8-K filing. Using 22.214.171.124, I will search for all 8-K filings made in 2016 up to early June. Here is a screen shot of the initial search.
Our SummaryExtraction feature extracts metadata about each document returned by the search, as well as all of the document tags we add to each document. You run it from the menu, and it requires that you specify the location where you want the csv file stored. Here is a screen shot of the SummaryExtraction from this 8-K search.
It is unfortunate that I cannot display the full width of the file. There are 42 columns. Eleven columns identify the filing and include the word count for the filing as well as some metadata about the filer (SIC-CODE, FYE, CIK, Name). The rest of the columns describe the filing. The RDATE is the date the filing was made available to the public through EDGAR. The CDATE is the Conformed Date, which is required to be the date of the earliest event being reported in the 8-K. The focus of the Holder et al. paper was whether the lag between the CDATE and the RDATE offers insight into material internal control weaknesses.
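To make the lag concrete, here is a minimal Python sketch of how it could be computed from the csv file once it is saved. It assumes the file has columns named RDATE and CDATE exactly as described above and that the dates are written as YYYYMMDD; the date format, the function names and the sample rows are my assumptions for illustration, so adjust them to match your own file. It also counts calendar days, not business days.

```python
import csv
import io
from datetime import datetime

# Assumption: dates are written as YYYYMMDD; change this to match your file.
DATE_FORMAT = "%Y%m%d"


def filing_lag_days(row, date_format=DATE_FORMAT):
    """Calendar days between the event date (CDATE) and public release (RDATE)."""
    rdate = datetime.strptime(row["RDATE"].strip(), date_format)
    cdate = datetime.strptime(row["CDATE"].strip(), date_format)
    return (rdate - cdate).days


def read_with_lags(csv_file, date_format=DATE_FORMAT):
    """Read every row of the csv and attach a LAG_DAYS value to it."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["LAG_DAYS"] = filing_lag_days(row, date_format)
        rows.append(row)
    return rows


# Made-up rows standing in for a SummaryExtraction csv file.
sample = io.StringIO(
    "CIK,RDATE,CDATE\n"
    "1000001,20160108,20160104\n"
    "1000002,20160302,20160229\n"
)
rows_with_lags = read_with_lags(sample)
for row in rows_with_lags:
    print(row["CIK"], row["LAG_DAYS"])
```

With a real file you would replace the sample with something like open("your_summary_file.csv", newline=""), where the file name is whatever you chose when you ran the extraction.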
The remaining columns list all of the possible reason codes in the form ITEM_REASON_CODE. A YES in the column indicates that the reason code was attached to that particular 8-K. At the current time there are 31 possible reason codes for any particular 8-K filing.
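If you would rather have one list of reason codes per filing than a wide set of YES columns, a short Python sketch like the following could do the collapsing. It assumes that every reason code column name starts with ITEM_ and that a code is flagged with the literal value YES; the specific column names in the sample row (ITEM_2.02 and so on) are hypothetical stand-ins, not the actual headers in the file.

```python
def reason_codes(row, prefix="ITEM_"):
    """Return the reason codes flagged YES in one row of the summary file.

    Assumption: reason code columns share a common prefix (ITEM_ here)
    and a flagged code holds the value YES; anything else is ignored.
    """
    return [
        name[len(prefix):]
        for name, value in row.items()
        if name.startswith(prefix) and (value or "").strip().upper() == "YES"
    ]


# Hypothetical row; real column names may differ.
example_row = {
    "CIK": "1000001",
    "ITEM_2.02": "YES",
    "ITEM_4.01": "",
    "ITEM_9.01": "YES",
}
example_codes = reason_codes(example_row)
print(example_codes)
```

Counting the length of that list for each filing gives the number of reason codes attached to the 8-K, which is handy when you need to control for multi-event filings.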
Now I want to point something out: we have already collected all of the information we need from our population of 8-K filings to replicate the Holder et al. paper. Those two steps (searching for the 8-K filings and then using the SummaryExtraction feature) took a grand total of about 10 seconds.
You might not need to replicate Holder et al. However, a number of the requests we have received in the past for this data came from users who needed to control for 8-K filing events in an event study. In that case, you might want to filter on CIK instead of pulling all events.
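One way to do that filtering after the fact is to restrict the extracted csv file to the CIKs in your sample. The Python sketch below assumes the file has a CIK column; the helper names and the sample rows are made up for illustration. Because a CIK is sometimes written with leading zeros and sometimes without, the sketch strips them before comparing.

```python
import csv
import io


def normalize_cik(cik):
    """Drop whitespace and leading zeros so '0001000001' matches '1000001'."""
    return str(cik).strip().lstrip("0")


def filter_by_cik(rows, sample_ciks):
    """Keep only the rows whose CIK belongs to the event-study sample."""
    wanted = {normalize_cik(cik) for cik in sample_ciks}
    return [row for row in rows if normalize_cik(row["CIK"]) in wanted]


# Made-up rows standing in for a SummaryExtraction csv file.
summary = io.StringIO(
    "CIK,RDATE\n"
    "0001000001,20160108\n"
    "1000002,20160302\n"
    "1000003,20160415\n"
)
all_rows = list(csv.DictReader(summary))
sample_rows = filter_by_cik(all_rows, ["1000001", "0001000003"])
for row in sample_rows:
    print(row["CIK"], row["RDATE"])
```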
(forgive the corny title, it is getting close to Thanksgiving).