Rabbit Holes – Why we don’t use the SEC labeled filing date with directEDGAR

If you’ve dived into directEDGAR you know that we have two key dates for searching filings. One we label the RDATE (R for revealed). The label is weak in the sense that when I was developing our initial infrastructure I should have called it the DDATE (D for dissemination).

The RDATE is meant to give you some relative certainty about the date that the filing became available to EDGAR users. So, for example, if you wanted to study market response to filings, you would want to know the date the filing was revealed, made available or disseminated through the EDGAR platform. This is not always the filing date as reported on EDGAR. I have a great example to illustrate this.

For background we are doing some work right now to identify any gaps in our director and executive compensation data. Specifically we ran some code to identify those cases where a registrant is missing one or more years in the time series of the compensation data we have available.

I was double checking the code results in preparation for assigning the data collection to one of our interns. The first registrant I identified was CIK 3906. We had DC data from proxy filings with RDATEs from 2007 to 2010, and then we had a result from an RDATE of 2012. So I presumed that we were missing 2011 and looked to see if I could sort out why we would have missed that particular year. First stop was EDGAR – and I see yes – there was a DEF 14A “filed” in 2007.

So then I switch to my network copy of directEDGAR – I want to sort out why we would have missed this observation and not just collect the data. So I open up the correct folder and I don’t see a folder that looks right. In the image below I would expect a folder in the sequence where the arrow is pointed. The folder above is the PRE 14A (we don’t use these for comp data as too often they will not have the complete data).

When I was comparing our archive with EDGAR I also realized that there were not any proxy filings on EDGAR with a 2012 date – the most recent proxy was filed in 2010.

I’ve been here before. I have gotten emails from clients who have found an occasional 8-K or 10-K filing where the dates have not matched data they’ve collected elsewhere or the filing date for the forms on EDGAR. I’ve shown them where we pulled our dates from. My point to them was that I trusted our dates, but I’ve never tried to prove that the filings were not actually available on the filing dates. However, two days ago I was doing some arranging in my office and I came across a copy of directEDGAR that was about ten years old. We did a significant rebuild of our platform beginning in 2015 that was distributed to our customers in early 2016. The software changed and we also did a complete rebuild of the filing archive so we could use our new search engine. You may not recall, but one constraint we were dealing with in the ‘old’ days was the two gigabyte file size limitation imposed by 32-bit Windows (64-bit Windows was very uncommon when we started). This affected the size of the indexes, and so our filings were organized in two-year folders.

Below is an image from the correct folder – as you can see this version of directEDGAR was created in June 2010.

The filing I am looking for is not there. One alternative explanation is that I missed that filing when I constructed that version of directEDGAR. However, I am confident that is not the case, as we did significant testing to make sure we knew how to capture the filings based on the indexes. That is, if a filing was of a type that we wanted and it was listed in the index for that period, we captured it.

However, if you go to EDGAR right now and access the index (master.idx) for Q2 2007, the filing under discussion is listed there.

Does this mean we missed it? No. The SEC indexes are not static. The EDGAR code platform modifies them frequently. It looks like the Q2/2007 master.idx file (which is the one we use) was last modified in September 2014.

When we do an update we don’t pull only the latest index. Instead we pull all of the indexes (all the way back to 1993), compare each new index to the last version of that index we have stored in the cloud, and then pull those filings that are not listed in the previous/archive index (no matter the stated filing date). My recollection is that in the last update we did there were more than 3,000 filings with an SEC ‘FILING DATE’ before 12/31/2019 that we had to capture and add to directEDGAR because they were not listed in our last comparison of indexes.
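A minimal sketch of that index comparison in Python. This is an illustration only, not the actual directEDGAR update code. The function names are hypothetical, and it assumes the pipe-delimited master.idx layout (CIK|Company Name|Form Type|Date Filed|Filename) with the Filename path treated as the unique key for a filing.

```python
from typing import Dict, List, NamedTuple


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_master_idx(text: str) -> Dict[str, IndexEntry]:
    """Map filename -> entry, skipping the header and separator lines."""
    entries: Dict[str, IndexEntry] = {}
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have exactly five fields and a numeric CIK;
        # the column-header row and free-text header lines do not.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        entry = IndexEntry(*parts)
        entries[entry.filename] = entry
    return entries


def newly_listed(archived_text: str, current_text: str) -> List[IndexEntry]:
    """Entries in the current index that were absent from the archived copy,
    regardless of the stated filing date."""
    archived = parse_master_idx(archived_text)
    current = parse_master_idx(current_text)
    return [entry for name, entry in sorted(current.items())
            if name not in archived]
```

Running `newly_listed` over every quarterly index, not just the latest one, is what surfaces filings that carry an old filing date but only recently appeared in the index.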

So back to the punch line: we were not actually missing any compensation data for this filer. We had comp data from a 10-K/A that was filed in 2007 (RDATE 20070809, which also matches the filing date). The comp data we have with the 2012 RDATE correctly reports the YEAR (2006) that the data relates to. We have the complete time series of DC data for this registrant, so we just have to delete the as-reported data with an RDATE in 2012, since it duplicates the data included in the 10-K/A that was disseminated earlier. The filing in question was not available until 2012.

Of course the question is why this filing was not disseminated until 2012. I can’t fully answer that question, but our research indicates that this often happens when one or another of the filings made by the company is under review by the SEC. In this case it seems that the PRE 14A was reviewed and the registrant responded to the points raised by the SEC in a letter associated with the DEF 14A. I can’t conclude anything more. However, I am very confident that the filing was not disseminated until 2012. If I do a search in directEDGAR PROXYMASTER I can find more than 200 filings with an RDATE more than 1,000 days later than the CDATE (normally on proxy filings the RDATE is before the CDATE). When I spot check these (only 3 so far) I see exactly what I saw with the filing from CIK 3906: our CDATE matches, the filings were not available on the older version of directEDGAR, and there is CORRESP included with the filing.

In summation, EDGAR is a ‘living’ thing. As I noted earlier, when we distributed the last update we identified more than 3,000 filings that were listed in the version of the indexes we accessed in early January but were not in the previous versions of those indexes.
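For anyone who wants to run the same kind of check on exported metadata, here is a rough Python sketch. It assumes each record is a dictionary with RDATE and CDATE stored as YYYYMMDD strings (the format of the RDATE quoted above). The function names and the sample layout are hypothetical, not part of directEDGAR.

```python
from datetime import datetime
from typing import Dict, List

DATE_FORMAT = "%Y%m%d"  # assumed storage format, e.g. 20070809


def days_between(rdate: str, cdate: str) -> int:
    """Number of days by which RDATE follows CDATE (negative if earlier)."""
    revealed = datetime.strptime(rdate, DATE_FORMAT)
    conformed = datetime.strptime(cdate, DATE_FORMAT)
    return (revealed - conformed).days


def late_disseminated(records: List[Dict[str, str]],
                      threshold_days: int = 1000) -> List[Dict[str, str]]:
    """Records whose RDATE is more than threshold_days after the CDATE."""
    return [r for r in records
            if days_between(r["RDATE"], r["CDATE"]) > threshold_days]
```

Since the RDATE on a proxy normally precedes the CDATE, the gap is usually negative, so a large positive value is a cheap flag for filings worth a manual look.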

Data Delay – Interesting Problem

I was fully expecting to begin making the Director-Relationship data available by now. However, we have run into some really interesting problems that we are having to sort through. We made an assumption that there was a one-to-one relationship between a Central Index Key and a person when a person has an SEC reporting obligation. However, as we were aggregating our director data to organize it for the relationship data presentation, our data guru (Manish Pokherel) discovered this was not true.

Manish was trying to create various integrity tests before we made the final merge and in one of the scenarios he tested he discovered that there are approximately 40 people who have multiple CIKs. Here is a screenshot of the SEC landing page for Dr. Glimcher (who was on the board of Bristol-Myers Squibb from 1997 to 2017).

Glimcher SEC Landing Page

Clearly these look to be the same person – if you follow the links and read her biographies in the related filings it becomes clear that yes, Dr. Glimcher ended up with two unique CIKs.

The problem is that we have one CIK associated with her compensation (and ownership) data in some filings and the other CIK associated with it in other filings. For the compensation data and the relationship data to have the most value we need to standardize it.

The decision we made last night is that we are going to use the most recent CIK of these individuals. This means we have to go back through the compensation data and replace any instances where the older CIK value is included as the PERSON-CIK. I will observe that other cases of this are not as clear cut as Dr. Glimcher’s.
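As a rough illustration of that replacement step, here is a Python sketch. It is not our actual process. It assumes the groups of CIKs belonging to one person have already been identified by hand, that "most recent" means the CIK on the row with the latest RDATE, and that RDATE is a YYYYMMDD string so plain string comparison orders dates correctly. The function names are hypothetical, and the CIKs in the usage example are invented.

```python
from typing import Dict, Iterable, List, Set


def build_cik_map(rows: Iterable[Dict[str, str]],
                  same_person_groups: Iterable[Set[str]]) -> Dict[str, str]:
    """Map every CIK in each group to the group's most recently used CIK."""
    latest_rdate: Dict[str, str] = {}
    for row in rows:
        cik = row["PERSON-CIK"]
        if row["RDATE"] > latest_rdate.get(cik, ""):
            latest_rdate[cik] = row["RDATE"]

    mapping: Dict[str, str] = {}
    for group in same_person_groups:
        # CIKs that never appear in the rows sort first (empty string).
        canonical = max(group, key=lambda c: latest_rdate.get(c, ""))
        for cik in group:
            mapping[cik] = canonical
    return mapping


def standardize(rows: Iterable[Dict[str, str]],
                mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Return copies of the rows with PERSON-CIK replaced where mapped."""
    return [{**row, "PERSON-CIK": mapping.get(row["PERSON-CIK"],
                                              row["PERSON-CIK"])}
            for row in rows]
```

For example, with rows for invented CIKs "1111" (RDATE 20050301) and "2222" (RDATE 20160310) and the group {"1111", "2222"}, every row ends up with PERSON-CIK "2222". Keeping the mapping as its own table means the same substitution can be applied to the ownership and vote data as well.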

This has really been an interesting exercise. This is the first time we have pulled all of our compensation data at one time and tried to do some deep analysis. All of our previous integrity analysis has focused on one individual company and a fairly limited time series at a time. We have over 69,000 unique directors identified (NAME-PERSON-CIK). So as you can imagine it is a special challenge to find ways to cross validate the data.

Bottom line is we need to do some more testing – not too much more but we are still trying to identify ways to make sure this resulting data is clean. We also have to sort out how to make sure we propagate a specific CIK for a person through our system. I want to make sure that when you download our ownership transaction data, director votes data, our beneficial ownership data (I can’t remember where else we use the PERSON-CIK) you get clear links across time and between entities.

Complete Redesign of directEDGAR’s Delivery Modality – 10 Beta Testers Needed

Yesterday we inaugurated our first substantial test platform for our new delivery system. After the January 2021 update there will not be any more updates delivered through the mail. Instead we are using some technology from Amazon to host and deliver our application in the cloud. We expect to begin transitioning clients to this platform on a voluntary basis early in the 4th quarter. We will make the final 2020 update available for those who want to transition a bit more slowly, but that will be the final update.

The AWS AppStream service is amazing and gives us the opportunity to improve the timeliness of filing access (think near immediate) as well as relieve your IT staff of their responsibility for local management of directEDGAR. The best thing about this change is that we do not have to impose any of the limits that come from a web based search. We can continue to provide the absolute best search experience and make the platform available to you anywhere, anytime.

I am looking for 10 beta testers from our users. If you are interested please send me an email. As a beta tester you can help us make sure we have some good user feedback on the experience. I will note I am already excited about this experience – searching and rendering is about 40% faster than on my local computer. The people who will get the most out of this experience initially will be those who need access to the most timely 10-K and DEF 14A filings. I expect the full directEDGAR content to be available by the middle to the end of this month.

We also need some advice or suggestions regarding the addition of metadata to our filings. Right now we include the SEC dissemination date (RDATE) as well as the conformed date (CDATE) and the item codes for each 8-K filing. We include word counts and issuer details like SIC code and FYE. We also provide a doctype that drills down to the conformed exhibit code. I am intending to add the actual filing date/time stamp (most filings made on Friday after 5:30 have a Monday RDATE as they are not disseminated until 6:00 AM on Monday). I also want to add filer status (LARGE ACCELERATED, ACCELERATED, etc.). Is there anything else that you think should be added? If you remember, the metadata provides additional filtering opportunities. The idea is that we can search documents for phrases and words and then filter on the metadata to get even more focused results.