Schedule Support & Tagging Update

I want to reduce the cost of and barriers to getting support. To that end we have made a few updates. First, we now have an email account that is monitored almost 24/7, because some of our very knowledgeable team members work in other time zones. If you are stuck and want to see if we can immediately address your problem, please send an email to support x@x directedgar dot com. Hopefully my attempt to avoid getting unwanted emails does not confuse you.

Next: how about the ability to schedule some quality time together, either to address a specific problem or to work on a general strategy for a particular project? I found a tool that allows us to schedule that quality time without too much effort. If you visit this scheduling page you can see my availability and pick a time that works for you. If none of the available times are suitable, send me an email directly.

In early March I shared how to access a test index that had additional metadata to enhance your search or provide more useful context for the results. Our goal was (and is) to automate the addition of this metadata to our platform and backfill our older indexes with this data. It has been a process, and while I would love to get into the weeds on some of the special challenges, boring you is not likely to keep you reading. I will share that before I was willing to go live with this, we established an internal goal: the code had to work on all 10-K filings made in the first quarter of 2013, 2017, and 2021 with no errors. "No errors" in this case means that when an exception occurred, the possible causes of the exception were exhaustively evaluated, and if we could not code a resolution then we could at least label the error in a meaningful way. Further, the error cases have to be less than 1/2 of 1% (0.005) of the processed filings.
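To make that standard concrete, here is a minimal sketch of the acceptance check in Python. The outcome representation and function name are my illustration, not our production code; only the 0.005 threshold and the labeling requirement come from the description above.

```python
# Illustrative sketch of the acceptance test described above. Each
# filing's outcome is assumed to be "ok" or a meaningful error label;
# None stands in for an exception we could not classify.

ERROR_RATE_LIMIT = 0.005  # 1/2 of 1% of processed filings

def passes_acceptance(outcomes):
    if not outcomes:
        return False
    if any(o is None for o in outcomes):
        return False  # every exception must carry a meaningful label
    errors = [o for o in outcomes if o != "ok"]
    return len(errors) / len(outcomes) < ERROR_RATE_LIMIT
```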

We finally have code that achieves those standards for Q1 2013 and Q1 2017. We are going to run a test on Q1 2021 in the next several days to confirm that the results hold after a bit more careful error handling. So we are close. I am personally excited about this because I think you should be able to define your search by some of the metadata we are adding to the filings. I have had so many queries about identifying firms with dual classes of stock (see, for example, the effort described in the paper The Rise of Dual-Class Stock IPOs). It should be trivial, and I think we are going to make it trivial. I have already described how filer status affects disclosure in a number of ways. Size is often used as a proxy, but why shouldn't you be able to directly access filer status, since it is the determinant of a registrant's reporting obligations?

In some ways I am glad this has taken so long, because we have had other questions about firm characteristics that we think are worth adding as metadata. As a result, we have been actively collecting some other measures that we are going to include in our new metadata injection.

One critical piece of information: I determined we cannot safely add some of the additional metadata to 10-K/A filings. The problem is that registrants are inconsistent about the reference point for measuring these values. I have seen registrants report their filer status as of the balance sheet date of the financial statements included in the amendment. There have also been cases where the filer status changed, so even though a scan of the amendment indicates they are following the disclosure regime of their prior filer status, they report their current filer status on the face of the 10-K/A. There have been cases where registrants report their public float as of the end of the second quarter preceding the balance sheet date of the included financial statements, but there have also been cases where the public float is for a trading date close to the filing date of the amendment. Finally, there have been cases where the public float is pulled from the end of the second quarter of their most recent 10-K, and that 10-K was not the one amended.

There are still a lot more details to share, and I will provide a fuller explanation when we move this to production. I am now predicting that we will be adding all of the new metadata to 2021 10-K filings in production in about two weeks. That will give us the insight we need to determine how best to backfill this data to our archives.

Minor Error – Temporary Work-Around

Two days ago a faculty member at Texas A&M reported that they were getting an unexpected error message. They had prepared a request file to use the ExtractionPreprocessed feature. If you are not aware, request files are limited to 20,000 CIK-YEAR pairs. The client reported that their request file had 19,999 CIK-YEAR pairs, yet when they submitted the file the request was blocked with the dreaded File Too Large message.

File Too Large Error Message
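For context, counting the pairs in a request file is trivial, which is why 19,999 looked safe. Here is a minimal sketch, assuming the request file is a two-column CSV of CIK and YEAR (the file name is hypothetical):

```python
# Minimal sketch: count the unique CIK-YEAR pairs in a request file.
# Assumes a two-column CSV (CIK, YEAR); the file name is hypothetical.

import csv

with open("request.csv", newline="") as f:
    pairs = {tuple(row[:2]) for row in csv.reader(f)}

print(len(pairs))  # 19,999 here, seemingly under the 20,000 limit
```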

I asked them to send me the file and tried all kinds of tricks to sort out the reason for the error. I failed to ask (or even consider) whether they had checked the Include Historical CIKs box. I was focused on analyzing the file and any hidden attributes of the file rather than looking at the problem with a more open mind.

Fortunately (for me) Antonis Kartapanis (another TAMU accounting faculty member) was in the email chain and actively paying attention to the conversation. Antonis sent a message suggesting that the issue was caused by selecting the Include Historical CIKs checkbox. And sure enough: I had not been checking the box, while the TAMU faculty member who was having the problem had been. I didn't think to ask. Antonis tried with the box checked and then with it unchecked, which confirmed the cause.

As a reminder, when the box is checked the application calls home and adds additional rows to the in-memory version of the file if your request file has a successor or predecessor CIK. For example, suppose you used CapitalIQ to create a sample and Alphabet was in your sample (along with many others). The CIK associated with Alphabet is 1652044. Perhaps you are trying to collect Director Compensation data from 2011 to 2021, so you have 11 lines in the file relating to CIK 1652044.

Request file with Alphabet’s CIK

Once you have selected the artifact you want to pull, the application loads your request file, removes duplicates, and, if you have checked the Include Historical CIKs checkbox, reviews your file to determine whether you have CIKs that need to be augmented. If any are present, it first checks that the predecessor/successor CIK-YEAR pair is not already in the file; if not, it extends the file with the new pairs. In the case of the request in the image above, the application will extend the file by adding new rows for CIK 1288776. The in-memory version of the file will now have 22 CIK-YEAR pairs.

Extended Request File
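To make the augmentation logic concrete, here is a minimal sketch in Python. The hard-coded successor-to-predecessor lookup stands in for the application calling home; the names and structure are my illustration, not the application's actual code.

```python
# Illustrative sketch of the Include Historical CIKs augmentation.
# A request is a list of (CIK, YEAR) pairs; the lookup below stands
# in for the data the application fetches when it calls home.

HISTORICAL_CIKS = {1652044: 1288776}  # Alphabet -> Google, per the example

def augment(pairs):
    augmented = set(pairs)  # removes duplicates, like the application
    for cik, year in pairs:
        related = HISTORICAL_CIKS.get(cik)
        # Only add the predecessor/successor pair if it is not present.
        if related is not None and (related, year) not in augmented:
            augmented.add((related, year))
    return sorted(augmented)

request = [(1652044, y) for y in range(2011, 2022)]  # 11 pairs
print(len(augment(request)))  # 22 CIK-YEAR pairs, as described above
```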

Now the application checks the size of the final file, and this was the source of the problem. The augmented file exceeded the 20,000 CIK-YEAR pair limit because of the addition of the predecessor/successor CIK mappings. In a perfect world we would add only the CIK-YEAR pairs that are relevant and remove those that are not. If we were doing that with this file, it would have CIK 1652044 for 2016-2021 and CIK 1288776 for 2011-2015. (This is on the list, but it is a long list.)
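For what that perfect world might look like, here is a minimal sketch that maps each requested year to the CIK that actually filed in that year. The transition-year mechanics are my illustration; only the 2011-2015 / 2016-2021 split comes from the example above.

```python
# Illustrative sketch of the "perfect world" behavior: resolve each
# requested year to the CIK that filed in that year, instead of
# adding both CIKs for every year.

# successor CIK -> (predecessor CIK, first year under the successor)
TRANSITION = {1652044: (1288776, 2016)}

def resolve(pairs):
    resolved = set()
    for cik, year in pairs:
        predecessor, first_year = TRANSITION.get(cik, (None, None))
        if predecessor is not None and year < first_year:
            resolved.add((predecessor, year))  # predecessor filed then
        else:
            resolved.add((cik, year))
    return sorted(resolved)

request = [(1652044, y) for y in range(2011, 2022)]
print(len(resolve(request)))  # still 11 pairs: 1288776 for 2011-2015,
                              # 1652044 for 2016-2021
```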

If you've stuck with me so far: I think an easy fix is to limit your request file to around 18,000 CIK-YEAR pairs per cycle until we come up with a more elegant solution. I'm so glad Antonis was paying attention. I think I would have beaten my head against the monitor for many more hours before I thought to ask the magic question: are you using the Include Historical CIKs checkbox?
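If you want to automate that workaround, here is a minimal sketch that splits a large request file into batches of at most 18,000 CIK-YEAR pairs, leaving headroom for the augmentation. The file names and two-column CSV layout are assumptions.

```python
# Illustrative sketch of the suggested workaround: split a large
# request file into batches of at most 18,000 CIK-YEAR pairs so each
# batch stays under the 20,000 limit even after augmentation.

import csv

BATCH_SIZE = 18_000

def split_request_file(path):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    for i in range(0, len(rows), BATCH_SIZE):
        batch_path = f"request_part{i // BATCH_SIZE + 1}.csv"
        with open(batch_path, "w", newline="") as out:
            csv.writer(out).writerows(rows[i:i + BATCH_SIZE])

split_request_file("request.csv")  # hypothetical file name
```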