Two days ago a faculty member at Texas A&M reported that they were getting an unexpected error message. They had prepared a request file to use with the ExtractionPreprocessed feature. In case you are not aware, request files are limited to 20,000 CIK-YEAR pairs. The client's request file had 19,999 CIK-YEAR pairs, but when they submitted it the request was blocked and they got the dreaded File Too Large message.
I asked them to send me the file and then tried all kinds of tricks to sort out the reason for the error. I failed to ask (or even consider) whether they had checked the Include Historical CIKs box. I was focused on analyzing the file and any hidden attributes it might have, rather than looking at the problem with a more open mind.
Fortunately (for me) Antonis Kartapanis (another TAMU accounting faculty member) was in the email chain and actively paying attention to the conversation. Antonis tried the request with the box checked and then with it unchecked, and sent a message suggesting that the issue was caused by selecting the Include Historical CIKs checkbox. Sure enough, I had not been checking the box, while the TAMU faculty member who was having the problem had been. I didn’t think to ask.
As a reminder, when the box is checked the application calls home and adds rows to the in-memory version of the file for any CIK in your request file that has a successor or predecessor CIK. For example, suppose you used CapitalIQ to create a sample and Alphabet was in your sample (along with many others). The CIK associated with Alphabet is 1652044. Perhaps you are trying to collect Director Compensation data from 2011 to 2021, so you have 11 lines in the file relating to CIK 1652044.
Once you have selected the artifact you want to pull, the application loads your request file and removes duplicates. If you have checked the Include Historical CIKs checkbox, it then reviews the file to determine whether any of your CIKs need to be augmented. For each one that does, it first confirms that the predecessor/successor CIK-YEAR pair is not already in the file and, if it is missing, extends the file with the new pair. In the case of the request in the image above the application will extend the file by adding new rows for CIK 1288776. The in-memory version of the file will now have 22 rows of CIK-YEAR pairs.
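For readers who like to see the steps spelled out, here is a minimal sketch of the dedupe-then-augment logic described above. It is not the application's actual code: the RELATED_CIKS dictionary is a hypothetical stand-in for the lookup the application does when it calls home, and the function name is mine.

```python
# Hypothetical stand-in for the predecessor/successor lookup the
# application performs remotely. Only the Alphabet example is included.
RELATED_CIKS = {1652044: [1288776], 1288776: [1652044]}


def augment_request(pairs, related=RELATED_CIKS):
    """Remove duplicate (cik, year) pairs, then add a pair for each
    predecessor/successor CIK that is not already in the file."""
    seen = set()
    result = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)

    # Iterate over a snapshot so newly added pairs are not re-expanded.
    for cik, year in list(result):
        for other_cik in related.get(cik, []):
            new_pair = (other_cik, year)
            if new_pair not in seen:
                seen.add(new_pair)
                result.append(new_pair)
    return result


# 11 rows for Alphabet's current CIK, 2011 through 2021.
request = [(1652044, year) for year in range(2011, 2022)]
augmented = augment_request(request)
print(len(request), len(augmented))  # 11 22
```

The 11 original rows become 22 because every year is duplicated for CIK 1288776, whether or not that CIK filed anything in that year.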
Only now does the application check the size of the final file, and this was the source of the problem. The augmented file exceeded the 20,000 CIK-YEAR pair limit because of the added predecessor/successor CIK mappings. In a perfect world we would add only the CIK-YEAR pairs that are relevant and remove those that are not. If we were doing that with this file, it would have CIK 1652044 for 2016-2021 and CIK 1288776 for 2011-2015. (This is on the list, but it is a long list.)
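To make the "perfect world" idea concrete, here is a rough sketch of what trimming the augmented file could look like. This is not something the application does today. The ACTIVE_YEARS table is a hypothetical input that simply encodes the year ranges from the example above, and the function names are illustrative.

```python
MAX_PAIRS = 20_000  # request-file limit mentioned earlier in this post

# Hypothetical table of the years (inclusive) in which each CIK is the
# relevant one, copied from the Alphabet example in the text.
ACTIVE_YEARS = {
    1652044: (2016, 2021),
    1288776: (2011, 2015),
}


def within_limit(pairs, limit=MAX_PAIRS):
    """Return True if the file is small enough to submit."""
    return len(pairs) <= limit


def trim_to_active_years(pairs, active_years=ACTIVE_YEARS):
    """Keep a (cik, year) pair only if the year falls in that CIK's
    active range. CIKs with no known range are left untouched."""
    kept = []
    for cik, year in pairs:
        span = active_years.get(cik)
        if span is None or span[0] <= year <= span[1]:
            kept.append((cik, year))
    return kept


# The 22-row augmented file from the example: both CIKs for 2011-2021.
augmented = [(cik, year)
             for cik in (1652044, 1288776)
             for year in range(2011, 2022)]
trimmed = trim_to_active_years(augmented)
print(len(augmented), len(trimmed))  # 22 11
```

With trimming, the file ends up the same size it started at (11 rows), so a 19,999-pair request would stay under the limit.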
If you’ve stuck with me so far, I think an easy fix is to limit your request file to around 18,000 CIK-YEAR pairs per cycle until we come up with a more elegant solution. I’m so glad Antonis was paying attention. I think I would have beaten my head against the monitor for many more hours before I thought to ask the magic question: are you using the Include Historical CIKs checkbox?
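If your sample is larger than that, one way to follow the 18,000-pair suggestion is to split the request into several smaller files before submitting. The snippet below is only a sketch of the splitting step: the function name is mine, the CIKs are placeholders, and writing each chunk out in the request-file format is left to you.

```python
def split_request(pairs, chunk_size=18_000):
    """Split a list of (cik, year) pairs into consecutive chunks of at
    most chunk_size pairs each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [pairs[i:i + chunk_size]
            for i in range(0, len(pairs), chunk_size)]


# Placeholder sample: 40,000 made-up CIKs for a single year.
pairs = [(cik, 2020) for cik in range(1, 40_001)]
chunks = split_request(pairs)
print([len(chunk) for chunk in chunks])  # [18000, 18000, 4000]
```

The 18,000 figure is a rule of thumb, not a guarantee. A sample with an unusually large share of predecessor/successor CIKs could still grow past 20,000 after augmentation, in which case a smaller chunk size is the safer choice.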