In the weeds again – the following is a bit dense – but details matter.
One of the reasons our application is so versatile is that we have more search operators than any other SEC filing search platform. We have the standard Boolean operators (AND, OR and NOT), proximity operators (W/N and PRE/N), and document operators (XFIRSTWORD and XLASTWORD). And then there is the ANDANY operator.
Here’s the problem – suppose we have a list of the members of the Russell 3000 from the beginning of EDGAR until now. We have a set of search terms and we want to do two things: first, for each year, determine whether the companies in our sample filed a 10-K; second, determine the frequency of some bag of words in each filing.
With directEDGAR’s ANDANY operator and CIK filtering tool I can do this in one step. The search I would construct would be ((DOCTYPE contains(10K*)) or (DOCTYPE contains(EX13))) ANDANY (my bag of words/phrases [with appropriate operators]). Because I want to limit the results to filers identified as members of the R3000, I would have to use a CIK input file.
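Because a single missing parenthesis changes what the query means, it can help to assemble the query programmatically and check that it balances. The sketch below is a hypothetical helper, not part of directEDGAR: it treats each bag-of-words entry as an opaque string (add any proximity or phrase operators inside the entry yourself) and, as an assumption, joins the entries with `or`.

```python
def build_andany_query(doc_types, bag_of_words):
    """Assemble (document-type clause) ANDANY (bag-of-words clause).

    Hypothetical helper: bag_of_words entries are used verbatim and
    joined with 'or'; the result is checked for balanced parentheses.
    """
    doc_clause = " or ".join(f"(DOCTYPE contains({d}))" for d in doc_types)
    bag_clause = " or ".join(bag_of_words)
    query = f"({doc_clause}) ANDANY ({bag_clause})"

    # Every '(' must be closed, and no ')' may appear before its '('.
    depth = 0
    for ch in query:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            raise ValueError("unbalanced parentheses: ')' closes nothing")
    if depth != 0:
        raise ValueError("unbalanced parentheses: missing ')'")
    return query


if __name__ == "__main__":
    # Toy bag of words, for illustration only.
    print(build_andany_query(["10K*", "EX13"], ["climate", "carbon"]))
```

With the two document types from the post this produces ((DOCTYPE contains(10K*)) or (DOCTYPE contains(EX13))) ANDANY (climate or carbon), with the outer parentheses around the document clause closed before ANDANY.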
In this case the ANDANY operator essentially treats the documents as primary and the contents of the ANDANY component of the search as secondary. This search would first identify all 10-Ks and Exhibit 13s filed by the CIKs in my sample. It would then report, for each returned document, whether each of the words/phrases appears and how often.
This is particularly important if we need a dummy variable, for instance to flag cases where a company in our sample filed a 10-K in this window but did not use any of the words/phrases in the bag of words. Those filings still appear in the ANDANY output, with zero counts.
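To make the "documents are primary" idea concrete, here is a toy simulation of the shape of ANDANY output, not directEDGAR’s search engine: every primary document produces a row, zero counts included, so the dummy variable falls out directly. The document IDs, texts and the simple whole-word tokenizer are all hypothetical; real matching (phrases, stemming, proximity) will differ.

```python
import re
from collections import Counter


def andany_counts(documents, terms):
    """Toy ANDANY: keep every primary document and count each term in it.

    documents: dict of doc_id -> text (stands in for the 10-K/EX-13 set).
    terms: single words, matched case-insensitively as whole words.
    """
    rows = []
    for doc_id, text in documents.items():
        words = Counter(re.findall(r"[a-z0-9']+", text.lower()))
        counts = {t: words[t.lower()] for t in terms}
        # Dummy = 1 if any term appears, 0 if the filing has no hits at all.
        any_hit = int(sum(counts.values()) > 0)
        rows.append({"doc": doc_id, **counts, "any_hit": any_hit})
    return rows


if __name__ == "__main__":
    docs = {
        "A_2017": "Carbon pricing and carbon taxes.",
        "B_2017": "No relevant language here.",
    }
    for row in andany_counts(docs, ["carbon", "climate"]):
        print(row)
```

The second toy filing still appears, with zeros and `any_hit` = 0, which is exactly the observation an AND search would drop.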
This is different from a very similar search: ((DOCTYPE contains(10K*)) or (DOCTYPE contains(EX13))) AND (my bag of words/phrases [with appropriate operators]). This second search will only return 10-K (and Exhibit 13) documents that contain at least one of my words/phrases. With this second search I would then have to separately identify the cases where a company in my sample filed a 10-K that did not include anything matching the bag-of-words component of the search.
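The extra work the AND route creates can be sketched as a set difference: run a separate DOCTYPE-only search to get every 10-K/EX-13, then mark the documents the AND search did not return as zero-hit filings. The function name and document IDs below are hypothetical; this only illustrates the bookkeeping ANDANY saves you.

```python
def dummy_from_and_search(all_10k_ids, and_hit_ids):
    """Rebuild the 0/1 dummy that an ANDANY search provides directly.

    all_10k_ids: every 10-K/EX-13 found by a separate DOCTYPE-only search.
    and_hit_ids: documents returned by the AND search (each has >= 1 hit).
    """
    hits = set(and_hit_ids)
    # Sanity check: AND results should be a subset of the full document list.
    unexpected = hits - set(all_10k_ids)
    if unexpected:
        raise ValueError(f"AND results missing from 10-K list: {sorted(unexpected)}")
    return {doc_id: int(doc_id in hits) for doc_id in all_10k_ids}


if __name__ == "__main__":
    print(dummy_from_and_search(["A", "B", "C"], ["A", "C"]))
```

Document "B" is the filing that the AND search silently left out; it only comes back because we ran the second, DOCTYPE-only search.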
Let’s get further into the weeds and make this a 2-step process. Suppose we only want results for the years in which a company was a member of the R3000. For example, Ultra Petroleum (CIK 1022646) joined the R3000 in 2017. However, they have filed 10-Ks since 2002, and they were deleted from the index in 2019. In this case, let us assume that we have a list of CIK-YEAR pairs that represent the precise fiscal years that we want to research; Ultra Petroleum is on our list for 2017 and 2018. Given the large number of additions and deletions, and the fact that companies can be added, removed and then added back, this is a complex list.
Step 1 in this case would be to run a CIK-filtered search for just (DOCTYPE contains(10K*)) or (DOCTYPE contains(EX13)). This gives me a list of all 10-Ks and Exhibit 13s for my CIK sample. I would merge this with my Russell 3000 composition list to keep only the 10-K/EX-13 filings for the CIK-YEAR pairs I want to research. In this case I would end up with the 10-K filings for 2017 and 2018 for Ultra Petroleum.
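The merge in step 1 amounts to keeping step-1 rows whose (CIK, year) pair appears in the composition list. A minimal sketch, assuming the step-1 output has columns named CIK and CDATE and, as a simplifying assumption, that the fiscal year is the calendar year of CDATE (this needs adjusting for fiscal years that end in early January). The rows below are hypothetical toy values, not actual search output.

```python
from datetime import date


def keep_member_years(step1_rows, member_years):
    """Keep only filings whose (CIK, fiscal year) is in the composition list.

    Assumption: fiscal year = calendar year of CDATE (balance sheet date).
    """
    wanted = {(int(cik), int(year)) for cik, year in member_years}
    return [r for r in step1_rows if (int(r["CIK"]), r["CDATE"].year) in wanted]


if __name__ == "__main__":
    # Hypothetical step-1 rows for Ultra Petroleum (CIK 1022646).
    step1 = [
        {"CIK": 1022646, "CDATE": date(2016, 12, 31)},
        {"CIK": 1022646, "CDATE": date(2017, 12, 31)},
        {"CIK": 1022646, "CDATE": date(2018, 12, 31)},
    ]
    members = [(1022646, 2017), (1022646, 2018)]
    for row in keep_member_years(step1, members):
        print(row)
```

Using a set of (CIK, year) tuples handles companies that are added, removed and added back without any special casing: each membership year is simply its own pair.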
The merged list identifies the exact documents I want to search. We know they exist, but to keep them all in the output (including those with no matches) I would run the same search as above, ((DOCTYPE contains(10K*)) or (DOCTYPE contains(EX13))) ANDANY (my bag of words/phrases [with appropriate operators]), but rather than using CIK filtering I would use CIK-DATE filtering. The step 1 output file has the balance sheet date (CDATE) for each filing, so I would use the CDATE as the value for the MATCHDATE parameter. I would set a zero (0) day window around the CDATE, since we are confident these documents exist (remember, we pulled them in step 1).
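Building the step 2 request from the merged list is mostly a column rename: CDATE becomes MATCHDATE. The sketch below writes a hypothetical CSV layout (columns CIK and MATCHDATE, dates as MM/DD/YYYY); check directEDGAR’s documentation for the actual request-file format. The zero-day window is assumed to be a search setting rather than a column, and collapsing duplicate CIK/date lines (for example a 10-K and its EX-13) is also an assumption.

```python
import csv
import io
from datetime import date


def cik_date_request(rows, date_format="%m/%d/%Y"):
    """Write a CIK-DATE request with MATCHDATE taken from each row's CDATE.

    Hypothetical layout: header 'CIK,MATCHDATE', one line per CIK/date.
    Duplicate CIK/date pairs are written once (assumed to be sufficient).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["CIK", "MATCHDATE"])
    seen = set()
    for r in rows:
        key = (int(r["CIK"]), r["CDATE"])
        if key in seen:
            continue
        seen.add(key)
        writer.writerow([key[0], key[1].strftime(date_format)])
    return buf.getvalue()


if __name__ == "__main__":
    merged = [
        {"CIK": 1022646, "CDATE": date(2017, 12, 31)},
        {"CIK": 1022646, "CDATE": date(2018, 12, 31)},
    ]
    print(cik_date_request(merged), end="")
```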
By running this 2-step process I have identified the exact 10-K/EX-13 documents that pertain to the period when the registrant was a member of the R3000. While I could have tried to do this in one step, there is a problem: I would need each filing’s balance sheet date up front, and balance sheet dates change over time for two reasons. First, registrants change their fiscal year-end. Second, enough registrants use a 52/53-week year that it can get messy. (PepsiCo reports this in their 2013 10-K: “In 2011, we had an additional week of results (53rd week). Our fiscal year ends on the last Saturday of each December, resulting in an additional week of results every five or six years.”)
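To see why a guessed balance sheet date would miss, the sketch below applies the rule quoted from PepsiCo (fiscal year ends on the last Saturday of December) and counts the weeks in each fiscal year. It models only that stated rule, not any company’s actual filed dates.

```python
from datetime import date, timedelta


def last_saturday_of_december(year: int) -> date:
    """Fiscal year-end under the 'last Saturday of December' rule."""
    dec31 = date(year, 12, 31)
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return dec31 - timedelta(days=(dec31.weekday() - 5) % 7)


def fiscal_year_weeks(first_year: int, last_year: int) -> dict:
    """Weeks in each fiscal year: distance between consecutive year-ends."""
    return {
        y: (last_saturday_of_december(y) - last_saturday_of_december(y - 1)).days // 7
        for y in range(first_year, last_year + 1)
    }


if __name__ == "__main__":
    for y, weeks in fiscal_year_weeks(2010, 2023).items():
        print(y, last_saturday_of_december(y).isoformat(), weeks)
```

Under this rule the year-end moves every year (2011-12-31, then 2012-12-29, then 2013-12-28), and the 53-week years between 2010 and 2023 are 2011, 2016 and 2022, consistent with the quote’s 2011 example and its “every five or six years.” Pulling the actual CDATE in step 1 sidesteps having to predict these dates.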
Finally, this is the kind of search that requires, or at least benefits from, the use of Historical CIKs. When looking at a time series we can lose observations if we do not account for CIK assignment changes caused by entity changes (Google -> Alphabet). With the Historical CIK option, our input file would be adjusted to include the CIK-YEAR pairs for any associated CIKs. To keep this post from getting any denser, please review this post for an explanation of how the Historical CIK feature modifies the request file (Historical CIK Mapping).
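As a rough illustration of the idea only (directEDGAR’s actual rules are described in the Historical CIK Mapping post), expanding the request means adding a CIK-YEAR pair for every CIK associated with each input CIK. The mapping and the CIK values below are hypothetical placeholders, not real EDGAR identifiers.

```python
def expand_historical_ciks(cik_years, related):
    """Add a (CIK, YEAR) pair for every CIK associated with each input CIK.

    related: hypothetical mapping of CIK -> list of associated CIKs
    (e.g. predecessor/successor entities after a reorganization).
    """
    expanded = set()
    for cik, year in cik_years:
        expanded.add((cik, year))
        for other in related.get(cik, []):
            expanded.add((other, year))
    return sorted(expanded)


if __name__ == "__main__":
    # Placeholder CIKs: 2222222 is a successor entity of 1111111.
    related = {2222222: [1111111]}
    print(expand_historical_ciks([(2222222, 2015)], related))
```

Using a set keeps the expanded request free of duplicate pairs when several input CIKs map to the same associated entity.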