It is never simple – We don’t know what we don’t know!

I had a PhD student ask a really interesting question last week. Because I don’t want to disclose their research goals, it took me a bit of time to come up with a good analogy. They had a search with more than 100 search terms. They did a summary extraction and were scanning the summary file, whose columns list the terms and the number of hits found in each document. They would then periodically look at the source document in our viewer to check how their words/phrases were actually used in context. The problem they identified was that they started to see cases where a search term was in the document but was not used in the right context.

So my example is going to be this – suppose I want to find all 10-K filings that mention Texas. I believe that if the word Texas appears in a 10-K, that provides strong evidence that the company has operations of some sort in the state. However, once I start scanning the results I find plenty of cases where the word TEXAS is in a filing – but it is used as part of a noun phrase or other construction that does not actually name Texas as a location of operations. For example, West Texas Intermediate is a benchmark used for pricing oil transactions, and Texas Instruments or Texas Pacific may be mentioned as competitors. So the question is – if we don’t know the context of the word in use, how can we be sure the word is actually signifying what we hope it signifies? In other words, the existence of the word may not be sufficient evidence that the instance is meaningful in our case. Further, we do not know in advance all of the possible noun phrases and proper names that include the word TEXAS, so we can’t exclude them or account for them in our search. (If you do know all of the proper nouns and noun phrases to exclude in advance, you can modify the search to account for those with the ANDANY operator: TEXAS andany (TEXAS INSTRUMENTS OR WEST TEXAS INTERMEDIATE OR . . .).)

Here is an image of the search results. The filer is TEJON RANCH CO – the name has a Texas sort of ring to it, but they are a real estate development and agribusiness company. They realize some royalties from mineral rights leases on their land – which is in California. They appear to have no operations in Texas, but the royalties they receive seem to be tied to West Texas Intermediate.

One way to get a better handle on how TEXAS is used in a document is to do a ContextExtraction (this is the label we use on the platform) and set a really tight span. In this particular case I suggested to the PhD student that they set a span of 1 ‘Words’ as illustrated in the next image.

Setting ContextExtraction Span

By doing so and scanning the results it becomes clear that there is a lot of noise in assuming that the mere mention of the word TEXAS in a 10-K filing is meaningful. We find cases where Texas Instruments is mentioned as a competitor. There are cases where Texas A&M and other universities with the word Texas in their name have a patent relationship with the registrant, or one of the executive officers earned a degree at the university. Restricting the context to one word may not be the best choice in every case. That is okay, because it is cheap to rerun the extraction and alter the span to test alternative strategies.
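As an aside – if you wanted to replicate this kind of tight-window scan outside the platform, here is a minimal Python sketch. This is not our ContextExtraction implementation, just an illustration of pulling one word on either side of a hit (the sample sentence and the crude tokenization are mine):

def context_windows(text, term, span=1):
    # Crude whitespace tokenization; the platform's tokenizer is more careful
    tokens = text.split()
    windows = []
    for i, tok in enumerate(tokens):
        # Strip simple punctuation so 'Texas,' still matches
        if tok.strip('.,;:()').lower() == term.lower():
            lo = max(0, i - span)
            hi = min(len(tokens), i + span + 1)
            windows.append(' '.join(tokens[lo:hi]))
    return windows

sample = 'Prices are tied to West Texas Intermediate crude oil benchmarks.'
print(context_windows(sample, 'Texas'))  # ['West Texas Intermediate']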

The point is that there is no way I could have known in advance all of the ways the word Texas might be used in the filing and be confident that the use of Texas was evidence of corporate activity in Texas. But by extracting the limited context and scanning it I can more confidently look for ways to better measure evidence of business activity.

I will disclose that in our exchange, they were wondering if this was the point where they needed to start learning Python. I do encourage folks to start learning Python – but this is not a problem well solved by Python. We had the context around the word TEXAS from every 10-K filed from 2016-2020 in a CSV file about four minutes after we started. Now it is going to take some effort to learn what should be included or excluded to make sure their measure reflects what they hope it reflects. Being able to look at these results is what is going to give them the understanding they need to move forward.

Examples of Texas Results that are Likely Noisy

Random – It is about the people. We have amazing interns!

I often tell people that my day job is about the best job in the world. Every semester I get to meet some outstanding young people who are just starting to make their mark on the world. I love class when someone challenges me and asks hard questions. My role here also gives me the same opportunity – we hire late juniors and seniors in high school and try to keep them as long as we can. What I am looking for is a little bit of arrogance (confidence), a tiny bit of humility, and a lot of curiosity and persistence – no experience necessary or really even wanted. Most importantly, I am looking for integrity. They work remotely, and I don’t want to invest in monitoring systems, and I need them to report quickly when they make an error.

Our training is pretty ad hoc and is initially focused on helping them learn the importance of details. We have tools that we use to identify, extract and normalize executive and director compensation. If we knew everything about the way the data is going to be presented in the filings we wouldn’t need humans, because we could address the issues in code. But there are a lot of nuances that get added each year. Some days it feels like we are playing whack-a-mole with the choices registrants make. We used to believe that II was part of the name in a Name/Title cell – but today one company used it in the title. I have gone too far into the weeds. The point is we need our interns to be really curious and questioning when they are looking at details.

They start off doing tasks that keep our processes running and learning how to question everything they see in one of our dashboards (does that II really belong in the title?). At some point we start teaching them Python. When we start teaching them to code, the focus is on learning how to break tasks into the smallest possible steps. It takes roughly a year before they are proficient enough to start making independent contributions to our code base. I will give them some goal, and when they ask questions I try mostly to make sure they are asking the right question; then, rather than giving them the answer, I send them into our Experiments channel on Slack or to Stack Overflow.

One of our interns, Michael Pineda, had a really interesting weekend. Michael is a Mechanical Engineering sophomore at the University of Nebraska–Lincoln (UNL). He started with us late in the fall of his senior year of high school. He is a member of the UNL Society of Automotive Engineers (SAE) club. Each year SAE clubs at colleges across the country compete in Formula SAE, building a small formula-style race car from scratch. The competition includes the presentation of a business case for their car. This weekend they had to get the car in front of alumni for the first public viewing. Michael is a lead on the suspension team. He shared these pictures in Slack with the rest of our team at 5:45 AM Sunday:

Suspension

Here is the car:

University of Nebraska Lincoln Formula SAE Racer

Do you think Michael is curious and questioning? He has been at work on a new project that is going to be disclosed about the time we release version 5 (I hope). Michael has been offered a fantastic mechanical engineering internship over the summer. Like all of our interns – he will be leaving us to bigger and better things – but it sure is fun to work with them at this point in their lives.

Between them, two of our interns are competing at the national level in seven academic competitions in the next month (Mock Trial, Speech & Debate, Academic Decathlon . . .). Three of our interns are enrolled in 12 total AP courses this semester. The two that are graduating from high school this year have been offered academic scholarships in excess of $250,000 (give or take). Our newest intern is keeping up with a challenging academic schedule while doing some amazing things in the long jump and relays for her high school.

I know I am rambling, but I would be remiss if I did not mention Shelby Lesseig. Shelby was an intern while she was working on her BBA (Accounting)/MPA at UT Austin. I just checked out her LinkedIn profile:

LinkedIn Description of Work with AcademicEDGAR

Shelby came on while I was still learning about the characteristics we really needed in interns, and she unwittingly helped me better identify some of those qualities – she set an early standard that we still apply today. We still use some of Shelby’s early work to manage our data extraction and normalization processes. Shelby’s husband and his colleagues have actually benefited from some of her work. Is that cool or what!

This post was prompted by Michael’s excitement over his contribution to the race car and me coming to terms with the fact that he is going to be moving on soon. It made me think more about this journey and while it has been challenging at times – I really do think the best part of it continues to be getting to work with such bright interns (I was going to say kids but I don’t want to diminish them at all).

DOCTAGS – An Overview

I was answering a client question this morning about limiting search results to particular documents and decided that it was probably time to post here about our DOCTAG filtering.

An SEC filing includes a form and might also include exhibits. In conversation and generally in writing about filings we don’t often separate the form from the exhibits even when our work might be focused on the form rather than the filing (inclusive of exhibits). As part of our process we collect filings from the SEC and then parse the filing to separate the form from the exhibits. We then tag the form and the exhibits to allow you to select, search and manipulate search results based on the type of document.

The tag for the form is the name of the filing (10-K, 10-K/A) with all of the spaces, dashes (-) and slashes (/) removed. So the 10-K becomes 10K, the 10-K/A becomes 10KA, and an SC 13D becomes SC13D. The SEC mandates that exhibits follow a convention with respect to the description field when they are included in a filing (see https://www.law.cornell.edu/cfr/text/17/229.601). We follow the same rule for converting the Exhibit Description to our DOCTYPE code, except we remove everything to the right of any decimal in the EXHIBIT TYPE field. While filers may have their own internal system that they use to add meaning to the DESCRIPTION field of an exhibit in a filing, that system is not available to us. So when a filer uses EX-10.17, our DOCTYPE code is EX10. Here is an image from an Apple 10-K filing – we coded the 10-K as 10K and the exhibits as EX4, EX10, EX10, EX21 . . .

Document List for Apple 2020 10-K Filing
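Here is a minimal Python sketch of the conversion rule described above – an illustration of the naming convention, not our production code:

def doctag(label):
    # Remove spaces, dashes and slashes: '10-K/A' -> '10KA', 'SC 13D' -> 'SC13D'
    tag = label.replace(' ', '').replace('-', '').replace('/', '')
    # For exhibits, drop everything from the decimal on: 'EX-10.17' -> 'EX10'
    return tag.split('.')[0]

print(doctag('10-K/A'))    # 10KA
print(doctag('SC 13D'))    # SC13D
print(doctag('EX-10.17'))  # EX10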

At one time I speculated that Apple’s convention is to indicate the sequential order of the exhibit type included in an SEC filing in the fiscal year. I no longer believe that to be true. While we have seen cases where a particular filer seems to have a coding scheme (10.1X is a debt contract, 10.2X is a compensation-related contract . . .), these practices are internal, not externally driven.

I think I have addressed this before – the best way to begin identifying particular types of contracts is to use the DOCTYPE filter to specify EX10 and then use the XFIRSTWORD search with the within operator (w/N) and key words that would be expected to appear within some N words of the start of a particular type of contract. For example, (DOCTYPE contains(ex10)) and (xfirstword w/10 (debt or credit)) will return all Exhibit 10s that contain either the word debt or credit within 10 words of the first word in the document.

Search is an art; it is important to play around with the span (w/N) and compare the results. I have seen cases where our clients have more than 400 words/phrases to check – this is no problem at all for the parsing engine – you just have to be careful about the grouping of your phrases/terms. When I hear from a client that they can’t get a search to work, invariably it turns out to be a problem with parentheses placement.

As an aside – I really find the Cornell Law link above to be one of the best resources to use when trying to understand what I can expect to find in a filing.

Version 5.0 is coming!

Let me start with the exciting stuff – here is an image of a key feature in the next version:

First Image of New Database Interface

We are building a query tool that will be incorporated into our ExtractionEngine. If you have seen some of our past posts, you know we have loaded some metadata databases to the cloud. We built the databases initially as temporary containers to hold metadata until we fully settled on the specific data we wanted to add to the filings. Some users asked for access to some of the data, and because it is complicated to fully incorporate the data into the filings directly we thought a reasonable intermediate step was to make these databases available for you to download.

The problem is that of course you have to download the database, find and install a viewer, and learn how to work with it . . .

Further, I have been wondering about making more of our data available in a more direct fashion. We also have a secret project, on which we have made really decent progress, to add another very useful data set to our system. So I wondered if we could create our own data query/view feature – the image above is a first view of this feature.

Generally, when you select the query tool from the menu the application will check our repository to determine what databases are available (because we are going to add lots of data) and load the available databases into the application. You will select a database and the application will dynamically identify the fields and their characteristics (TYPE) and then offer you a panel to build a query. Once the query has run you will have the option of saving the results to a CSV file.
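For the curious, the dynamic field discovery can be approximated with plain sqlite3. The database file name and table usage below are assumptions for illustration – this is a sketch of the idea, not the 5.0 implementation:

import csv
import sqlite3

con = sqlite3.connect('metadata.db')  # hypothetical database file
cur = con.cursor()

# Discover the available tables
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

# Discover each field and its declared TYPE in the first table
for cid, name, ftype, notnull, default, pk in cur.execute(
        f'PRAGMA table_info({tables[0]})'):
    print(name, ftype)

# Run a query and save the results to a CSV file
rows = cur.execute(f'SELECT * FROM {tables[0]} LIMIT 100')
with open('results.csv', 'w', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow([d[0] for d in rows.description])
    writer.writerows(rows)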

Most of the other features we will be adding to 5.0 are more incremental. For example, today if you want to access new document indexes you have to run the File/Options/Index Library/Generate Library utility. With our cloud deployment this is unnecessary, so we are making the application dynamically respond to the available index collection when it starts. Once we implement this you will not receive any more emails from me announcing that a new index has been added – the application will just know what indexes are available.

I don’t have a precise date yet for this new release. We are in the midst of proxy season, which means that a lot of our focus is on making sure the executive and director compensation tables included in filings made each day are available to you by the close of business. Our system is really good, but there are always new challenges that take some attention and focus. I am not sure whether we will be able to finish all of the development work before the end of proxy season. If not, the release should come soon thereafter.

Accessing Metadata from Documents Extracted Through our Platform

Fascinating question. A PhD student is doing some work with some filings they extracted from our platform. They ran some complex searches to identify the documents and saved them to their computer. However, they did not generate a summary of the search results, so they lost immediate access to the metadata associated with the documents. They realized after some extensive work that they would like to use the metadata in their analysis. So the question was – can they get the metadata without rerunning the search? The answer is yes, particularly since they are working in Python. Below is some code to create a list of dictionaries of the metadata embedded in the htm files in the source folder. We use the lxml library. I think many of you might be using BeautifulSoup; if so, a small modification is needed. The key, though, is that we add the metadata as elements with attributes, so we can pull those elements and get their attributes rather cleanly.

import glob
import os

from lxml import html

# Folder where the extracted .htm documents were saved
source_dir = r'C:\path\to\saved\filings'  # adjust to your own folder

meta_list = []
for htm_document in glob.glob(source_dir + os.sep + '*.htm'):
    # Read raw bytes so lxml can honor any encoding declaration
    with open(htm_document, 'rb') as fh:
        b_string = fh.read()
    meta_dict = dict()
    tree = html.fromstring(b_string)
    # Each piece of metadata is a <meta> element with name/content attributes
    meta_e = tree.xpath('.//meta')
    if len(meta_e) == 0:
        print(f'no meta {htm_document}')
    for m in meta_e:
        attrib = m.attrib
        meta_dict[attrib['name']] = attrib['content']
    # Record which file the metadata came from
    meta_dict['source_path'] = htm_document
    meta_list.append(meta_dict)
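If you then want the metadata in a CSV file, csv.DictWriter handles the fact that different documents can carry different metadata fields – a minimal continuation of the code above (the output file name is mine):

import csv

# Take the union of all keys since documents can differ in their metadata
fieldnames = sorted({key for d in meta_list for key in d})

with open('metadata.csv', 'w', newline='') as fh:
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(meta_list)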

I ran the extraction code above on an 8-K filed by Apple. Here is the result:

{'DOCTYPE': '8K', 'SECPATH': 'https://www.sec.gov/Archives/edgar/data/320193/000119312521001982/d29637d8k.htm', 'ACCEPTANCE': '20210105163216', 'SICCODE': '3571', 'CNAME': 'Apple Inc.', 'FYEND': '0925', 'ITEM_5.02': 'YES'}

Reminder – Searches with WORDS that are Operators Require ~ Appended to the Operator

A client was trying to sort out how to run a specific search. They wanted to use a phrase containing the word and – they were getting anomalous results and so dropped us a question. Anytime you need to use a word that is also a primary operator as a search term, you need to append a tilde (~) to that word.

I am reluctant to share their specific search. I was looking at some proposals for the ratification of the Independent Registered Public Accounting Firm, so that is where this example comes from. Suppose you want to search for cases where the proxy reports that the auditor is not expected to be present at the annual meeting. NOT is a search operator and it is a strong one – it will eliminate documents/results with the word/phrase that follows. (It has more complicated uses, but let me avoid a rabbit hole here.)

To identify those documents where there is an explicit indication that the auditor is not expected to be present at the meeting I ran the following search:

((not~ expected) w/10 present) w/50 (audit* or account*)
Auditor Not Expected to be Present

If we run the search without the tilde, the result would be those cases where the word expected was not within 10 words of the word present and expected was within 50 words of words rooted on audit or account. Are you confused? Sometimes I am too – search is an art.

I will admit, I find these results interesting – there are not that many, though, and it seems that a good number of the cases are those where the auditor from the prior year is not continuing – but not all. The image above came from Forward Air’s 2021 DEF 14A.

Of course, I immediately wondered if those cases were an indication of an intent to dismiss the auditing firm in the near future. Unfortunately, when I expanded my search to cover all years I found cases where the auditor is routinely not expected to be present – for example, CECO ENVIRONMENTAL, CULLEN FROST BANKERS and DOVER CORP. Shucks, I thought that would be an interesting research paper.

2020 Insider Trading Data Updating With ACCEPTANCE-DATETIME Field

The system is currently running to update all of the 2020 SECTION-16-SUMMARY data with the ACCEPTANCE-DATETIME field. The process started about 1:00 PM on 1/8/2022 – I estimate that it will be complete by 4:00 PM (ish).

I did make a dreadful mistake during this update. I pulled the 2020 data offline while I was preparing the code. I received an email from a user, and while I was able to address their requirements I realized it was not necessary to pull everything offline. I will not make that mistake again. We are now working on the prior years. Thank you for your patience – this should have been addressed when we first handled these files.

When I posted that the 2021 data was available I noted that my sense was that there were more insider transactions in 2021 than in 2020. This was confirmed: in 2021 there were 875,357 total processed rows pulled from 224,474 unique filings, while in 2020 there were only 717,239 rows pulled from 203,566 filings. I think we expected this because we saw many more new directors in 2021 than we have seen in some while.

We are still heavily involved in some of the transition work, so it may take a while; however, we will generate the DIRECTOR-RELATIONSHIP artifact for 2021 soon. That will be interesting, as my sense during the year was that more of the newest directors are female. Once that artifact is created it will be easier to confirm that observation.

I would like to observe that you can map the trading data to the Director/Executive compensation data by the PERSON-CIK value. In those tables we have GENDER. We have AGE and TENURE fields in the Director data.

Context Normalization – Spelling Counts

I had an interesting email from a hard-at-work PhD student who was using the ContextNormalization feature of our platform to normalize some data. Because they are collecting a piece of data I have not seen used in research before, I am going to describe their problem using the collection of AUDITOR TENURE data. The nature of their problem manifests itself in the same way in almost every ContextNormalization case.

As a result of a PCAOB rule change, auditors are supposed to disclose their tenure with the client. The most common expression of that tends to be “We have served as the Company’s auditor since YYYY.” Below is an image from running a search for auditor* since on 2021 10-K filings.

Auditor Since Search

I used auditor* in the search because I also wanted to catch the expression auditors since.

I set a really tight span for the context since this is one of those binary cases – it will be concisely expressed or it is not likely to be expressed. Remember – to set the span for Context – use Options/Context and specify the span you need.

Setting Context Span

Once we’ve done that we are ready to set the parameters for the ContextNormalization. Notice I did not include auditor in the Extraction Pattern. This assures me that the processor will not discard those cases where the phrase is auditors since. Since the processor is working on the active search results I have no concern about phrases like we have been making amazing products since XXXX. Our search was for auditor* since.

This is one of those ‘spelling matters’ issues – if I specify auditor since, the engine will not normalize auditors since. The word auditor was critical to getting the right context, but using it in the Extraction Pattern would reduce the yield because there would be no exact match when a filing uses auditors since.

The second spelling issue occurs because of formatting errors or typos. When I sorted the results by the value of tenure, you can see that I had some results that didn’t make sense.

Inconsistent Tenure Values

Either someone accidentally inserted an extra space as they were typing the year values, or the underlying HTML has a tag separating parts of the number.

Then of course we have these cases:

More Context Errors

In the cases shown above the search correctly identified the context but there are words intervening between the word since and the value we want for year.

We collected the year value from 6,745 documents based strictly on the existence of a valid number following the word since. There were 115 documents with language “since at least YYYY” or “since fiscal YYYY” and other permutations and there were a total of 4 typos.

I keep trying to play around with a Python library called FuzzyWuzzy to improve this yield, and while we can make significant improvements for specific use cases, the problem is that I can’t seem to anticipate all use cases in a way that makes me comfortable implementing the library inside one of our functions. However, if you do a ContextExtraction and have some time on your hands, I would encourage you to poke at the normalization with that library.
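If you want a starting point, here is a minimal sketch of the idea – the sample contexts, the threshold, and the regex are all mine, chosen only to illustrate tolerating extra words and the stray-space typo:

import re

from fuzzywuzzy import fuzz  # pip install fuzzywuzzy

contexts = [
    'auditor since 1985.',
    'auditors since at least 2002',
    'auditor since fiscal 2013',
    'auditor since 19 98',  # formatting error in the source document
]

for c in contexts:
    # Score similarity to the canonical phrase despite extra/missing words
    score = fuzz.partial_ratio('auditor since', c.lower())
    # Capture a four-digit year, tolerating one stray interior space
    match = re.search(r'(\d{2})\s?(\d{2})\b', c)
    if score >= 80 and match:
        print(c, '->', match.group(1) + match.group(2))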

Detailed 2021 Insider Transaction Data Now Available

We completed processing of the 2021 Form 3/4/5 filings and their amendments today and started uploading the parsed and normalized data to the PREPROCESSED_EXTRACTION platform at about 10:00 AM (CT) (1/2/2022). It should be fully available by 2:00 PM.

As a reminder, the data is available through a request file that contains the fields CIK, YEAR and PF. The CIK is the CIK of the ISSUER, YEAR is the transaction year, and the PF value needs to be FY. Because of the density of this data we have a hard limit of 2,500 CIK-YEAR pairs per request file. One of the challenges is that when we are building the CSV file to return to you we don’t know all of the column headings until the request is processed. Believe it or not, there are some forms that contain more than 10 footnotes for a number of fields. We have captured all of the footnotes and labeled them based on the cell/data value they relate to. In the example below there are 18 additional columns that report the content of those 18 footnotes. (Link to Original Filing)

Example Footnote Complications
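Going back to the request file itself – here is a minimal sketch of building one in Python. The CIK/YEAR/PF layout comes from the description above; the file name and whether a header row is required are my assumptions:

import csv

# Hypothetical request asking for Apple's 2021 insider transactions
pairs = [('320193', '2021')]
assert len(pairs) <= 2500  # hard limit of 2,500 CIK-YEAR pairs per file

with open('request.csv', 'w', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow(['CIK', 'YEAR', 'PF'])
    for cik, year in pairs:
        writer.writerow([cik, year, 'FY'])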

The return data has every transaction reported in all filings made during the calendar year. We include the ACCESSION-NUMBER of the filing to help you identify the source filing. Each separate transaction is assigned one line. We include a field called datatype to indicate whether the row describes a derivativetransaction, a nonderivativetransaction or a remark. REMARKS are assigned their own row because in general a remark relates to the filing rather than any one specific transaction reported in the filing. This is clearly not the case in all instances, but we cannot discern between remarks that relate to one specific transaction and those that are general to the filing.

I wanted to run some stats to compare this year’s activity to last but time has interfered. My gut tells me that this was a busier year for insiders than recent years – but don’t take that to the bank. 6,620 issuers reported trading data with 224,474 filings. We have a total of 875,357 transactions/remarks.

If you want to review the data I have included two attachments. One is a list of all of the column headings (COLUMN_HEADINGS) and the second is a CSV file with all of APPLE’s transactions (320193_SUMMARY).

I go back and forth on whether or not we should index these filings (make them full-text searchable). It is not that much of a challenge – but I am just not seeing the utility. Comments are always appreciated. I will observe that my guess is that you might need/want to identify those filers with 10B-5 transactions. I suspect, though, that once you did that the next step would be to have simple access to the details of the transactions – so the search step is just to identify the filer. To that end I have run code to identify all issuers where there are one or more mentions of 10B-5 in the summary for 2021. You can access that list from here (10B5).

I will be pulling the 2020 insider data offline (and then each prior year one at a time) so we can add ACCEPTANCE-DATETIME to each transaction. This was not even a thought when we first worked with this data. It was my mistake and the only way to fix it is to reprocess each of the years. My apologies.

Updated 10 hours after initial posting. I had a question about footnote mapping. If there is only one footnote attached to a data value in a cell, then the footnote is placed in a column with the name datavalue.footnote. There is no number assigned. If a cell has two or more footnotes, then each subsequent footnote in that cell is assigned a number beginning with 1; so the second footnote is assigned datavalue.footnote_1. We do not use the numeric value the filer has assigned because it would explode the number of columns, since we need to key the footnote to the data value. For example, look at this Form 4 (Elon Musk Form 4). There are 29 non-derivative security transactions that each have a unique footnote attached to the data value for the transaction price. However, since there is only one footnote for each data value, each of those footnotes is in the same column of the results. There are two footnotes in the cell for the exercise date value for the one non-derivative transaction; those footnotes are in columns with the labels exerciseprice.footnote and exerciseprice.footnote_1.

If we used the number key of the footnote as part of the footnote name then we would have significantly more columns and in my opinion the meaning would be less clear.
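As a sketch of the naming rule (a hypothetical helper, not our production code):

def footnote_column(data_value, position):
    # First footnote on a cell -> 'datavalue.footnote'; later footnotes in
    # the same cell -> 'datavalue.footnote_1', 'datavalue.footnote_2', ...
    if position == 0:
        return f'{data_value}.footnote'
    return f'{data_value}.footnote_{position}'

print(footnote_column('exerciseprice', 0))  # exerciseprice.footnote
print(footnote_column('exerciseprice', 1))  # exerciseprice.footnote_1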

10-K Header Details with Enhanced Coverage of ACCEPTANCE-DATETIME Now Available on Platform

Our previous work to establish a proxy for the timestamp of 8-K filings prompted one of our clients to ask whether it was possible to do the same for 10-K filings. We used the same strategy and were able to add a good proxy for this field to about 97,000 10-K filings where it was not available in the header. With this addition we have 318,304 10-Ks with an ACCEPTANCE-DATETIME value and 7,610 missing a value. While working on this it seemed reasonable to collect as much metadata from the filings as possible. For example, I saw a recent paper reporting that most of the other databases we use for research have only the most recent address; the authors described having to go to EDGAR to collect this field. There does not seem to be much value in making you jump through that hoop, so we added most of the fields included in the header or on the INDEX page of the filings.

There is a new compressed folder in the EXTRAS (see here how to access) folder that has a copy of the sqlite3 database (and some code etc). In addition to the information that was reported in the header files we added the DE_PATH – this is not the path to the 10-K but the path to the folder where the 10-K is stored. This will allow you to match search results from directEDGAR to a query from the database with just a little Excel magic on the FILENAME column from a Summary or Context extraction from directEDGAR.
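If you prefer Python to Excel, the matching key can be derived with os.path.dirname. The FILENAME value below is made up to show the shape of the idea:

import os

# Illustrative FILENAME from a Summary or Context extraction
filename = r'C:\results\10-K\0000320193-20-000096\aapl-20200926.htm'

# DE_PATH is the folder holding the 10-K, so dropping the file name
# from FILENAME should produce a matching key
de_path = os.path.dirname(filename)
print(de_path)  # C:\results\10-K\0000320193-20-000096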

I really thought this was going to take two days once I had sorted out this goal. We had already pulled the ACCEPTANCE-DATETIME measure for all of the filings. We have the headers archived. I thought it was going to be a trivial mapping exercise. It was much more complicated because there are some variations in the fields included in the headers (I learned that there is a field called CONFIRMING-COPY) and then there was a special challenge with the addresses.

As I was looking at the data I was questioning the value of using the codes that the SEC requires filers to use for the STATE field in the BUSINESS-ADDRESS and MAILING-ADDRESS sections of the header. Who wants to pull this data and then go find out that the value M2 means Jordan in order to organize the results for a model? So I decided that we needed to map the values the filers use when preparing their filings to their descriptive values. This was interesting, because the only SEC source I could find for the standardization of these fields was limited. Lexis-Nexis managed the dissemination platform early on, and I found a page that listed the codes that were to be used (Archived Country Codes). Notice that the list of codes is incomplete – we had to find cases where a code was not listed on the page but was included in a header, read the header, and assign the code value based on the information we could glean from the filing (and of course Professor Google). The SEC made some changes so that beginning in February 2010 these codes were to be used (New State/Country Codes). There is no specific date for the changeover and no easy way to tell, so we mapped filings before 2/1/2010 to the old codes and subsequent filings to the new. While there is consistency for the US states, the codes for most other jurisdictions changed. For example, West Germany was I8 under the old regime; under the new regime Germany was assigned the value 2M.
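A minimal sketch of the date-based lookup described above – the two code tables are truncated to a couple of illustrative entries each:

from datetime import date

# Truncated illustrations of the old and new SEC code tables
OLD_CODES = {'I8': 'West Germany', 'K3': 'Hong Kong'}
NEW_CODES = {'2M': 'Germany', 'K3': 'Hong Kong'}

CUTOVER = date(2010, 2, 1)  # our assumed changeover date

def state_label(code, filing_date):
    # Filings before the cutover use the old table, later filings the new
    table = OLD_CODES if filing_date < CUTOVER else NEW_CODES
    return table.get(code, 'UNKNOWN')

print(state_label('I8', date(2005, 6, 30)))  # West Germany
print(state_label('2M', date(2015, 6, 30)))  # Germany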

While my primary reason to assemble this is to facilitate your research, a second and still important reason is to make stuff available that might be interesting to muck around with in class or to use for student projects. Therefore I thought it was important to do the STATE field transformation. Note, I did this transformation to both the BA_STATE (Business Address) and the MA_STATE (Mailing Address) fields.

I want to close this by observing that we simply organized the data as reported. There is operator (filer) error. I was showing Manish the finished product and thought it would be fun to identify all filers from FINLAND. There were only three observations; however, one of those is an error. China Natural Resources (2000 10-K Link) entered H9 in the STATE field. In prior filings they used K3 (Hong Kong). This of course brings up the question – should we correct these? Maybe later; that would be a significantly different project.

I included an html file that has the country codes (both old and new) in the folder. The queryDB.py file includes the list of fields and there are two csv files. One is the input file used in the query that restricts the results to a specific list of CIKs and the other is the output from that query. I have not yet had a chance to play with Pandas so my code still reads the input and writes the output using the CSV module. I thought about modifying Antonis’s code but since I can’t test it – I will leave it to those of you who are used to the Pandas library.

As a final note, the data is current as of 12/22/2021. We will add this to our monthly or quarterly update flow once we get through the transition to 2022 in our search platform. I considered waiting until after 12/31, but there are too many tasks that need to be addressed at the beginning of a new calendar year. Thanks for your patience and I hope you find this useful.