Operator Search (and, or & not)

I was working on the finishing touches for the process of uploading the initial CAM_FEE data – which should be live in the next 10-12 hours. I was specifically agonizing over the value we should use for the AUFEEYEAR field. There is such inconsistency in how SEC registrants refer to their fiscal years, and thus in how the data is reported in their filings.

As part of my research I decided to run a search for the phrase ‘FISCAL 2020 and 2019’. I had forgotten that when we search using words that are also search operators (and/or/not), it is necessary to append a tilde (~) to the operator that we want to use as a search term. Without the tilde the term is treated as an operator. The original search phrase returns all documents with the phrase FISCAL 2020 and the number 2019 somewhere else in the document. However, with the tilde (FISCAL 2020 and~ 2019), my search returns exactly what I was searching for (all documents with the phrase ‘FISCAL 2020 and 2019’).
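If you build search phrases programmatically (for example, from a list of fiscal-year labels), a small helper can apply the tilde rule before you paste the phrase into the search box. This is a minimal sketch under assumptions: `escape_operators` is a hypothetical helper, not part of the application, and it assumes and/or/not should be escaped regardless of case whenever they appear as whole words.

```python
import re

# Whole-word and/or/not that do not already carry a tilde.
# Case-insensitive matching is an assumption of this sketch.
_OPERATOR = re.compile(r"\b(and|or|not)\b(?!~)", re.IGNORECASE)


def escape_operators(phrase: str) -> str:
    """Append a tilde to each and/or/not so it is searched as a literal word.

    Hypothetical helper: words such as ORDER or NOTES are untouched because
    only whole words match, and already-escaped operators are left alone.
    """
    return _OPERATOR.sub(lambda m: m.group(0) + "~", phrase)


if __name__ == "__main__":
    print(escape_operators("FISCAL 2020 and 2019"))  # FISCAL 2020 and~ 2019
```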

To illustrate the issue I am concerned about, here is an image from the 10-K/A filed by TECH DATA CORP on 5/21 (their FYE is 1/31).

Williams Sonoma’s FYE is 1/29. However, in their audit fee disclosure they describe the fee payment for their most recent fiscal year end as payment for FISCAL 2019.

Since we need to link the audit fee disclosure to the audit opinion, and since the audit opinion is included in a 10-K filing whose balance sheet date is the last day of the fiscal year, the AUFEEYEAR value needs to be the YEAR from the balance sheet date. I hope that is not confusing, but it gives us the most robust way to track the data between the two filings (the DEF 14A and the 10-K). If we didn’t do this, we could have some 1/31 filers with the AUFEEYEAR value reported as 2019 (see Williams Sonoma) and others with the value 2020 (TECH DATA CORP).
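The rule reduces to taking the calendar year of the balance sheet date, whatever label the registrant puts on its fiscal year. A minimal sketch, assuming the date arrives either in the letter-prefixed form shown for RDATE in the CAM_FEE field table (R20191231) or as M/D/YYYY; `aufee_year` is a hypothetical name for illustration.

```python
from datetime import date, datetime


def parse_balance_sheet_date(value: str) -> date:
    """Parse 'R20200131'-style values (letter prefix assumed) or M/D/YYYY."""
    value = value.strip()
    if value[:1].isalpha():
        return datetime.strptime(value[1:], "%Y%m%d").date()
    # strptime accepts both 1/31/2020 and 01/31/2020 for this format.
    return datetime.strptime(value, "%m/%d/%Y").date()


def aufee_year(balance_sheet_date: str) -> int:
    """AUFEEYEAR is the year of the 10-K balance sheet date."""
    return parse_balance_sheet_date(balance_sheet_date).year
```

Under this rule a registrant with a 1/31/2020 balance sheet date gets 2020 even if its proxy labels that year FISCAL 2019.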

Update on Critical Audit Matter Disclosures and Audit Fee Data

Well, I apologize for our delay with the update to the CAM data processing. The automation process that extracts the audit report from the 10-K and then posts it to our download server is working fine. The struggle has been determining the best way to organize the data extracted from the audit report and how to distribute it. We are undergoing final testing today and Tuesday, and I expect to make this new data live sometime late Tuesday (5/26/2020). As an aside, I will mention that Tuesday is the birthday of the favorite (and only) son of doting elderly parents.

Before I ask you to dig into the nitty-gritty of how to access and work with this data, use this link to access a small sample (CAM_FEE_SAMPLE). There were some unexpected issues (discussed next) that are important to understand as you begin to use this data.

The first unexpected issue was what to do when the registrant files an amendment to the original 10-K. This actually happened much more frequently than expected. We have compared the original audit report with the subsequent audit reports, and thus far there have been no changes to the data we extract from the audit report except for the auditor signature date. Therefore our present strategy is to include the original audit report date as one of the fields. If there is a change in the conclusion of the audit report or in the critical audit matter disclosures, we might have to add a second row of data to report the original as well as the new values.

The next unexpected issue was how to handle cases where the registrant reports audit fees in multiple filings and the amounts do not match. The most common scenario is when the registrant initially includes the audit fees in ITEM 14 of a 10-K or 10-K/A and then reports new values (for the same period) in a subsequent 10-K/A or DEF 14A. We have decided to operate on the presumption that the most recent filing has the best measure of audit fees. I will observe that we investigated every case where differences were reported, and there was only one case where the registrant stated that the amendment was filed to correct the audit fee disclosure.
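The "most recent filing wins" rule amounts to keeping one disclosure per registrant and fee year, choosing the latest filing date. A minimal sketch, assuming a hypothetical record layout with CIK, AUFEEYEAR, FILED (the filing date) and FORM keys; the delivered table may organize this differently.

```python
from datetime import date
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical record layout: one dict per audit fee disclosure found.
Disclosure = Dict[str, Any]


def latest_disclosures(rows: Iterable[Disclosure]) -> List[Disclosure]:
    """Keep one disclosure per (CIK, AUFEEYEAR): the one filed most recently."""
    best: Dict[Tuple[Any, Any], Disclosure] = {}
    for row in rows:
        key = (row["CIK"], row["AUFEEYEAR"])
        current = best.get(key)
        # A later filing replaces an earlier one; ties keep the first row seen.
        if current is None or row["FILED"] > current["FILED"]:
            best[key] = row
    return list(best.values())


if __name__ == "__main__":
    rows = [
        {"CIK": 1800, "AUFEEYEAR": 2019, "FORM": "10-K", "FILED": date(2020, 2, 21), "AUDIT FEES": 100},
        {"CIK": 1800, "AUFEEYEAR": 2019, "FORM": "DEF 14A", "FILED": date(2020, 3, 13), "AUDIT FEES": 110},
    ]
    print(latest_disclosures(rows)[0]["FORM"])  # DEF 14A
```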

The third major issue relates to cases where the sum of the fee components does not equal the reported total. We handled these cases in two ways. If we could identify a reasonable explanation for the discrepancy (perhaps the total was off by the amount of one of the components), we corrected the total. If we could not identify the reason for the difference, we left all values as reported and added a field to the data table named ERROR. This value represents the difference between the sum of the components and the reported total. The following table contains the full list of data values that will be delivered if this data is requested from our system.

CIK: Issuer Central Index Key.
RDATE: The balance sheet date for the 10-K filing as reported by the registrant. The year component of this date is the value to use for YEAR in the request file. For a 12/31/2019 FYE the RDATE will be R20191231.
CDATE: The CONFORM date for the filing that reported the audit fees.
FNAME: Last two digits of the accession number of the filing that reported the audit fees.
TID: The index of the audit fee table relative to all tables in the source document. If the audit fee table was an image file, this value is the image index.
AUFEEYEAR: The fiscal year that the registrant reports in the audit fee table.
AUDIT FEES: Audit fees as reported by the registrant.
AUDIT RELATED FEES: The total amount of audit related fees as reported.
ALL OTHER FEES: The total of all other fees.
TAX FEES: The total of the various tax fees as reported.
TOTAL FEES: The total fees as reported in the source document.
ERROR: The unexplained difference between the reported total and the sum of the components.
10K_RDATE: The dissemination date of the 10-K filing that included the audit report on Critical Audit Matters.
10K_CDATE: The CONFORM date (balance sheet date) as reported with the 10-K filing.
AUDITOR: The name of the audit firm that signed the audit report.
LOCATION_CITY: The CITY location as reported by the auditor.
LOCATION_STATE: The STATE (or COUNTRY) as reported by the auditor.
AUREPORT_DATE: The date of the signature of the audit report.
SINCE: The year the auditor began working for the registrant.
REPORT_TYPE: FINANCIAL or COMBINED, indicating whether the audit report includes the auditor’s opinion on the financial statements only or also includes their report on internal control.
OPINION: Whether the financial statements PRESENT FAIRLY the results of operations.
EXCEPT: If an exception is reported in the audit report, this field reports the language the auditor used to describe the exception.
CAM_COUNT: The total number of Critical Audit Matters included in the opinion.
CAM_1 – CAM_N: The auditor’s heading/description of each critical audit matter, reported in the order in which it appears in the audit opinion.
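The reconciliation rule behind the ERROR field can be approximated in code. This is a sketch, not the process we ran: each case was reviewed by hand, and here the only "reasonable explanation" recognized is a gap exactly equal to one component. It also assumes fees are whole-dollar integers and that ERROR is the reported total minus the component sum (the sign convention is our assumption); `reconcile` is a hypothetical name.

```python
from typing import Dict

COMPONENTS = ("AUDIT FEES", "AUDIT RELATED FEES", "TAX FEES", "ALL OTHER FEES")


def reconcile(row: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of row with TOTAL FEES corrected or ERROR populated."""
    out = dict(row)
    component_sum = sum(out.get(name, 0) or 0 for name in COMPONENTS)
    gap = out["TOTAL FEES"] - component_sum  # assumed sign: total minus sum
    out["ERROR"] = 0
    if gap == 0:
        return out
    # A gap equal to one component suggests the total omitted or
    # double-counted that line item, so the total is corrected.
    if any(abs(gap) == out.get(name) for name in COMPONENTS if out.get(name)):
        out["TOTAL FEES"] = component_sum
        return out
    # Otherwise keep the reported values and record the unexplained gap.
    out["ERROR"] = gap
    return out
```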

The question now is how will you access this data after Patrick’s birthday? Once we make the data live, there will be a new field available in the Pull section of the ExtractionPreprocessed user interface. You will know that we have made the data live because you will see CAM_FEE_TABLES in the Data Tables block.

Let me make clear that the parsed data in this form will only be available after the registrant has filed audit fee data. Until the audit fee data is available, you can still access the actual audit reports that have been parsed from the 10-K filings. This form of presentation is meant to be entirely separate from the audit report access we turned on late last year. The year value to use in a request file for those audit reports should be the expected year of dissemination (filing) on EDGAR. So for a 12/31 firm, the audit report for 2019 will normally be available in 2020.
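For fiscal year ends other than 12/31, one rough way to estimate the expected dissemination year is to add the 10-K filing deadline to the fiscal year end and take the year of the result. This is a heuristic sketch, not our rule: early or late filers will not always follow it, and `expected_request_year` is a hypothetical helper.

```python
from datetime import date, timedelta


def expected_request_year(fiscal_year_end: date, filing_lag_days: int = 90) -> int:
    """Estimate the EDGAR dissemination year of the 10-K for a fiscal year end.

    Heuristic: fiscal year end plus the filing deadline. 60, 75 and 90 days
    are the large accelerated, accelerated and non-accelerated 10-K deadlines.
    """
    return (fiscal_year_end + timedelta(days=filing_lag_days)).year
```

The lag matters near year boundaries: a 10/31 fiscal year end maps to the following year with a 90-day lag but to the same year with a 60-day lag.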

Note that we included the original 10-K dissemination date (10K_RDATE) as a field so you can run an event study with this data. The fields 10K_RDATE and 10K_CDATE also let you match the data values to the actual audit report if you choose to pull it, and they let you very quickly create a request file for the audit report: use the CIK and the year parsed from the 10K_RDATE field of this new table, and add a column PF with the value FY.
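The request file described above can be generated directly from the CAM_FEE rows. A minimal sketch, assuming 10K_RDATE arrives either letter-prefixed (R20200221) or as M/D/YYYY, and that the request file headings are CIK, YEAR and PF; `build_request_file` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, Mapping


def year_from_rdate(value: str) -> int:
    """Year of a 10K_RDATE value; both accepted formats are assumptions."""
    value = value.strip()
    if value[:1].isalpha():
        return datetime.strptime(value[1:], "%Y%m%d").year
    return datetime.strptime(value, "%m/%d/%Y").year


def build_request_file(rows: Iterable[Mapping[str, str]]) -> str:
    """Return CSV text with CIK, YEAR and PF=FY columns for an audit report request."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "YEAR", "PF"])
    seen = set()
    for row in rows:
        # Integer CIK drops any left padding of zeros.
        key = (int(row["CIK"]), year_from_rdate(row["10K_RDATE"]))
        if key not in seen:  # one request line per CIK-YEAR pair
            seen.add(key)
            writer.writerow([key[0], key[1], "FY"])
    return buffer.getvalue()
```

Write the returned text to a .csv file (opened with `newline=""`) to use it as a request file.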

Of course, the critical open question is still identifying Large Accelerated Filers. After communicating with some of our users, I am convinced we need to make this valuable data point available. I actually think we need to provide a new file type that lists the actual filer type (LARGE ACCELERATED, ACCELERATED . . .) accessible by CIK and YEAR. That has been added to the work list. In the meantime, send me an email and I will send you the most current list of LAF that have filed since 6/30/2019.

This post is too long, so I will wrap up by noting that when you request this new data type, the source file that is delivered is the original htm representation of the audit fee data. We normalized the audit fee disclosures, but I know some of you want the more detailed fee components when they are disclosed. Those are easy to access by using the DEHYDRATOR-REHYDRATOR process on the source files.

Novel Coronavirus Disclosures

I hope all of you are well. We’ve been working on staying safe and keeping our systems updated and working. I had actually hoped to have a new post by now giving you a summary of our Critical Audit Matter data findings. However, with the shift to remote learning at the University of Nebraska at Omaha, remote learning for my son, who is a sophomore in high school (and the only child of doting elderly parents), and then the absolutely unprecedented increase in early proxy filings, I’ve not had time to get back to that post. That said, the audit reports are still flowing, and if you would like a summary of the reports please let me know. As soon as things settle down a bit, we will finish the transformation of our audit fee data and include the CAM details as additional fields in the audit fee data.

However, I was talking to one of our team today, who suggested that I do a quick check to see what had been disclosed about the Coronavirus in filings made since the first of the year. So I did a quick search of just the 10-K and DEF 14A filings. The DEF 14A disclosures mostly report on the prospect of switching to a virtual annual meeting. As you can imagine, the 10-K disclosures vary significantly in their level of detail and salience. Here is a link to a summary extraction of the disclosures, with frequency, company name and CIK (COVID.csv).

I was really interested in the nature of the disclosure made by Carnival Corp, given the challenges they were facing with illnesses on their ships. They filed their 10-K on 1/28/2020 (this is very consistent with their historical 10-K filing date). Here is the extent of their disclosure:

Fiscal Year 2020 Coronavirus Risk
In response to the ongoing coronavirus outbreak, China has implemented travel restrictions. As a result, we have suspended cruise operations from Chinese ports between January 25th and February 4th, canceling nine cruises. We also expect that travel restrictions will result in cancellations from Chinese fly-cruise guests booked on cruises embarking in ports outside China. We estimate that this will impact our financial performance by $0.03 to $0.04 per share. If the travel restrictions in China continue until the end of February, we estimate that this will further impact our financial performance by an additional $0.05 to $0.06 per share. Five percent of our capacity was scheduled to be deployed in China in fiscal year 2020. If these travel restrictions continue for an extended period of time, they could have a material impact on our financial performance.

Here is the disclosure from American Airlines (2/15/2020):

  In particular, an outbreak of a contagious disease such as the Ebola virus, Middle East Respiratory Syndrome, Severe Acute Respiratory Syndrome, H1N1 influenza virus, avian flu, Zika virus, coronavirus or any other similar illness, if it were to become associated with air travel or persist for an extended period, could materially affect the airline industry and us by reducing revenues and adversely impacting our operations and passengers’ travel behavior. For example, the coronavirus outbreak that originated in or around Wuhan, China in January 2020 has resulted in the widespread suspension of commercial air service to the region, including by American, as well as the imposition by the U.S. and other governments of significant restrictions on inbound travel from this region. Our suspension of service, which remains in place as of the date of this report, and the potential for a period of significantly reduced demand for travel has and will likely continue to result in significant lost revenue. As a result of these or other conditions beyond our control, our results of operations could be volatile and subject to rapid and unexpected change. In addition, due to generally weaker demand for air travel during the winter, our revenues in the first and fourth quarters of the year could be weaker than revenues in the second and third quarters of the year.


These are interesting given the significant impact these companies have experienced as a result of this outbreak. I’m not intending to evaluate the quality of the disclosure (that remains for an empirical study); I am just surprised at the brevity of the disclosures.

Since I had the disclosure counts, I was curious about the frequency of disclosures across time. Below is a graph of the average number of COVID or CORONAVIRUS mentions in 10-K filings by filing date. The interesting feature of this pattern is that it at least suggests that this was not a significant concern to the management of this sample of firms until after March 8. The first disclosure was made by Carnival on 1/28, with two instances of either word. While the average disclosure frequency had doubled by 3/2, this does not seem like a significant uptick relative to what happened soon after. As an aside, the firms that disclosed before 3/3 were mostly Large Accelerated Filers (the filing deadline for 12/31 FYE LAF was 3/2 this year). So the later uptick in disclosure counts reflects more mentions in the 10-Ks of relatively smaller firms.
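The series in the graph can be approximated with a case-insensitive word count over the plain text of each 10-K, averaged by filing date. This is a sketch under assumptions: the regular expression is our guess at the search terms (it also counts forms such as COVID-19), and the input is a hypothetical list of (filing date, text) pairs rather than our actual extraction.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# COVID (including COVID-19) or CORONAVIRUS, case-insensitive.
TERMS = re.compile(r"\b(?:covid|coronavirus)", re.IGNORECASE)


def mention_count(text: str) -> int:
    """Number of COVID/CORONAVIRUS mentions in one document."""
    return len(TERMS.findall(text))


def average_mentions_by_date(filings: Iterable[Tuple[date, str]]) -> Dict[date, float]:
    """Average mentions per 10-K, grouped by filing date."""
    totals: Dict[date, int] = defaultdict(int)
    counts: Dict[date, int] = defaultdict(int)
    for filed, text in filings:
        totals[filed] += mention_count(text)
        counts[filed] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}
```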


I also mentioned in the first paragraph that we have seen an uptick in early proxy and 10-K/A filings (registrants will sometimes disclose the information required by Items 10 through 14 in an amended 10-K if they are uncertain whether they will be able to file their proxy in time to meet the 120-day deadline measured from the end of their fiscal year). In total, our executive/director and audit fee extractions are running about 12% higher than at this same time last year. Most of the increase seems to be in 10-K/A filings rather than a real increase in proxy filings.


Brief Update and Data for your Ticker Display

The first release of our 4.6+ series of the application is code complete, but we are still working on the documentation. If you would in fact like a pre-documentation release of this version, send me an email and I will be happy to arrange it. Documentation completion is still a couple of weeks away, as we are juggling some other production issues right now. Check the previous two posts for the most significant enhancements in this release. Frankly, I am quite excited about the ability to limit searches to a DATE-CIK pair. This was trickier than we had hoped, as the memory object that we had to create to manage the processing was something we had never worked with before.

At the University of Nebraska at Omaha we have a fairly grand entrance with a ticker display set up across from a student-run coffee/snack shop (Stedman’s Cafe). About a year ago I was in a conversation with our IT Director, and we wondered together what it would be like to display compensation data in real time. This was a low-priority project, so it took a while, but we finally went live with it in November. When our servers process executive compensation data from a company on the list we maintain of the 1,500 largest companies by market cap, we push the data to a URL where the ticker software that powers our tape can grab it for display. Our display and software are from Rise; I suspect most of you use the same system, as they seem to have most of the market.

Here are a couple of shots of some compensation data as it streams across the display:



When we decided to do this, I think I was intrigued by the challenge more than anything else. However, I was sitting in the cafe the other day, and it was interesting to hear the students around me talking about the compensation figures. While I doubt students talk about this every day, in the brief time I was there it was the subject of conversation for two groups of students sitting nearby. Some seemed awed that people could make so much. They were also speculating on the nature of the business and of the specific jobs based on the titles.

We only display the Company Name, PERSON-NAME, PERSON-TITLE, SALARY and TOTAL for the current year. However, there are more fields available, and after this proxy season is over I think we will experiment with displaying changes in compensation and the like.
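The push step can be sketched as filtering newly processed compensation rows to the tracked company list and serializing the displayed fields to JSON. The JSON image is not reproduced here, so treat the key names as assumptions that simply reuse the field names listed above; `ticker_payload`, the `tracked_ciks` set and the row layout are hypothetical.

```python
import json
from typing import Iterable, Mapping, Set

# Hypothetical keys based on the displayed fields named in the text.
DISPLAY_FIELDS = ("COMPANY-NAME", "PERSON-NAME", "PERSON-TITLE", "SALARY", "TOTAL")


def ticker_payload(
    rows: Iterable[Mapping[str, object]], tracked_ciks: Set[int], year: int
) -> str:
    """Serialize current-year compensation rows for tracked companies to JSON."""
    items = [
        {field: row[field] for field in DISPLAY_FIELDS}
        for row in rows
        if row["CIK"] in tracked_ciks and row["YEAR"] == year
    ]
    return json.dumps(items)
```

Adding a field such as a cash versus non-cash total would only mean adding its key to `DISPLAY_FIELDS`.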

Here is an image of the JSON form of the data that gets pushed out for access:


Cabot Corp is a chemicals and materials company. According to their 10-K, “(Our) principal products are rubber and specialty grade carbon blacks, specialty compounds, fumed metal oxides, activated carbons, inkjet colorants, and aerogel.” (Yes, I got lost after rubber.) They filed their proxy at 5:24 on Friday 1/24/2020, and this data was available by 5:50 (not that speed is hypercritical here).

Enough rambling: we have done all of the hard work, including working with Rise to sort out how to package the data for display. If you would like to display this data on your ticker, let me know and we will add it to your license at NO ADDITIONAL CHARGE. It is interesting how expensive some of the data supplied for these terminals is. You can display any of the fields in the JSON above, and we can also add fields (cash total versus non-cash total . . .).

I really was surprised that this was registering with students, so I am delighted to make this available to all of our clients.



Teaser-2 Version 4.6 Event Study Like Filtering

Our date filtering has taken a giant step forward.  Previously you could filter a search by dates but you had to apply the same date filtering to the entire set of CIKs that you were using in your search.  Version 4.6 adds the capability to set a discrete date filtering window around a particular CIK-DATE pair.

You need to supply a CSV file with a CIK column and a MATCHDATE column. The values in the CIK column need to be the CIK values for your sample, with no left padding of zeros: just the integer form of the CIK. The values in the MATCHDATE column need to be dates in the form MM/DD/YYYY or M/D/YYYY. (For our international users: if the standard date form in your locale is D/M/YYYY, our application will expect that form. It expects whatever date form your version of Windows defines as ‘normal’.)

Here is an image of a valid input file. Notice that there can be additional data in the file, and the columns do not need to be adjacent, but they need to be clean in the sense that the CIK and MATCHDATE column headings must match exactly (no spaces, upper case, etc.). You can have multiple CIK-DATE pairs; in the image below I have two different MATCHDATE values for CIK 1800.
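If your sample lives in another program, the input file can be written with the exact headings and formats described above. A minimal sketch, assuming a US-locale Windows date format (M/D/YYYY); `matchdate_csv` is a hypothetical helper and the dates in the usage note are placeholders, not real filing dates.

```python
import csv
import io
from datetime import date
from typing import Iterable, Tuple


def matchdate_csv(pairs: Iterable[Tuple[str, date]]) -> str:
    """Build CSV text with exact CIK and MATCHDATE headings.

    CIKs are written as plain integers (leading zeros dropped) and dates as
    M/D/YYYY; a D/M/YYYY locale would need the day and month swapped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "MATCHDATE"])
    for cik, match_date in pairs:
        writer.writerow([int(cik), f"{match_date.month}/{match_date.day}/{match_date.year}"])
    return buffer.getvalue()
```

For example, `matchdate_csv([("0000001800", date(2019, 2, 22)), ("1800", date(2020, 2, 21))])` yields two rows for CIK 1800, like the file in the image. Save the text with `open(path, "w", newline="")` so Windows does not add blank lines.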


Once you have a CIK-DATE file created, start the application, check the Use CIK button in the CIK/Date filter section, and then select the Set CIK/DATE File button below the search box.


The user controls to manage the selection of the source file will become active.


Use the Browse button to navigate to and select the file to use for input. Notice that in the Range Days area of the control you can specify the number of days before the MATCHDATE separately from the number of days after it. Thus you can have a lop-sided window (0 to +180) or a symmetrical window (-45 to +45). You also have the option to match to the RDATE or the CDATE. Once you have selected the options, the Okay button becomes active; select it to update the application with your input.

Once you have fully defined your search, you can hit the Search button. The application will return only those documents that matched your search criteria, matched by CIK, and were filed within the date range you specified for the CIK-MATCHDATE pairs in the input file.


I want to observe that the search time reflects filtering a base document collection of over one million documents down to the 7,192 that matched my criteria. Because this is research, we understand how critical it is to be able to identify those CIK-DATE pairs that did not match any filings. There is a View Misses button in the application that, when selected, will provide a list of those missing pairs.
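The window matching and the list of misses can be reproduced outside the application, which is handy for auditing a run. This is a sketch of the logic as described, not the ExtractionEngine code: `window_filter` is a hypothetical name, each filing is reduced to a (CIK, date) pair where the date is whichever of RDATE or CDATE you chose, and window endpoints are assumed to be inclusive.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, date]  # (CIK, MATCHDATE) or (CIK, filing date)


def window_filter(
    filings: Iterable[Pair],
    pairs: Iterable[Pair],
    days_before: int,
    days_after: int,
) -> Tuple[List[Pair], List[Pair]]:
    """Keep filings inside [MATCHDATE - days_before, MATCHDATE + days_after].

    Returns (hits, misses); misses are CIK-MATCHDATE pairs with no filing
    in their window.
    """
    pair_list = list(pairs)
    by_cik: Dict[int, List[date]] = {}
    for cik, match_date in pair_list:
        by_cik.setdefault(cik, []).append(match_date)

    hits: List[Pair] = []
    matched: Set[Pair] = set()
    for cik, filed in filings:
        in_any_window = False
        for match_date in by_cik.get(cik, []):
            start = match_date - timedelta(days=days_before)
            end = match_date + timedelta(days=days_after)
            if start <= filed <= end:
                matched.add((cik, match_date))
                in_any_window = True
        if in_any_window:  # each filing is returned once
            hits.append((cik, filed))

    misses = [pair for pair in pair_list if pair not in matched]
    return hits, misses
```

A lop-sided window is simply `days_before=0, days_after=180`; a symmetrical one is `45, 45`.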


Notice the Save as CSV button: selecting it gives you the chance to save these results to a file for manipulation, review or re-submission with a different span. The CSV file will have the CIK and MATCHDATE columns. It will not contain any data values from your original submission file.
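Because the saved file carries only CIK and MATCHDATE, you will usually want to join it back to your submission file to recover the other columns. A minimal sketch with the standard csv module, assuming both files use M/D/YYYY dates (parsing makes 02/21/2020 and 2/21/2020 compare equal); `rejoin` is a hypothetical helper.

```python
import csv
import io
from datetime import date, datetime
from typing import Dict, List, Mapping, Tuple


def _key(row: Mapping[str, str]) -> Tuple[int, date]:
    """(CIK, MATCHDATE) key; assumes M/D/YYYY dates in both files."""
    return int(row["CIK"]), datetime.strptime(row["MATCHDATE"].strip(), "%m/%d/%Y").date()


def rejoin(saved_csv: str, original_csv: str) -> List[Dict[str, str]]:
    """Return the original rows whose (CIK, MATCHDATE) appear in the saved file."""
    wanted = {_key(row) for row in csv.DictReader(io.StringIO(saved_csv))}
    return [row for row in csv.DictReader(io.StringIO(original_csv)) if _key(row) in wanted]
```

Pass the text of each file (for example, `open(path, newline="").read()`); the result keeps every column from your original submission.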






Teaser – 1 Version 4.6

We are finishing testing and working on the documentation for the next version of the ExtractionEngine.  We have added some new features that we hope will help with your research.  I will share these as we finish the testing and the documentation for each of the features.

The first item we have completed is the ability to extract only the text from your search results. While there is increased interest in using text mining tools and performing sentiment analysis, one blocker has been that most text mining software assumes (or requires) that the input be plain text. Since most SEC filings are in html, our clients have observed that it is painful to extract the text from a set of documents.

We added a new item to the Extraction menu: DocumentExtraction > TextOnly.


If you select that option, a new UI will start and give you the option to specify the destination directory where you would like the text form of the documents to be saved. Once you have specified the destination directory and hit the Okay button, text-only copies of the original documents returned by the search will be placed in the specified directory. Each document will be named in the CIK-RDATE-CDATE-~ form, so you have an audit trail back to the source document. Here is an image of a directory with a collection of filings that have been converted to text.
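The file names make it easy to tie each text file back to its source when you load the files into other software. A minimal sketch, assuming the first three hyphen-separated parts are the CIK, RDATE and CDATE, and keeping everything after the third hyphen (the ~ part) as an opaque string; `source_key` is a hypothetical helper and the file name in the usage note is a made-up example.

```python
from pathlib import Path
from typing import NamedTuple


class SourceKey(NamedTuple):
    cik: int
    rdate: str
    cdate: str
    rest: str  # remainder of the name after the third hyphen, kept as-is


def source_key(path: str) -> SourceKey:
    """Split a CIK-RDATE-CDATE-~ style file name into its leading fields."""
    stem = Path(path).stem  # drop the directory and the .txt extension
    cik, rdate, cdate, rest = stem.split("-", 3)
    return SourceKey(int(cik), rdate, cdate, rest)
```

For a hypothetical file named `1800-R20200214-C20191231-example.txt`, this returns CIK 1800 with the two date strings unchanged.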



Here is a partial image of an 8-K filing in html format:


Here is an image of the same document after conversion to text:


Since we presume that these txt files will be used as inputs to some additional processing system, we have not added line breaks. The only line breaks in the files are those indicated by the html code, so the files are a bit painful to read. However, all of the text is present.
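To make the "only the line breaks the html implies" behavior concrete, here is a standard-library sketch of the same idea. It is not the converter the application uses; which tags count as line breaks (closing block tags plus `<br>`) is our assumption, and `html_to_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List

# Closing tags treated as line breaks; <br> is handled on its start tag.
BREAK_TAGS = {"p", "div", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collect visible text, adding newlines only where the HTML implies one."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. become characters
        self.parts: List[str] = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```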



Update to Audit Opinion Work

We have now completed the end-to-end linkage to make the audit opinions available for download automatically. Specifically, we have connected all of the code pieces required to identify the 10-K filings made by Large Accelerated Filers, extract the audit opinion from the filing, and then push it to the server that handles your requests. Because of the nature of the resources we are using, these reports are currently available around 3:00 AM the morning after they are filed. That is, if any are filed on Monday, the audit reports should be available to you at around 3:00 AM on Tuesday.

We are also working on providing a summary of the Critical Audit Matters described in the opinions, as well as additional meta-data about the opinion. If you would like to review the current status of this summary detail, please follow this link (CAM_SUMMARY). Below is an image of the current data columns in this file (columns A-E have meta-data that can be used to tie these details back to the actual 10-K filing where the opinion was found).


We specifically identify the type of audit report (REPORT_TYPE) in which the CAM are described: FINANCIAL or COMBINED, where COMBINED includes both the opinion on the financial statements and the opinion on internal control. We are also indicating the nature of the opinion, and if an exception is noted it will be reported in the EXCEPTION column. CAM_COUNT is an integer that indicates the number of CAM listed in the opinion. Finally, we are currently listing the title the auditor uses to describe each CAM.

Here are some fun facts (only accounting researchers/faculty would call these fun facts) from our initial analysis:

Distribution of CAM counts by registrant:


Five audit reports mentioned four CAMs, and 20 listed three. Here are the five registrants whose audit reports listed four CAMs:
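Counts like these come from a simple tally of the CAM_COUNT column. A minimal sketch, assuming the CAM_SUMMARY file is read as CSV text with a CAM_COUNT heading; `cam_count_distribution` is a hypothetical helper, and the numbers in the test are toy data, not our results.

```python
import csv
import io
from collections import Counter


def cam_count_distribution(cam_summary_csv: str) -> Counter:
    """Tally audit reports by CAM_COUNT (blank counts are skipped)."""
    reader = csv.DictReader(io.StringIO(cam_summary_csv))
    return Counter(int(row["CAM_COUNT"]) for row in reader if row["CAM_COUNT"].strip())
```

Calling `sorted(cam_count_distribution(text).items())` lists the number of reports at each CAM count.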


The most common CAMs address measurement and impairment issues related to intangible assets (goodwill generally, but also the allocation across various types of intangibles). Both tax and revenue issues are also very prominent in the collection. I think what surprised me most is that inventory valuation issues are mentioned in eight of the opinions. I am not sure why I am surprised by this, as our profession has roots in several frauds involving massive inventory misstatement.

We are isolating each CAM, with the plan to create a separate file for each CAM mentioned in an opinion. We have some ideas as to why this separation will be useful, but more on that later. We are also developing a standardized taxonomy, with the intention of delivering both the original language of the CAMs as described by the auditor and a more parsimonious description that would make using these as inputs to empirical models a bit simpler. To make this a bit clearer, here are four descriptions of Goodwill Impairment Assessments:

Goodwill Impairment Assessment – Cortland U.S. Reporting Unit
Goodwill Impairment Assessment
Goodwill Impairment Assessment – Adtalem Brazil Reporting Unit
Goodwill Impairment Assessment – Company-Owned Reporting Unit

A scan of these suggests these are all similar in nature and could be standardized to allow better sorting and coding.
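One simple first step toward that kind of standardization is stripping the reporting-unit qualifier that follows a spaced dash. This is an illustrative rule-based sketch, not the taxonomy being developed: a real mapping would also have to group differently worded titles, and `standardize_cam_title` is a hypothetical name.

```python
import re

# A spaced en dash, em dash or hyphen followed by a qualifier, to end of title.
# Unspaced hyphens (as in Company-Owned) are not treated as separators.
QUALIFIER = re.compile(r"\s+[\u2013\u2014-]\s+.*$")


def standardize_cam_title(title: str) -> str:
    """Drop a trailing ' – qualifier' and normalize case and whitespace."""
    base = QUALIFIER.sub("", title.strip())
    return " ".join(base.split()).upper()
```

Applied to the four titles above, every variant maps to GOODWILL IMPAIRMENT ASSESSMENT, while the original wording stays available for anyone who wants the reporting-unit detail.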