Working with the CAM Audit Reports

A few days ago I posted that we had accomplished our goal of making the audit reports available for large accelerated filers on a timely basis.  I promised in that post to describe how to use those audit reports with our application.  In this post I describe those steps.

First you have to access the audit reports. This is accomplished by submitting a request file with the CIK, YEAR and PF of the source document (in this case the 10-K). Since the requirement to disclose CAMs applies only to large accelerated filers with fiscal years ending after 6/30/2019, identifying the right filers could be tricky. Here is a link to the latest model request file, which lists the CIKs of all registrants who meet the criteria and who have released a 10-K since the implementation date (CAM_REQUEST_LINK). For the rest of this post I will assume you have already downloaded the audit reports. If you need more direct instruction on how to download them, please review this blog post.

When you have finished downloading the audit reports they will be stored in the directory you specified in the ExtractionPreprocessed user interface. Each audit report is an independent document named CIK-RDATE-CDATE-F##-22.htm, where CIK is the Central Index Key of the issuer, RDATE is the date the 10-K filing was made available on EDGAR and CDATE is the balance sheet date. The two digits following F are the last two digits of the original filing accession number and the 22 represents our internal text artifact number for the audit reports.
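If you want to work with the downloaded files outside the application, the naming convention can be split apart with a few lines of Python. This is only a sketch: it assumes RDATE and CDATE are written as eight-digit YYYYMMDD values, and the function and class names are hypothetical rather than anything our software exposes.

```python
import re
from typing import NamedTuple, Optional


class AuditReportName(NamedTuple):
    cik: int               # Central Index Key of the issuer
    rdate: str             # date the 10-K was made available on EDGAR
    cdate: str             # balance sheet date
    accession_suffix: str  # last two digits of the filing accession number


# Assumption: both dates are 8-digit YYYYMMDD strings.
# The trailing 22 is the artifact number for audit reports.
_NAME_PATTERN = re.compile(r"^(\d+)-(\d{8})-(\d{8})-F(\d{2})-22\.htm$")


def parse_audit_report_name(filename: str) -> Optional[AuditReportName]:
    """Return the parts of an audit report file name, or None if it does not match."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    cik, rdate, cdate, suffix = match.groups()
    return AuditReportName(int(cik), rdate, cdate, suffix)
```

Files that do not follow the pattern return None, so it is easy to spot anything unexpected in the download directory.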


If you want to review these documents individually – select the SmartBrowser feature on our software and navigate to select the directory that has these artifacts.  The SmartBrowser will load the list of files and provide a list of the CIKs that are present in the left panel.  You can select any individual CIK to review that document or begin at that particular point to move forward.


However, if you would rather use the full features of our platform with these opinions you need to index them. To prepare for indexing you need to create a new directory on your computer where the audit reports can be saved and the index can be created. In this example I am creating a new directory in F:\myTemp\DEMO_CAM_INDEX. Once the directory has been created select Create Index from the Indexing item on the menu bar.


When the Create Index panel loads simply select the directory that has the original audit reports as the Source Files Directory to Index.  Select the directory you created for the destination as the Destination Directory and then select the Create Index button.


The indexing process will begin and the application will display messages as it progresses, including the file being processed and the points at which indexing pauses to save the partial indexes. When complete the application will report Indexing Complete.


When you hit the Close button the application focus will switch back to the main component.  However the active Index Library will switch from the library you were using to the directEDGAR_Custom library.  Your newly created index will be the last index in the list of custom indexes.


Select the index and start searching – you can now use all of the features of our application that are applicable.  For instance – suppose we want to identify all audit reports that mention revenue recognition:  a great initial search would be “revenue recognition”


As you notice in the screenshot – we have not injected any of our standard meta-data into the audit reports yet – that is why the company name is not yet displayed in the search panel.  We will make the code changes so this happens automatically in the next two weeks.  When that is done we will re-do these initial audit reports so the name is visible in the search panel and is reported in the Summary or Context extractions you might run.

To give you a quick example of using our platform’s broader feature set with these artifacts I will quickly walk you through the process of extracting and normalizing the auditor tenure. I know that tenure is generally reported with language similar to “We have served as the company’s auditor since YYYY”. To find that language I am just searching for auditor since. However, before I do that I am going to adjust the span of the ContextExtraction to five words because I want to minimize the noise in the output. From the File menu select Options. . .


From the Options panel – Context is the first item.  Select Item and then replace the current value with the number 5 and make sure the Words radio button is selected.


Once you have adjusted the span – hit the OK button and then from the Normalization menu item select ContextNormalization.  There are three parameters to specify.  First, since we are working with the search results displayed in the application select the radio button next to Current Results.  You also need to specify an output directory – two files will be created and saved in the directory that you specify/select.  Finally you need to describe the nature of the normalization.


In this case we need the number that follows the string auditor since and we want to save it in a column with the heading tenure in the results.  When you have specified the parameters hit the Okay button.  When the application closes the ContextNormalization panel there will be two new files in the folder that you specified.
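To make the idea of the normalization concrete, here is a rough Python approximation of what is being asked for: find the phrase auditor since and capture the four-digit year that follows. This is a sketch only. The function name and the regular expression are my own illustration and are not the pattern syntax or the code the application uses.

```python
import re
from typing import Optional

# Look for "auditor since", allow a few non-digit characters (for example
# "at least "), then capture a four-digit year starting with 19 or 20.
_TENURE_PATTERN = re.compile(
    r"auditor\s+since\D{0,20}?((?:19|20)\d{2})(?!\d)",
    re.IGNORECASE,
)


def extract_tenure(context: str) -> Optional[int]:
    """Return the year following 'auditor since', or None when no year is found."""
    match = _TENURE_PATTERN.search(context)
    return int(match.group(1)) if match else None
```

A context with no year after the phrase returns None, which corresponds to an empty tenure cell that would need a manual look.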


The FileToProcess.csv has the context from the search – the file with the date-timestamp appended has the results after the context has been normalized.


As you can see in the image above – the auditor tenure has been normalized from the context and is reported in the column cleverly labeled tenure.

We are in the midst of revamping our audit fee data and will be adding tenure as a field to that data (as well as the auditor name).  We will also work on linking the fee data with the audit reports so stay tuned for more updates.







Audit Reports with Critical Audit Matters Now Available to Download

As many of you know the PCAOB mandated that the auditors of large accelerated filers with fiscal years ending after 6/30/2019 include a description and other facts about Critical Audit Matters in the audit report.  These started becoming available in July.  We began overhauling some parts of our infrastructure to parse out these reports and make them available for direct download from our platform.  The initial work has been completed and these are now available.

Here is a screenshot of the Critical Audit Matters section of Apple’s audit report from their 10-K filed last week. If you are not familiar with our application, the audit report is displayed in our SmartBrowser, which allows our users to review htm and txt documents with intuitive features to advance through a collection of documents.


These are interesting reading and I suspect there are some great opportunities for research with these reports.  I will note that I have already used some of the discussions from some earlier ones to help me make some points with my Intermediate Accounting classes.  There is something really salient for students when they read about the challenges of auditing revenue for a company that has multiple performance obligations and has to recognize revenue over time.

You should be able to see these reports listed in the artifact list when you create a request file – they should show up as the last item in the list of Text Sections list of artifacts.


If you have a properly organized request file, select AUDIT_REPORT from the ExtractionPreprocessed menu and hit the Okay button. Our server will then process your request and deliver these snips to your desktop.

Of course the immediate problem is identifying those companies that are Large Accelerated Filers with a FYE after 6/30/2019. To make this step a little easier we will maintain a list of filers who meet those criteria and make it available to you as we update the archive. A current list that is organized as a request file is available here.

Thus, if you save the request file you can then use it with the ExtractionPreprocessed feature to download these audit reports.  Right now we are still running the code on a batched basis.  In the near future we will automate this so these reports are available within 15 or so minutes after the source document (usually a 10-K) has been filed.

In the very near future I will make a new post that describes how you can use the built-in indexing engine of our application to build indexes of these documents so you can search them for relevant content.  Here is an image of one of my tests when I was searching for revenue recognition as a critical audit matter


Again – I will provide more information later – but this feature is now live



Amazing Research Deserves Amazing Rewards

A while ago I was searching for a way to acknowledge researchers who cite directEDGAR as part of their data collection efforts. I wanted to do something because I think that those citations in the academic journals are a key factor in establishing our legitimacy with the academic market. So we started a program (that I at least think is neat) where we order up ice-cream from a local ice-cream shop and have the cartons customized with the name of the paper and the names of any of the authors who are our clients. Check out the images below.


The image above is the ice-cream delivered to Professor Lubomir P. Litov, a member of the Finance Department at the University of Oklahoma. His (co-authored) paper Lead Independent Directors: Good governance or window dressing? was accepted by the Journal of Accounting Literature for their December 2019 issue.

Here is an image of the ice-cream we sent to Professor Matt Ege, Jennifer Glenn and Professor John Robinson, faculty and PhD student in the Department of Accounting at Texas A&M University.  Their paper Unexpected SEC Resource Constraints and Comment Letter Quality was accepted by Contemporary Accounting Research (CAR) in May – though there has not been a publication date reported yet on the CAR website.


One more, one of my colleagues at the University of Nebraska at Omaha, Professor Erin Bass had an ice-cream surprise recently.  She cited directEDGAR in her paper Top Management Team Diversity, Equality, and Innovation: A Multilevel Investigation of the Health Care Industry recently accepted for publication by the Journal of Leadership & Organizational Studies.

This may seem a little strange. Who would make ice-cream with an academic research title on the label? I will observe that one of our authors reported a couple of years ago that it really made their kids want to read the paper with the title. They (the kids) were bragging to all of their friends that their dad had ice-cream named after him!! What could be better than that?

A little back story on eCreamery, the amazing company that handles all of the details.  This company was started by two women in Omaha in 2007.  They have been on SharkTank and Mr Buffett (that’s Warren, not Jimmy) has been known to pop in with some of his best buds.  eCreamery is only about 15 blocks from my house and when we go there we can count on the line being out the door.  Clearly – the best SEC filing search engine customers deserve only the best ice-cream.

So for the fine print. The decision as to whether or not we send ice-cream as the result of a citation is strictly at our discretion. There are times we do not send ice-cream. A big one is when none of the authors is employed by one of our clients. Another is when the authors are at one of our international schools or in Alaska or Hawaii (eCreamery will not guarantee a frozen delivery outside the continental US, go figure). This program can be stopped at any time.

10-K History – Data Filtering

Whenever I visit clients or respond to emails about data collection I always try to make the point that it is super critical to identify the sample based on strict criteria to minimize the inevitable chase at the end for missing data and to minimize the processing of the inevitable edge cases. No matter how structured the disclosure requirements are set out in SEC regulations or the Accounting Standards Codification it is inevitable that some proportion of the SEC filers will get ‘creative’ in their form of the disclosure. When they get creative, data collection becomes much more tedious as it becomes necessary to identify the structure of their disclosure before we can sort out how to capture the data. If we can precisely identify the sample firms before we turn to collecting the data items then we will reduce the effort we spend chasing the odd form of the disclosure.

I am helping a client understand how to use our platform to collect a data item that is disclosed only in the 10-K – it is not a required disclosure in any other filing and it is something that is not likely to be disclosed in any other filing (I did some tests and could not find this item disclosed in the combined  millions of  filings that are searchable with directEDGAR).  So related to this I do encourage our users to review the regulations (either the SEC disclosure requirement as set out in the Code of Federal Regulations or the Accounting Standards Codification).

So our client is trying to collect a particular data item – their sample was derived from some other financial data source.  It may seem a normal presumption that if a company has data available from some other financial data source then there should be a 10-K with this disclosure.

In this particular case there are three problems with the sample from our client. The first is that some of the sample firms have public data because they have public debt. So while they file a 10-K they might not have some data items included in the 10-K because the disclosure requirements differ by the nature of the laws that establish their filing obligations (ABS issuers versus public debt only versus common stock). These companies file a 10-K but will not have the particular disclosure our client is trying to collect. The second problem is that the sample firms may not have had a filing obligation at the time they showed up in the sample. The third problem is that some of their sample are foreign registrants whose filing obligations differ substantially: they have the option to file 20-F and 6-Ks rather than the expected 10-K/Q and 8-Ks (as well as a myriad of other filing differences).

The most common way to determine if a company has publicly traded equity is to look for evidence in one of the other data sources that would normally be used to source some of the data for research. I suggest that because there is not an easy way using SEC filings, and therefore directEDGAR, to establish whether a filer has publicly traded common stock. For example, I played around with some searches to identify those 10-K filers that are privately held and struggled, because this is not a mandated disclosure. One search I tried looked across all 10-K filings for registrant, issuer or company within 10 words of the phrase privately held, with the match occurring within the first 800 words of the document.
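For readers who want to see the logic of that proximity search spelled out, here is a small Python approximation. It is a sketch under my own assumptions about tokenizing (words are runs of letters) and is not the syntax or the engine our application uses. The function name is hypothetical.

```python
import re

ENTITY_WORDS = {"registrant", "issuer", "company"}


def mentions_privately_held(text: str, window: int = 800, distance: int = 10) -> bool:
    """True if 'privately held' appears in the first `window` words with an
    entity word within `distance` words before or after the phrase."""
    # Assumption: a word is any run of letters; punctuation and digits split words.
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)][:window]
    for i in range(len(words) - 1):
        if words[i] == "privately" and words[i + 1] == "held":
            start = max(0, i - distance)
            end = min(len(words), i + 2 + distance)
            if ENTITY_WORDS.intersection(words[start:end]):
                return True
    return False
```

As with the real search, a match only says the words appear near each other. It cannot tell whether the filer is describing itself, a competitor or an acquisition target, which is why many of the hits turn out to be noise.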


Some of the results (LEVI STRAUSS and CINEMARK USA) were exactly what I was looking for – those registrants are (were) privately held.  However, many of the results were not what I was looking for.  Therefore if I needed to collect data from companies that had public equity – the best way to define the sample would be to use another tool to determine if they do have public equity.

The second and third issues that needed to be addressed come down to whether or not the company filed 10-Ks (since that is the filing that contains the data we are looking for) in the window that is needed for this study. We can use directEDGAR’s 10-K Filing History archive to establish whether or not a company has filed 10-Ks and for what period. Our client had a list of approximately 13,000 CIK-YEAR observations which represented 3,862 unique CIKs. I used their list of unique CIKs to create a request file to determine the 10-K availability for their sample. This file helped me in two ways. First, for some of their sample CIK-YEAR pairs the date they were trying to collect data for was after (or before) the last (or first) date of the 10-K filings. For example, they needed something from a 10-K filed by CIK 737644 after 1/1/2001. The problem is that this CIK filed its last 10-K in 1997 (I determined this by using the 10-K history file results).


They can use the result file to determine if there is a 10-K filing within the time span that they need to collect the data. And even better, the process also creates a file called missing.csv (clever name) that lists the CIKs from the request file that have never filed a 10-K. There were 477 CIKs from their original list of 3,862 CIKs that had never filed a 10-K.
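The check against the history results can be scripted once the file is on your desktop. The sketch below assumes the results are a CSV with columns named CIK, FIRST_FILING and RECENT_FILING and that the dates are written as YYYYMMDD; the column layout, the date format and the function names are assumptions for illustration, and the sample dates are made up apart from the fact that CIK 737644 filed its last 10-K in 1997.

```python
import csv
import io
from datetime import date, datetime
from typing import Dict, Tuple


def load_windows(csv_text: str, date_format: str = "%Y%m%d") -> Dict[int, Tuple[date, date]]:
    """Map each CIK to the dates of its first and most recent 10-K filing."""
    windows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        first = datetime.strptime(row["FIRST_FILING"], date_format).date()
        recent = datetime.strptime(row["RECENT_FILING"], date_format).date()
        windows[int(row["CIK"])] = (first, recent)
    return windows


def year_in_filing_window(windows: Dict[int, Tuple[date, date]], cik: int, year: int) -> bool:
    """True if `year` falls between the years of the first and most recent 10-K."""
    if cik not in windows:
        return False  # never filed a 10-K (the missing.csv case)
    first, recent = windows[cik]
    return first.year <= year <= recent.year


# Illustrative rows only; the specific dates are hypothetical.
SAMPLE = "CIK,FIRST_FILING,RECENT_FILING\n737644,19940331,19970331\n"
```

Keep in mind that falling inside the window is a necessary condition, not a guarantee. A filer can skip years between its first and most recent 10-K.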

So while we could not use directEDGAR  to establish if any in their sample did not have publicly traded equity we could use it to establish whether they filed any 10-Ks and also for what period.  The advantage of doing this work at the beginning is that we can more precisely define the data we should expect to collect.




Auditor Tenure Fast Collection

One of the projects we have been working on is enhancing our audit fee data.  Frankly the current presentation is lousy and not terribly useful.  So we sat down and developed a plan and identified some additional fields we needed to collect to include in the audit fee data as part of improving the value of this data.

One of the fields we need to add is the tenure. For those of you not aware, a rule change promulgated by the PCAOB in 2017 requires the tenure (auditor since) to be disclosed, and the disclosure is normally included in the 10-K or Exhibit 13. Since the disclosure is required it is much more standardized than the instances of auditor tenure disclosed voluntarily in the Proxy. The most common form is “have served as the Company’s auditor since YYYY”.

I wanted to collect that data myself so I could describe the process to our data team, and I set out to do so using our Search, Extraction and Normalization Platform.

First – I needed to search for the phrase auditor since


As you can see I found 4,979 instances of that phrase (my universe was all 10-K filings that have been made since 1/1/2019). Next I needed to extract and normalize the phrase and convert any numbers after the phrase “auditor since” into the data value. I used the ContextNormalization feature as you can see in this next image:


The Extraction Pattern translated into English means – extract the context and if a number is found following the phrase auditor since place it in the csv file in the column labeled audit_tenure.

So I invested a total of maybe 5 minutes. Let’s look at the results:


The context is available for review and the application normalized the context to extract the year value for our use to add to our new audit fee data.

There are some exceptions that had to be manually handled (110 of 4,979, or a bit over 2%). Let’s look at those:


As you can see – these registrants deviated from the standard disclosure and so I had to review the context and just key in the year value.  I am very comfortable working with our tools and in this context so it only took me about 20 minutes to review and key those missing values.
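If you prefer to pull the exceptions out with a script rather than sorting the spreadsheet, something like the following would do it. This is a sketch that assumes the normalized results are a CSV with the audit_tenure column described above; the other column names in the sample are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_by_tenure(csv_text: str, column: str = "audit_tenure") -> Tuple[List[Row], List[Row]]:
    """Separate rows with a four-digit year from rows that need manual review."""
    normalized: List[Row] = []
    needs_review: List[Row] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value.isdigit() and len(value) == 4:
            normalized.append(row)
        else:
            needs_review.append(row)
    return normalized, needs_review
```

The rows in the second list still carry their context, so the year can be keyed in while reading the original language.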

In total I spent roughly 45 minutes to capture this data value.  I spent about that much time trying to sort out how to describe the process in this blog post.    (When we upload the new audit fee data presentation the tenure field will actually be labeled as SINCE).



Version Release Coming!

2018 has been a busy year. If you look back you will see that we added new data tables (Insider Trading, 10-K Filing History and Form D data). In May we released 4.0.4, which was the foundation for allowing us to add the additional data tables as it significantly sped up the delivery of the artifacts we process. We also started adding the AGE and SINCE variables to director compensation and have been reworking our beneficial ownership tables to better deliver the data when a filer has multiple classes of stock. We were able to move our extraction of the Effective Tax Rate Reconciliation table to a near real-time delivery rather than batch updates.

But it doesn’t stop. When we deliver the filings and indexes for the last 2018 filings we will be including a new version of our application. There are some important improvements coming with this version.

We added a ZOOM box for you to use to build your search phrases.  Our search engine can parse really large (think more than 32,000 character) search phrases.  While you can build the search phrase in Notepad or a similar application we decided to add a bigger box to use in the application.


We added a feature to allow you to identify/specify tables using specific words or values.  A key feature of our platform has always been that you can extract tables from the search results and then manage the data extraction from those tables.  One constraint that was imposed by that strategy is that all the tables across all the documents had to have some consistent phrase/term or value.  There are though cases where the registrants report the data in a unique fashion and so it was difficult to actually access tables.  So we developed a process that allows you to review a set of search results and then specify for each document a specific unique value the application will then use to identify and extract the relevant table(s).


We set up the foundation to quietly handle corporate actions that lead to a change in CIK but not the entity. This one is a little complicated to explain so bear with me. If you access most of the leading data sources for financial data and collect data for Alphabet, the time series of Alphabet will extend back to the first 10-K filing made by Google in 2005. However, if you go to EDGAR and use the CIK that is returned from the financial data service you pulled the data from, the first 10-K filing made by Alphabet was filed in 2016. Because of a reorganization and a merger Alphabet became the successor issuer to Google but they have different CIKs. With our new update, when you submit a CIK for one of those companies (either Google’s 1288776 or Alphabet’s 1652044) our application will quietly add the complementary CIK when you select the new option Include Historical CIKs. This feature has been added to every menu item that allows you to specify the CIK.


When you first install, the application will add a special file that maps the CIKs. However, because corporate actions that lead to this phenomenon continue, we have added a control in the Options menu that allows you to update the mappings at your convenience.
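Conceptually the Include Historical CIKs option expands your list of CIKs using the mapping. The sketch below illustrates that idea with the Google and Alphabet pair from above. The dictionary layout and function name are hypothetical and say nothing about the format of the mapping file the application installs.

```python
from typing import Dict, Iterable, List, Set

# Illustrative mapping: each CIK points to the related predecessor or successor CIKs.
HISTORICAL_CIKS: Dict[int, Set[int]] = {
    1288776: {1652044},  # Google -> Alphabet
    1652044: {1288776},  # Alphabet -> Google
}


def include_historical_ciks(ciks: Iterable[int], mapping: Dict[int, Set[int]]) -> List[int]:
    """Return the input CIKs with any mapped CIKs added, without duplicates."""
    expanded: List[int] = []
    seen: Set[int] = set()
    for cik in ciks:
        # Keep the requested CIK first, then its related CIKs in sorted order.
        for candidate in [cik, *sorted(mapping.get(cik, ()))]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded
```

A CIK with no entry in the mapping simply passes through unchanged.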


Another important change we added was to improve the usability of the application overall by adding keyboard shortcuts for every single menu item.  In the earlier versions of our software we only had keyboard shortcuts for the most used features – now every menu item can be accessed through the keyboard (without use of the mouse).  For example – to access the Search control it is only necessary to press Alt+Z + Tab and you are ready to key in your search phrase.

We also fixed some minor bugs and tweaked the licensing validation process a bit.  In earlier versions if our delivery server was occupied and you started the application the application would wait until the server was free before completely starting.  That hesitation should be gone now.



New Data Type 10K_HISTORY Coming Soon

I was doing a demo with a prospective client last week and had a typical experience – I submitted a request file for director compensation data for 2008 that the client supplied.  They had identified a sample of registrants and wanted to see how to access data related to their sample.  I used the file while they were watching and pulled the data – unfortunately the missing-cik-year report listed 67 CIKs for which no data was available for 2008.  I am glad we provide this summary so our users don’t have to muck around and identify which of their sample is missing.  Of course the next question is why are they missing?

Usually I will take them on a tour of EDGAR for a few of the listed CIKs and show that there are no filings for the time period. The first listed CIK in the list of missing values was 3133 (AMSOUTH BANCORPORATION). The reason why director compensation for AMSOUTH for 2008 is not available is readily apparent from this image


Our potential client observed that they understood but would like a more concrete way to establish whether or not data should be available.  I absolutely understood that and this issue has been bothering me for a while.

One alternative we considered was to try to find all of the delisting notices (15-12B) and create some summary of data from those filings.  Unfortunately – too many registrants do not actually file a 15-12B.  Further – there is another problem – sometimes data is missing because the registrant has not yet registered or even if they have registered they may not yet be obligated to file the reports that contain some of the data objects our clients are trying to collect.

The solution that we have settled on for the time being is to create a summary file that lists, for every CIK that has ever filed any form of the 10-K, the date of their first 10-K filing and the date of their most recent 10-K filing. We have done so and uploaded this data to our distribution server. We are calling this data type 10-K_HISTORY and it should be visible in our ExtractionPreprocessed data window before 12/1.

Because this is a snapshot at a point in time we are setting this data up with an RDATE of 20180101.  This means that your request file will have to have the value of 2018 in the YEAR column for every CIK you want to check.
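If your missing report gives you a list of CIKs, turning it into a request file for this data type is a small job. The sketch below assumes a request file is a CSV with CIK and YEAR columns; your request files may need additional columns, so treat the layout and the function name as assumptions.

```python
import csv
import io
from typing import Iterable


def build_history_request(ciks: Iterable[int], year: int = 2018) -> str:
    """Return CSV text with one CIK,YEAR row per unique CIK, in input order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "YEAR"])
    # dict.fromkeys drops duplicate CIKs while preserving their first position.
    for cik in dict.fromkeys(ciks):
        writer.writerow([cik, year])
    return buffer.getvalue()
```

Write the returned text to a .csv file and submit it the same way you would any other request file.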

When you submit the request file we will return a results file that includes the following headings:


Note that FIRST_FILING and RECENT_FILING refer specifically to any form of a 10-K filing (10KSB, 10K405 etc). So they are not necessarily the first or most recent filing of any type that the registrant made on EDGAR. The balance sheet date values are the balance sheet dates that those filings cover.

We hope this makes it easier to understand why you might be missing data.  So rather than having to inspect EDGAR for relevant dates you can use your missing report to construct a request file to then check for missing values.  Here is a screen shot after testing the process with the results I alluded to at the beginning of this post:


We hope this makes data validation more efficient and less painful.  Since each CIK has only one row of data this should be quick data to access and act on.

Of course there are always catches.  I had 67 observations that were missing data for 2008.  I submitted all 67 CIKs and the results included another missing report for 5 CIKs.  Well it turns out that these 5 CIKs have never filed any form of 10-K.  For example, one of the CIKs belonged to COCA-COLA EUROPEAN PARTNERS PLC (1650107).  They file 20-F and 6-K forms.  Another belonged to DROPBOX (1467623) – they just went public in 2018 and have yet to file a 10-K.