Brief Update and Data for your Ticker Display

The first release of our 4.6+ series of the application is code complete, but we are still working on the documentation.  If you would like a pre-documentation release of this version, send me an email and I will be happy to arrange it.  Documentation completion is still a couple of weeks away as we are juggling some other production issues right now.  Check the previous two posts for the most significant enhancements in this release.  Frankly, I am quite excited about the ability to limit searches to a CIK-DATE pair.  This was trickier than we had hoped, as the memory object we had to create to manage the processing was something we had never worked with before.

At the University of Nebraska at Omaha we have a fairly grand entrance with a ticker display set up across from a student-run coffee/snack shop (Stedman’s Cafe).  I was in a conversation with our IT Director about a year ago and we wondered together what it would be like to display compensation data in real time.  This was a low-priority project so it took a while, but we finally went live with it in November.  When our servers process executive compensation data from a company in a list we maintain of the 1,500 largest companies by market cap, we push the data to a URL where the ticker software that powers our tape can grab it for display.  Our display and software are from Rise – I suspect most of you use the same system we use, as they seem to have most of the market.

Here are a couple of shots of compensation data as it streams across the display:



When we decided to do this I think I was intrigued by the challenge more than anything else.  However, I was sitting in the cafe the other day and it was interesting to hear students around me talking about the compensation.  While I doubt students talk about this every day, during the brief time I was in there it was the subject of conversation for two student groups sitting relatively close.  Some seemed awed that people could make so much.  They were also speculating on the nature of the business and the specific nature of the jobs based on the titles.

We only display the Company Name, PERSON-NAME, PERSON-TITLE, SALARY and TOTAL for the current year.  However, there are more fields available, and I think after this proxy season is over we will experiment with adding changes in compensation and the like.

Here is an image of the JSON form of the data that gets pushed out for access:


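If you want to consume the feed programmatically rather than on a ticker, the records are plain JSON.  Here is a minimal sketch of reading the feed in Python; the feed URL and the exact JSON key names are assumptions for illustration (the displayed fields are the ones listed above), and your actual endpoint is arranged with your license.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint for illustration; the real URL is arranged
# with your license.
FEED_URL = "https://example.com/compensation_feed.json"

def format_ticker_line(rec):
    """Build a display string from one compensation record.

    The displayed fields (Company Name, PERSON-NAME, PERSON-TITLE,
    SALARY, TOTAL) come from the post; the exact JSON key names
    used here are assumptions.
    """
    return (f"{rec['CONAME']}: {rec['PERSON-NAME']} ({rec['PERSON-TITLE']}) "
            f"salary ${rec['SALARY']:,} / total ${rec['TOTAL']:,}")

with urlopen(FEED_URL) as resp:
    records = json.load(resp)

for rec in records:
    print(format_ticker_line(rec))
```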
Cabot Corp is a chemicals and materials company.  According to their 10-K, “(Our) principal products are rubber and specialty grade carbon blacks, specialty compounds, fumed metal oxides, activated carbons, inkjet colorants, and aerogel.”  (Yes, I got lost after rubber.)  They filed their proxy at 5:24 on Friday 1/24/2020 and this data was available by 5:50 (not that speed is hyper-critical here).

Enough rambling – we have done all of the hard work, including working with Rise to sort out how to package the data for display.  If you would like to display this data on your ticker, let me know and we will add it to your license at NO ADDITIONAL CHARGE.  It is interesting how expensive some of the data supplied for these terminals is.  You can display any of the fields in the JSON above, and we can also add fields (cash total versus non-cash total . . .).

I really was surprised that this was registering with students and so I am delighted to make this available to all of our clients.



Teaser 2 – Version 4.6 Event-Study-Like Filtering

Our date filtering has taken a giant step forward.  Previously you could filter a search by dates, but you had to apply the same date filter to the entire set of CIKs in your search.  Version 4.6 adds the capability to set a discrete date-filtering window around each CIK-DATE pair.

You need to supply a CSV file with a CIK column and a MATCHDATE column.  The values in the CIK column need to be the CIK values for your sample – no left padding with zeros, just the integer form of the CIK.  The values in the MATCHDATE column need to be dates in the form MM/DD/YYYY or M/D/YYYY.  (For our international users: if the standard date form in your locale is D/M/YYYY, our application will expect that form – whatever date format your version of Windows defines as ‘normal’.)

Here is an image of a valid input file.  Notice there can be additional data in the file – the columns do not need to be adjacent, but they need to be clean in the sense that the CIK and MATCHDATE column headings must match exactly (no spaces, all upper case, etc.).  You can have multiple CIK-DATE pairs – in the image below I have two different MATCHDATE values for CIK 1800.


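If you are generating the input file with a script, here is a minimal sketch using pandas.  The extra NOTES column is only there to show that additional columns are tolerated; every value besides the CIK and MATCHDATE headings is illustrative.

```python
import pandas as pd

# A minimal sketch of building a valid input file.  Headings must be
# exactly CIK and MATCHDATE; CIKs are plain integers (no leading
# zeros); extra columns such as NOTES are allowed.
sample = pd.DataFrame({
    "CIK": [1800, 1800, 320193],                 # duplicates are allowed
    "MATCHDATE": ["1/24/2020", "6/30/2019", "10/31/2019"],
    "NOTES": ["event A", "event B", "event C"],  # illustrative extra column
})
sample.to_csv("cik_matchdate.csv", index=False)
```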
Once you have a CIK-DATE file created, start the application, check the Use CIK button in the CIK/Date filter section, and then select the Set CIK/DATE File button below the search box.


The user controls to manage the selection of the source file will become active.


Use the Browse button to navigate to and select the file to use for input.  Notice in the Range Days area of the control you can specify the number of days before the MATCHDATE separately from the number of days after it.  Thus you can have a lop-sided window (0 to +180) or a symmetrical window (-45 to +45).  You also have the option to match on the RDATE or the CDATE.  Once you have selected the options the Okay button becomes active; select it to update the application with your input.

Once you have fully defined your search you can hit the Search button.  The application will return only those documents that match your search criteria, match a CIK from your input file, and were filed within the date range you specified for that CIK-MATCHDATE pair.
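Conceptually, the filter is a per-pair window test.  Here is a sketch of the date logic, assuming M/D/YYYY strings; this illustrates the test, not our internal implementation.

```python
from datetime import datetime, timedelta

def in_window(filing_date, match_date, days_before, days_after):
    """True if filing_date falls in [match_date - days_before,
    match_date + days_after].  Dates are M/D/YYYY strings."""
    f = datetime.strptime(filing_date, "%m/%d/%Y")
    m = datetime.strptime(match_date, "%m/%d/%Y")
    return m - timedelta(days=days_before) <= f <= m + timedelta(days=days_after)

# Lop-sided window (0 to +180):
print(in_window("3/15/2020", "1/24/2020", days_before=0, days_after=180))   # True
# Symmetrical window (-45 to +45):
print(in_window("3/15/2020", "1/24/2020", days_before=45, days_after=45))   # False
```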


I want to observe that the search time represents filtering a base collection of over one million documents down to the 7,192 that matched my criteria.  Because this is research, we understand how critical it is to identify those CIK-DATE pairs that did not match any filings.  There is a View Misses button in the application that, when selected, provides a list of those unmatched pairs.


Notice the Save as CSV button – selecting it will give you the chance to save these results to a file for manipulation, review or re-submission with a different span.  The CSV file will have the CIK and MATCHDATE columns.  The file will not contain any other data values from your original submission file.






Teaser 1 – Version 4.6

We are finishing testing and working on the documentation for the next version of the ExtractionEngine.  We have added some new features that we hope will help with your research.  I will share these as we finish the testing and the documentation for each of the features.

The first item we have completed work on is the ability to extract the text only from your search results.  While there is increased interest in using text-mining tools and performing sentiment analysis, one blocker has been that most text-mining software assumes/requires that the input be plain text.  Since most SEC filings are in HTML, our clients have observed that it is painful to extract the text from a set of documents.
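For a sense of what that manual work looks like, here is a minimal sketch of the do-it-yourself approach using BeautifulSoup.  The directory names are illustrative, and our converter handles details (entity decoding, tables, encodings) that this sketch glosses over.

```python
from pathlib import Path
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def html_filing_to_text(src, dst):
    """Strip the markup from one HTML filing and save the plain text."""
    soup = BeautifulSoup(src.read_text(errors="ignore"), "html.parser")
    dst.write_text(soup.get_text(separator=" "), encoding="utf-8")

out_dir = Path("text_out")
out_dir.mkdir(exist_ok=True)
for f in Path("search_results").glob("*.htm"):
    html_filing_to_text(f, out_dir / (f.stem + ".txt"))
```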

We added a new item to the Extraction menu: DocumentExtraction TextOnly.


If you select that option a new UI will start and give you the option to specify the destination directory where you would like the text form of the documents to be saved.  Once you have specified the destination directory and hit the Okay button, text-only copies of the original documents returned from the search will be placed in the specified directory.  Each document will be named in the CIK-RDATE-CDATE-~ form so you have an audit trail back to the source document.  Here is an image of a directory with a collection of filings that have been converted to text.



Here is a partial image of an 8-K filing in HTML format:


Here is an image of the same document after conversion to text:


Since we presume that these txt files will be used as inputs to some additional processing system, we have not added line breaks.  The only line breaks in the file are those indicated by the HTML code, so the files are a bit painful to read.  However, all of the text is present.



Update to Audit Opinion Work

We have now completed the end-to-end linkage to make the audit opinions available for download automatically.  Specifically, we have connected all of the code pieces required to identify the 10-K filings made by Large Accelerated Filers, extract the audit opinion from the filing, and then push it to the server that handles your requests.  Because of the nature of the resources we are using, these reports are currently available around 3:00 AM the morning after they are filed.  That is, if any are filed on Monday, the audit reports should be available to you around 3:00 AM on Tuesday.

We are also working on providing a summary of the Critical Audit Matters described in the opinions as well as additional meta-data about the opinion.  If you would like to review the current status of this summary detail please follow this link (CAM_SUMMARY).  Below is an image of the current data columns in this file (columns A-E have meta-data that can be used to tie these details back to the actual 10-K filing where the opinion was found).


We are specifically identifying the type of audit opinion (REPORT_TYPE) where the CAM are described (FINANCIAL or COMBINED, where COMBINED includes both the opinion on the financial statements and the opinion on internal control).  We are also indicating the nature of the opinion; if an exception is noted it will be reported in the EXCEPTION column.  CAM_COUNT is an integer that indicates the number of CAM listed in the opinion.  Finally, we are currently listing the title the auditor uses to describe each CAM.
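If you pull the CAM_SUMMARY file into pandas, tabulating these columns is straightforward.  A short sketch (the file name is whatever you save the download as; the column names are those described above):

```python
import pandas as pd

# Load the summary file (saved from the CAM_SUMMARY link above).
cams = pd.read_csv("cam_summary.csv")

# Distribution of CAM counts by registrant.
print(cams["CAM_COUNT"].value_counts().sort_index())

# Share of opinions that combine the financial statement and
# internal control opinions in one report.
print((cams["REPORT_TYPE"] == "COMBINED").mean())
```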

Here are some fun facts (only accounting researchers/faculty would call these fun facts) from our initial analysis:

Distribution of CAM counts by registrant:


Five audit reports mentioned four CAM, and 20 listed three.  Here are the five registrants whose audit reports listed four CAM:


The most common CAM address measurement and impairment issues related to intangible assets (goodwill generally, but also the allocation across various types of intangibles).  Both tax and revenue issues are also very prominent in the collection.  I think what surprised me the most is that there are mentions of inventory valuation issues in eight of the opinions.  I am not sure why I am surprised by this, as our profession has roots in several frauds involving massive inventory misstatement.

We are isolating the CAM with the plan to create a separate file for each CAM mentioned in an opinion.  We have some ideas as to why this separation will be useful – but more on that later.  We are also developing a standardized taxonomy with the intention of delivering both the original language of the CAM as described by the auditor and a more parsimonious description that would make using these as inputs to empirical models a bit simpler.  To make this a bit clearer, here are four descriptions of Goodwill Impairment Assessments:

Goodwill Impairment Assessment – Cortland U.S. Reporting Unit
Goodwill Impairment Assessment
Goodwill Impairment Assessment – Adtalem Brazil Reporting Unit
Goodwill Impairment Assessment – Company-Owned Reporting Unit

A scan of these suggests they are all similar in nature and could be standardized to allow better sorting and coding.
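Until the taxonomy ships, a crude version of this standardization can be scripted.  Here is a sketch that simply drops the reporting-unit qualifier after the dash; it illustrates the idea, and is not our taxonomy.

```python
import re

def standardize_cam_title(title):
    """Drop the reporting-unit qualifier so similar CAM sort together."""
    return re.split(r"\s[–—-]\s", title)[0].strip()

titles = [
    "Goodwill Impairment Assessment – Cortland U.S. Reporting Unit",
    "Goodwill Impairment Assessment",
    "Goodwill Impairment Assessment – Adtalem Brazil Reporting Unit",
    "Goodwill Impairment Assessment – Company-Owned Reporting Unit",
]
print({standardize_cam_title(t) for t in titles})
# {'Goodwill Impairment Assessment'}
```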



Working with the CAM Audit Reports

A few days ago I posted that we had accomplished our goal of making the audit reports available for large accelerated filers on a timely basis.  I promised in that post to describe how to use those audit reports with our application.  In this post I describe those steps.

First you have to access the audit reports.  This is accomplished by submitting a request file with the CIK, YEAR and PF of the source document (in this case the 10-K).  Since the requirement to disclose CAM only applies to large accelerated filers with FYE ending after 6/30/2019, identifying the qualifying registrants could be tricky.  Here is a link to the latest model request file, which lists the CIKs of all registrants who meet the criteria and who have released a 10-K since the implementation date (CAM_REQUEST_LINK).  For more direct instruction on how to download the actual audit reports, please review this blog post.  For the rest of this post I will assume you have already downloaded them.

When you have finished downloading the audit reports they will be stored in the directory you specified in the ExtractionPreprocessed user interface.  Each audit report is an independent document named CIK-RDATE-CDATE-F##-22.htm, where CIK is the Central Index Key of the issuer, RDATE is the date the 10-K filing was made available on EDGAR, and CDATE is the balance sheet date.  The two digits following F are the last two digits of the original filing accession number, and the 22 is our internal text artifact number for the audit reports.
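Because the file name carries the audit trail, it can be parsed mechanically.  Here is a sketch; the YYYYMMDD date layout inside the name is an assumption, so adjust the pattern if your files differ.

```python
import re

# Pattern per the naming scheme above; the YYYYMMDD date layout is
# an assumption.
NAME_RE = re.compile(
    r"(?P<cik>\d+)-(?P<rdate>\d{8})-(?P<cdate>\d{8})-F(?P<acc>\d{2})-22\.htm$"
)

def parse_report_name(fname):
    """Split an audit report file name into its audit-trail parts."""
    m = NAME_RE.search(fname)
    if m is None:
        raise ValueError(f"unexpected file name: {fname}")
    return m.groupdict()

print(parse_report_name("320193-20191031-20190928-F06-22.htm"))
# {'cik': '320193', 'rdate': '20191031', 'cdate': '20190928', 'acc': '06'}
```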


If you want to review these documents individually, select the SmartBrowser feature in our software and navigate to the directory that holds these artifacts.  The SmartBrowser will load the list of files and provide a list of the CIKs that are present in the left panel.  You can select any individual CIK to review that document, or begin at that point and move forward.


However, if you would rather use the full features of our platform with these opinions you need to index them.  To prepare for indexing, create a new directory on your computer where the audit reports can be saved and the index can be created.  In this example I am creating a new directory at F:\myTemp\DEMO_CAM_INDEX.  Once the directory has been created, select Create Index from the Indexing item on the menu bar.


When the Create Index panel loads, select the directory that has the original audit reports as the Source Files Directory to Index.  Select the directory you created as the Destination Directory and then select the Create Index button.


The indexing process will begin, and there will be some messaging as it progresses – including the file being processed and notices when the indexer pauses to save the partial indexes.  When complete, the application will report Indexing Complete.


When you hit the Close button the application focus will switch back to the main component.  However, the active Index Library will switch from the library you were using to the directEDGAR_Custom library.  Your newly created index will be the last index in the list of custom indexes.


Select the index and start searching – you can now use all of the applicable features of our application.  For instance, suppose we want to identify all audit reports that mention revenue recognition: a great initial search would be “revenue recognition”.


As you can see in the screenshot, we have not injected any of our standard meta-data into the audit reports yet – that is why the company name is not displayed in the search panel.  We will make the code changes so this happens automatically in the next two weeks.  When that is done we will re-process these initial audit reports so the name is visible in the search panel and is reported in the Summary or Context extractions you might run.

To give you a quick example of using our platform’s broader feature set with these artifacts, I will quickly walk you through the process of extracting and normalizing the auditor tenure.  I know that tenure is generally reported with language similar to “We have served as the company’s auditor since YYYY.”  To find that language I am simply searching for auditor since.  However, before I do that I am going to adjust the span of the ContextExtraction to five words – I want to minimize the noise in the output.  From the File menu select Options. . .


From the Options panel, Context is the first item.  Select it, replace the current value with the number 5, and make sure the Words radio button is selected.


Once you have adjusted the span, hit the OK button and then select ContextNormalization from the Normalization menu.  There are three parameters to specify.  First, since we are working with the search results displayed in the application, select the radio button next to Current Results.  You also need to specify an output directory – two files will be created and saved in the directory that you select.  Finally, you need to describe the nature of the normalization.


In this case we need the number that follows the string auditor since, and we want to save it in a column with the heading tenure in the results.  When you have specified the parameters, hit the Okay button.  When the application closes the ContextNormalization panel there will be two new files in the folder you specified.


The FileToProcess.csv has the context from the search; the file with the date-timestamp appended has the results after the context has been normalized.


As you can see in the image above, the auditor tenure has been normalized from the context and is reported in the column cleverly labeled tenure.
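For those who prefer to script this step, the normalization is equivalent to a regular expression applied to each context window.  A sketch (the CONTEXT column name is an assumption; use whatever heading appears in your FileToProcess.csv):

```python
import re
import pandas as pd

# Read the context file produced by the ContextExtraction.
contexts = pd.read_csv("FileToProcess.csv")

def tenure_year(context):
    """Pull the four-digit year that follows 'auditor since'."""
    m = re.search(r"auditor since\s+(\d{4})", str(context), flags=re.IGNORECASE)
    return int(m.group(1)) if m else None

contexts["tenure"] = contexts["CONTEXT"].map(tenure_year)
contexts.to_csv("normalized_tenure.csv", index=False)
```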

We are in the midst of revamping our audit fee data and will be adding tenure as a field to that data (as well as the auditor name).  We will also work on linking the fee data with the audit reports so stay tuned for more updates.







Audit Reports with Critical Audit Matters Now Available to Download

As many of you know, the PCAOB mandated that the auditors of large accelerated filers with fiscal years ending after 6/30/2019 include a description of and other facts about Critical Audit Matters in the audit report.  These started becoming available in July.  We began overhauling some parts of our infrastructure to parse out these reports and make them available for direct download from our platform.  The initial work has been completed and these are now available.

Here is a screenshot of the Critical Audit Matters section of Apple’s audit report from their 10-K filed last week.  If you are not familiar with our application, the audit report is displayed in our SmartBrowser, which allows our users to review htm and txt documents with intuitive features to advance through a collection of documents.


These are interesting reading and I suspect there are some great opportunities for research with these reports.  I will note that I have already used some of the discussions from some earlier ones to help me make some points with my Intermediate Accounting classes.  There is something really salient for students when they read about the challenges of auditing revenue for a company that has multiple performance obligations and has to recognize revenue over time.

You should be able to see these reports listed in the artifact list when you create a request file – they should show up as the last item in the Text Sections list of artifacts.


If you have a properly organized request file, select AUDIT_REPORT from the ExtractionPreprocessed menu and hit the Okay button; our server will process your request and deliver these snips to your desktop.

Of course the immediate problem is identifying those companies that are Large Accelerated Filers with a FYE ending after 6/30/2019.  To make this step a little easier we will maintain a list of filers who meet those criteria and make it available to you as we update the archive.  A current list organized as a request file is available here.

Thus, if you save the request file you can then use it with the ExtractionPreprocessed feature to download these audit reports.  Right now we are still running the code on a batched basis.  In the near future we will automate this so these reports are available within 15 or so minutes after the source document (usually a 10-K) has been filed.

In the very near future I will make a new post that describes how you can use the built-in indexing engine of our application to build indexes of these documents so you can search them for relevant content.  Here is an image of one of my tests when I was searching for revenue recognition as a critical audit matter.


Again, I will provide more information later – but this feature is now live.



Amazing Research Deserves Amazing Rewards

A while ago I was searching for a way to acknowledge researchers who cite directEDGAR as part of their data collection efforts.  I wanted to do something because I think those citations in the academic journals are a key factor in establishing our legitimacy with the academic market.  So we started a program (that I at least think is neat) where we order ice-cream from a local ice-cream shop and have the cartons customized with the name of the paper and the names of any of the authors who are our clients.  Check out the images below.


The image above is the ice-cream delivered to Professor Lubomir P. Litov, a member of the Finance Department at the University of Oklahoma.  His (co-authored) paper Lead Independent Directors: Good governance or window dressing? was accepted by the Journal of Accounting Literature for their December 2019 issue.

Here is an image of the ice-cream we sent to Professor Matt Ege, Jennifer Glenn and Professor John Robinson, faculty and a PhD student in the Department of Accounting at Texas A&M University.  Their paper Unexpected SEC Resource Constraints and Comment Letter Quality was accepted by Contemporary Accounting Research (CAR) in May, though a publication date has not yet been reported on the CAR website.


One more: one of my colleagues at the University of Nebraska at Omaha, Professor Erin Bass, had an ice-cream surprise recently.  She cited directEDGAR in her paper Top Management Team Diversity, Equality, and Innovation: A Multilevel Investigation of the Health Care Industry, recently accepted for publication by the Journal of Leadership & Organizational Studies.

While this may seem a little strange (who would make ice-cream with an academic research title on the label?), I will observe that one of our authors reported a couple of years ago that it really made their kids want to read the paper with the title.  They (the kids) were bragging to all of their friends that their dad had ice-cream named after him!!  What could be better than that?

A little back story on eCreamery, the amazing company that handles all of the details.  This company was started by two women in Omaha in 2007.  They have been on Shark Tank, and Mr. Buffett (that’s Warren, not Jimmy) has been known to pop in with some of his best buds.  eCreamery is only about 15 blocks from my house, and when we go there we can count on the line being out the door.  Clearly, the best SEC filing search engine customers deserve only the best ice-cream.

So, for the fine print: the decision as to whether or not we send ice-cream as the result of a citation is strictly at our discretion.  There are times we do not send ice-cream – the biggest being that none of the authors is employed by one of our clients.  Another is that the authors are at one of our international schools or in Alaska or Hawaii.  (eCreamery will not guarantee a frozen delivery outside the continental US – go figure.)  This program can be stopped at any time.