Data Delay – Interesting Problem

I was fully expecting to begin making the Director-Relationship data available by now. However, we have run into some really interesting problems that we are having to sort through. We made an assumption that there was a one-to-one relationship between a Central Index Key and a person when a person has a SEC reporting obligation. However, as we were aggregating our director data to organize it for the relationship data presentation our data guru (Manish Pokherel) discovered this was not true.

Manish was trying to create various integrity tests before we made the final merge and in one of the scenarios he tested he discovered that there are approximately 40 people who have multiple CIKs. Here is a screenshot of the SEC landing page for Dr. Glimcher (who was on the board of Bristol-Myers Squibb from 1997 to 2017).

Glimcher SEC Landing Page

Clearly these look to be the same person – if you follow the links and read her biographies in the related filings it becomes clear that yes, Dr. Glimcher ended up with two unique CIKs.

The problem is that we have one CIK associated with some instances of her compensation (and ownership data) for some filings and the other CIK associated with other instances. For the compensation data and the relationship data to have the most value we need to standardize it.

The decision we made last night is that we are going to use the most recent CIK of these individuals. This means we have to go back through the compensation data and replace any instances where the older CIK value is included as the PERSON-CIK. I will observe that other cases of this are not as clear cut as Dr. Glimcher’s.
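The replacement step can be sketched as a simple lookup. This is only a minimal sketch under stated assumptions: the CIK values, the `CIK_REMAP` table and the `standardize_person_cik` helper are hypothetical, and the rows stand in for compensation records that carry a PERSON-CIK.

```python
# Hypothetical mapping from an older CIK to the most recent CIK for the same person.
CIK_REMAP = {"1111111": "2222222"}  # illustrative values only


def standardize_person_cik(rows, remap):
    """Return copies of rows with any older PERSON-CIK replaced by the newer one."""
    cleaned = []
    for row in rows:
        updated = dict(row)  # copy so the original record is left untouched
        updated["PERSON-CIK"] = remap.get(row["PERSON-CIK"], row["PERSON-CIK"])
        cleaned.append(updated)
    return cleaned


rows = [
    {"PERSON-CIK": "1111111", "YEAR": 2005},
    {"PERSON-CIK": "2222222", "YEAR": 2015},
    {"PERSON-CIK": "3333333", "YEAR": 2015},
]
cleaned = standardize_person_cik(rows, CIK_REMAP)
```

The hard part is building the mapping itself, since the less clear-cut cases need a manual review before they go into the table.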

This has really been an interesting exercise. This is the first time we have pulled all of our compensation data at one time and tried to do some deep analysis. All of our previous integrity analysis has focused on one individual company and a fairly limited time series at a time. We have over 69,000 unique directors identified (NAME-PERSON-CIK). So as you can imagine it is a special challenge to find ways to cross validate the data.

Bottom line is we need to do some more testing – not too much more but we are still trying to identify ways to make sure this resulting data is clean. We also have to sort out how to make sure we propagate a specific CIK for a person through our system. I want to make sure that when you download our ownership transaction data, director votes data, our beneficial ownership data (I can’t remember where else we use the PERSON-CIK) you get clear links across time and between entities.

Complete Redesign of directEDGAR’s Delivery Modality – 10 Beta Testers Needed

Yesterday we inaugurated our first substantial test platform for our new delivery system. After the January 2021 update there will not be any more updates delivered through the mail. Instead we are using some technology from Amazon to host and deliver our application in the cloud. We expect to begin transitioning clients to this platform on a voluntary basis early in the 4th quarter. We will still make the final 2020 update available for those who want to transition a bit more slowly – but that will be the final update.

The AWS AppStream service is amazing and provides us the opportunity to improve the timeliness of filing access (think of near immediate) as well as relieve your IT staff of their responsibility for local management of directEDGAR. The best thing about this change is that we do not have to impose any of the limits that come from a web-based search. We can continue to provide the absolute best search experience and make the platform available to you anywhere – anytime.

I am looking for 10 beta testers from our users. If you are interested please send me an email. As a beta tester you can help us make sure we have some good user feedback on the experience. I will note I am already excited about this experience – searching and rendering is about 40% faster than on my local computer. The people who will get the most out of this experience initially will be those who need access to the most timely 10-K and DEF 14A filings. I expect the full directEDGAR content to be available by the middle to the end of this month.

We also need some advice or suggestions regarding the addition of metadata to our filings. Right now we include the SEC dissemination date (RDATE) as well as the conformed date (CDATE) and the item codes for each 8-K filing. We include word counts and issuer details like SIC code and FYE. We also provide doctype that drills down to the conformed exhibit code. I am intending to add the actual filing date/time stamp (most filings made on Friday after 5:30 have a Monday RDATE as they are not disseminated until 6:00 AM on Monday). I also want to add filer status (LARGE ACCELERATED, ACCELERATED etc.). Is there anything else that you think should be added? If you remember – the metadata provides additional filtering opportunities. The idea is we can search documents for search phrases and words and then additionally filter on the metadata to provide even more focused results.

More Data (Insider Affiliations and Peer Data)

As I have noted before – when we process compensation data (as well as insider trading, beneficial ownership, director votes and other data that includes people) we add the PERSON-CIK and SEC-NAME of the people that are the subject of or are included in the data table. We do this so you can link people across time and entities.

However, I had some discussions with clients recently and they made a critical observation – this data would be difficult for them to create because to actually capture all companies a person is affiliated with they would have to dump all of our data and then organize it. Thus we decided to create a new data artifact. We are going to create an affiliation table that lists all other SEC registrants that each officer and director of a public company is affiliated with.

For example consider Roxanne Austin who is a director at a number of public companies. If you look at this summary of Ms. Austin’s SEC filings you can see that she has a reporting obligation because of her service on the boards of five registrants.

While her other directorships are described in her biographical information in the various proxy statements, each of the biography paragraphs is formatted differently and there is some variation in the form of the names of the various companies, so this data is difficult to parse and convert to a useful form. Here is her biography from the DEF 14A of Abbott Laboratories. It mentions all of the companies listed above but it takes significant programming to link her to those companies. While the language is pretty clear (Ms. Austin currently serves on the . . .) this is described differently in other filings.

We have this data because we assign the PERSON-CIK to each director of a public company – even if they do not have a filing obligation (ownership interest) in the filer. We have wanted to organize this data for some time but have struggled with sorting out exactly how to deliver it in a way that is useful. Our initial effort is going to standardize this by calendar year. Clearly one of the challenges is that the fiscal year of these companies can vary and board service or employment can start or finish at any time. We decided that the best way to organize this is by issuer CIK and CALENDAR YEAR. Here is an image of what the data would look like after sorting it by PERSON-CIK. (I’ve left some columns out of this image).
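To make the organization concrete, here is a minimal sketch of grouping affiliations by PERSON-CIK and CALENDAR YEAR. The tuples, the CIK values and the `build_affiliations` helper are hypothetical; the delivered table has more columns than this.

```python
from collections import defaultdict


def build_affiliations(records):
    """Group issuer CIKs by (PERSON-CIK, CALENDAR YEAR)."""
    table = defaultdict(set)
    for person_cik, issuer_cik, year in records:
        table[(person_cik, year)].add(issuer_cik)
    # Sort the keys so the output is ordered by PERSON-CIK, then calendar year.
    return {key: sorted(issuers) for key, issuers in sorted(table.items())}


# Made-up tuples of (PERSON-CIK, issuer CIK, calendar year).
records = [
    (9001, 100, 2018),
    (9001, 200, 2018),
    (9001, 100, 2019),
    (9002, 200, 2018),
]
affiliations = build_affiliations(records)
```

Each key then lists every registrant the person was affiliated with in that calendar year, regardless of each issuer's fiscal year end.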

This data should be available in the next two weeks. More details on how to access it will be posted once it is live.

We have been working on PEER data for a really long time. It has been a real challenge. We are finally making some progress, enough to describe it – not quite enough to set a date. This and Non-GAAP earnings reconciliations are some of the most frequently requested data objects. A number of our clients have used or are using our tools to extract and normalize the Non-GAAP earnings reconciliations but the peer data is more complicated. Here is an image of the peer group listing for Abbott Laboratories from their 2018 proxy filing.

We faced two major challenges trying to identify this data and normalize it. One is that there is no indication inside the table (and that data is in a table) that these are peer companies. The second challenge is that the name is not very useful in the form that is reported in the table. We’ve finally developed some processes to help collect this data systematically. We are converting it into a form that will allow you to access the CIK of each peer.

I will post an update on this later.

Trying to join the modern age! YouTube Videos

I’ve been lucky recently to have a chance to talk to some clients. One of the thoughts I heard most frequently is that they want access to use/feature information in a more direct fashion. Thus, I have decided to start trying to create use videos.

The videos will show up in the sidebar. With the initial set I intend to illustrate the process of searching for, extracting and normalizing audit committee meeting frequency data. The first video is perhaps a little too long but my son Patrick was nice enough to say he thought I stayed on topic. I will try to keep these focused and try to directly address a specific question.

I have to do some experimenting – my goal is not to alert those of you who subscribed to this blog each time a new video is posted. Rather I hope the videos show up silently in the sidebar, and when you have a question or want to review one of our features you can visit our new YouTube Channel. Of course those of you that want to learn about directEDGAR’s great features so you can join our client base – feel free to watch those videos as well!

Operator Search (and, or & not)

I was working on the finishing touches for the process of uploading the initial CAM_FEE data – which should be live in the next 10-12 hours. I was specifically agonizing over the value we should use for the AUFEEYEAR field. There is such inconsistency in how SEC registrants refer to their fiscal years and thus how the data is reported in their filings.

As part of my research I decided to run a search for the phrase ‘FISCAL 2020 and 2019’. I had forgotten that when we search using words that are also search operators (and/or/not) it is necessary to append a tilde (~) to the operator that we want to use as a search term. Without the tilde the term is treated as an operator. The original search phrase returns all documents with the phrase FISCAL 2020 and the number 2019 somewhere else in the document. However, with the tilde (FISCAL 2020 and~ 2019) my search returns exactly what I was searching for (all documents with the phrase ‘FISCAL 2020 and 2019’).
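If you build search phrases in a script, the tilde rule is easy to apply automatically. This is a minimal sketch: the `escape_operators` helper is hypothetical, and treating the operator words as case-insensitive is an assumption on my part.

```python
OPERATORS = {"and", "or", "not"}


def escape_operators(phrase):
    """Append a tilde to and/or/not so they are treated as search terms."""
    words = phrase.split()
    return " ".join(w + "~" if w.lower() in OPERATORS else w for w in words)


query = escape_operators("FISCAL 2020 and 2019")
```

Only use this on text you intend as a literal phrase; a real operator in your search should be left without the tilde.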

To illustrate the issue I am concerned about – here is an image from the 10-K/A filed by TECH DATA CORP on 5/21 (their FYE is 1/31).

Williams Sonoma’s FYE is 1/29. However, in their audit fee disclosure they describe the fee payment for their most recent fiscal year end as payment for FISCAL 2019.

Since we need to link the audit fee disclosure and the audit opinion, and since the audit opinion is included in a 10-K filing that has the balance sheet date as the last date of the fiscal year – the AUFEEYEAR value needs to be the YEAR from the balance sheet date. I hope that is not confusing but that gives us the most robust way to track the data between the two filings (the DEF 14A and the 10-K). If we didn’t do this then we could have some 1/31 filers with the AUFEEYEAR value reported as 2019 (see Williams Sonoma) and others with the value 2020 (TECH DATA CORP).
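The rule is simple enough to show in a few lines. This is a sketch with two hypothetical filers that share a balance sheet date but label the year differently; the helper name and the dictionary keys are illustrative, not part of our system.

```python
from datetime import datetime


def aufeeyear_from_balance_sheet(balance_sheet_date):
    """Return the calendar year of a balance sheet date given as M/D/YYYY."""
    return datetime.strptime(balance_sheet_date, "%m/%d/%Y").year


# Two hypothetical filers with the same balance sheet date but different labels.
filers = [
    {"label_in_filing": "FISCAL 2019", "balance_sheet_date": "1/31/2020"},
    {"label_in_filing": "FISCAL 2020", "balance_sheet_date": "1/31/2020"},
]
years = [aufeeyear_from_balance_sheet(f["balance_sheet_date"]) for f in filers]
```

Both filers end up with the same year value no matter how they describe the period in their fee table.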

Update on Critical Audit Matter Disclosures and Audit Fee Data

Well I apologize for our delay with the update to the CAM data processing. The automation process that extracts the audit report from the 10-K and then posts it to our download server is working fine. The struggle has been to determine the best way to organize the data extracted from the audit report and how to distribute this data. We are undergoing final testing today and Tuesday and I expect to make this new data live sometime late Tuesday (5/26/2020) – as an aside I will mention that Tuesday is the favorite (and only) son of doting elderly parents’ birthday.

Before I ask you to dig into the nitty-gritty on how to access and work with this data – use this link to access a small sample (CAM_FEE_SAMPLE). There were some unexpected issues (discussed next) that are important to understand as you begin to use this data.

The first unexpected issue was what to do about those cases when the registrant files an amendment to the original 10-K. This actually happened much more frequently than expected. We have compared the original audit report and the subsequent audit reports and thus far there have been no changes to the data that we have extracted from the audit report except for the auditor signature date. Therefore – our present strategy is to include the original audit report date as one of the fields. If there is a change in the conclusion of the audit report or a change in the critical audit matter disclosures we might have to add a second row of data to report the original as well as the new values.

The next unexpected issue was how to handle the cases when the registrant files audit fees in multiple reports and the audit fees do not match. The most common scenario for this is when the registrant initially includes the audit fees in ITEM 14 in a 10-K or 10-K/A and then reports new values (for the same period) in a subsequent 10-K/A or DEF-14A. We have decided to operate on the presumption that the most recent filing has the best measure of audit fees. I will observe that we have investigated every case where there were differences reported. There was only one case where the registrant reported that the amendment was filed to correct the audit fee disclosure.
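A minimal sketch of the "most recent filing wins" presumption follows. The row layout, the `FILED` and `FORM` keys and the fee amounts are hypothetical and only illustrate the selection rule.

```python
def latest_fee_rows(rows):
    """Keep one row per (CIK, YEAR): the one with the latest filing date."""
    best = {}
    for row in rows:
        key = (row["CIK"], row["YEAR"])
        if key not in best or row["FILED"] > best[key]["FILED"]:
            best[key] = row
    return list(best.values())


# FILED uses YYYYMMDD strings so that string comparison matches date order.
rows = [
    {"CIK": 100, "YEAR": 2019, "FILED": "20200228", "FORM": "10-K", "AUDIT FEES": 500},
    {"CIK": 100, "YEAR": 2019, "FILED": "20200415", "FORM": "DEF 14A", "AUDIT FEES": 525},
]
kept = latest_fee_rows(rows)
```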

The third major issue relates to those cases where the sum of the audit fees does not equal the total amount of fees reported. We handled these cases in two ways. If we could identify a reasonable explanation for the discrepancy – perhaps the total was off by the amount of one of the components – we corrected the total. If we could not identify the reason for the difference we left all values as reported and added a field to the data table named ERROR. This value represents the difference between the sum of the components and the reported total. The following table contains the full list of data values that will be delivered if this data is requested from our system.

CIK: Issuer Central Index Key.
RDATE: This is the balance sheet date for the 10-K filing as reported by the registrant. The year component of this date is the value to use in the request file for YEAR. For a 12/31/2019 FYE the RDATE will be R20191231. The YEAR value for the request file will be 2019.
CDATE: The CONFORM date for the filing that reported the audit fees.
FNAME: Last two digits of the accession number of the filing that reported the audit fees.
TID: The index of the audit fee table relative to all tables in the source document. If the audit fee table was an image file this value is the image index.
AUFEEYEAR: This is the fiscal year that the registrant reports in the audit fee table.
AUDIT FEES: Audit fees as reported by the registrant.
AUDIT RELATED FEES: The total amount of audit related fees as reported.
ALL OTHER FEES: The total of all other fees.
TAX FEES: The total of the various tax fees as reported.
TOTAL FEES: The reported total of fees as reported in the source document.
ERROR: The unexplained difference between the reported total and the sum of the components.
10K_RDATE: The dissemination date of the 10-K filing that included the audit report on Critical Audit Matters.
10K_CDATE: The CONFORM (balance sheet date) as reported with the 10-K filing.
AUDITOR: The name of the audit firm that signed the audit report.
LOCATION_CITY: The CITY location as reported by the auditor.
LOCATION_STATE: The STATE (or COUNTRY) as reported by the auditor.
AUREPORT_DATE: The date of the signature of the audit report.
SINCE: This value represents the year the auditor began working for the registrant.
REPORT_TYPE: FINANCIAL or COMBINED to indicate whether or not the audit report includes the auditor’s opinion only on the financial statements or also includes their report on internal control.
OPINION: Whether or not the financial statements PRESENT FAIRLY the results of operations.
EXCEPT: If there is an exception reported in the audit report this field will report the language used by the auditor to describe the exception.
CAM_COUNT: The total number of Critical Audit Matters that were included in the opinion.
CAM_1 – CAM_N: The auditor’s heading/description of the critical audit matters – these are reported in the order they were found in the audit opinion.
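The ERROR field described above can be sketched as follows. This is a minimal illustration, not our processing code: the fee amounts are made up, the `fee_error` helper is hypothetical, and the sign convention (reported total minus the sum of the components) is an assumption.

```python
COMPONENTS = ["AUDIT FEES", "AUDIT RELATED FEES", "TAX FEES", "ALL OTHER FEES"]


def fee_error(row):
    """Reported TOTAL FEES minus the sum of the four fee components."""
    return row["TOTAL FEES"] - sum(row[name] for name in COMPONENTS)


# Made-up amounts where the reported total does not foot.
row = {
    "AUDIT FEES": 1_000_000,
    "AUDIT RELATED FEES": 50_000,
    "TAX FEES": 25_000,
    "ALL OTHER FEES": 5_000,
    "TOTAL FEES": 1_090_000,
}
row["ERROR"] = fee_error(row)
```

A value of zero means the components foot to the reported total; anything else flags a row you may want to inspect in the source document.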

The question now is how do you access this data after Patrick’s Birthday? Once we make the data live there will be a new field available in the Pull section of the ExtractionPreprocessed user interface. You will know that we have made the data live because you will see CAM_FEE_TABLES in the Data Tables Block.

Let me make clear that the parsed data in this form will only be available after the registrant has filed audit fee data. Until the audit fee data is available you can still access the actual audit reports that have been parsed from the 10-K filings. This form of presentation is meant to be entirely separate from the audit report access we turned on late last year. The year value to use in a request file for these reports should be the expected year of dissemination (filing) on EDGAR. So for a 12/31 firm the audit report for 2019 will normally be available in 2020.

Note that we included the original 10-K dissemination date 10K_RDATE as a field so you can run an event study with this data. We also included the fields 10K_RDATE and 10K_CDATE so you can match the data values to the actual audit report if you choose to pull the audit report. These fields also allow you to very quickly create a request file for the audit report. Use the CIK, the year parsed from the 10K_RDATE field from this new table and add the column PF with the value FY.
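Here is a minimal sketch of turning rows from the new table into a request file. I am assuming the 10K_RDATE value has the same form as RDATE (an R followed by YYYYMMDD); the sample row and the helper names are hypothetical.

```python
import csv
import io


def build_request_rows(cam_fee_rows):
    """Build request rows with CIK, YEAR (from 10K_RDATE) and PF set to FY."""
    out = []
    for row in cam_fee_rows:
        rdate = row["10K_RDATE"].lstrip("R")  # assumed form: R + YYYYMMDD
        out.append({"CIK": row["CIK"], "YEAR": rdate[:4], "PF": "FY"})
    return out


def to_csv(rows):
    """Serialize request rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["CIK", "YEAR", "PF"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [{"CIK": 1800, "10K_RDATE": "R20200221"}]  # illustrative values
request_csv = to_csv(build_request_rows(sample))
```

Write `request_csv` to a file and you have a request file with one line per audit report.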

Of course the critical question is still Large Accelerated Filers. After communicating with some of our users I am convinced we need to make sure this valuable data point is available – I actually think we need to provide a new file type that lists the actual filer type (LARGE ACCELERATED, ACCELERATED . . .) accessible by CIK and YEAR. That has been added to the work list. In the meantime send me an email and I will let you have the most updated list of LAF that have filed since 6/30/2019.

This post is too long – I will wrap it up to note that when you request this new data type – the source file that is delivered is the original htm representation of the audit fee data. We normalized the audit fee disclosures – I know some of you want the more detailed fee components when they are disclosed. Those are easy to access by using the DEHYDRATOR-REHYDRATOR process on the source files.

Novel Coronavirus Disclosures

I hope all of you are well.  We’ve been working on staying safe and keeping our systems updated and working.  I actually was hoping to have a new post by now giving you a summary of our Critical Audit Matter data findings.  However, with a shift to remote learning at the University of Nebraska at Omaha, remote learning for my son who is a sophomore in high school (and the only child of doting elderly parents) and then the absolutely unprecedented increase in early proxy filings I’ve not had time to get back to that post.  However – the audit reports are still flowing and if you would like a summary of the reports please let me know.  As soon as things settle down a bit we will finish the transformation of our audit fee data and include the CAM details as additional fields in the audit fee data.

However, I was talking to one of the team today who suggested that I do a quick check to see what had been disclosed about the Coronavirus in the filings that have been made since the first of the year. So I did a quick search of just 10-K and DEF 14A filings. The DEF 14A disclosures mostly report on the prospect of switching to a virtual annual meeting. As you can imagine the 10-K disclosures vary significantly in the level of detail and salience. Here is a link to a summary extraction of the disclosures (frequency and company name as well as CIK) (COVID.csv).

I was really interested in the nature of the disclosure made by Carnival Corp given the challenges they were facing with illnesses on their ships.  They filed their 10-K on 1/28/2020 (this is very consistent with their historic filing date for their 10-K).  Here is the extent of their disclosure:

Fiscal Year 2020 Coronavirus Risk
In response to the ongoing coronavirus outbreak, China has implemented travel restrictions. As a result, we have suspended cruise operations from Chinese ports between January 25th and February 4th, canceling nine cruises. We also expect that travel restrictions will result in cancellations from Chinese fly-cruise guests booked on cruises embarking in ports outside China. We estimate that this will impact our financial performance by $0.03 to $0.04 per share. If the travel restrictions in China continue until the end of February, we estimate that this will further impact our financial performance by an additional $0.05 to $0.06 per share. Five percent of our capacity was scheduled to be deployed in China in fiscal year 2020. If these travel restrictions continue for an extended period of time, they could have a material impact on our financial performance.

Here is the disclosure from American Airlines (2/15/2020)

  In particular, an outbreak of a contagious disease such as the Ebola virus, Middle East Respiratory Syndrome, Severe Acute Respiratory Syndrome, H1N1 influenza virus, avian flu, Zika virus, coronavirus or any other similar illness, if it were to become associated with air travel or persist for an extended period, could materially affect the airline industry and us by reducing revenues and adversely impacting our operations and passengers’ travel behavior. For example, the coronavirus outbreak that originated in or around Wuhan, China in January 2020 has resulted in the widespread suspension of commercial air service to the region, including by American, as well as the imposition by the U.S. and other governments of significant restrictions on inbound travel from this region. Our suspension of service, which remains in place as of the date of this report, and the potential for a period of significantly reduced demand for travel has and will likely continue to result in significant lost revenue. As a result of these or other conditions beyond our control, our results of operations could be volatile and subject to rapid and unexpected change. In addition, due to generally weaker demand for air travel during the winter, our revenues in the first and fourth quarters of the year could be weaker than revenues in the second and third quarters of the year.


These are interesting given the significant impact these companies have experienced as a result of this outbreak.  I’m not intending to evaluate the quality of the disclosure – that remains for an empirical study – I am just surprised at the brevity of the disclosure.

Since I had the disclosure counts, I was curious about the frequency of disclosures across time. Below is a graph of the average count of the frequency of COVID or CORONAVIRUS mentions in 10-K filings by date. The interesting feature of this pattern is that it at least suggests that this was not a significant concern to the management of this sample of firms until after March 8. So the first disclosure was made by Carnival on 1/28 with two instances of either word. While the average disclosure frequency doubled by 3/2 – this does not seem like a significant uptick relative to what happened soon after. As an aside – the firms who disclosed before 3/3 were mostly Large Accelerated filers (the filing deadline for 12/31 FYE LAF was 3/2 this year). So the uptick in disclosure counts reflects more mentions in the 10-Ks of relatively smaller firms.
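If you want to reproduce this kind of count on your own set of documents, here is a minimal sketch. The sample texts and dates are made up, and averaging over every filing on a date (including those with zero mentions) is an assumption that may differ from how the graph was built.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"covid|coronavirus", re.IGNORECASE)


def count_mentions(text):
    """Count occurrences of COVID or CORONAVIRUS, ignoring case."""
    return len(MENTION.findall(text))


def average_by_date(filings):
    """filings: iterable of (filing_date, text). Returns {date: mean count}."""
    counts = defaultdict(list)
    for filing_date, text in filings:
        counts[filing_date].append(count_mentions(text))
    return {d: sum(c) / len(c) for d, c in sorted(counts.items())}


filings = [
    ("2020-03-02", "The coronavirus outbreak may affect demand."),
    ("2020-03-02", "No related disclosure."),
    ("2020-03-16", "COVID-19 (coronavirus) has disrupted operations. COVID-19 risks persist."),
]
averages = average_by_date(filings)
```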


I also mentioned in the first paragraph that we have seen an uptick in early proxy and 10-K/A filings (registrants will sometimes disclose information required by Item 10 – Item 14 in an amended 10-K if they are uncertain if they will be able to file their proxy in time to meet the 90 day deadline after they have filed their 10-K). In total our executive/director and audit fee extractions are running about 12% higher than this same time last year. Most of the increase seems to be in 10-K/A filings rather than a real increase in proxy filings.


Brief Update and Data for your Ticker Display

The first release of our 4.6+ series of the application is code complete but we are still working on the documentation. If you would in fact like to have a pre-documentation release of this version send me an email and I will be happy to arrange it. Documentation completion is still a couple of weeks away as we are juggling some other production issues right now. Check the previous two posts for the most significant enhancements to this release. Frankly I am quite excited about the ability to limit searches to a DATE-CIK pair. This was trickier than we had hoped as the memory object that we had to create to manage the processing was something we had never worked with before.

At the University of Nebraska at Omaha we have a fairly grand entrance with a ticker display set up across from a student run coffee/snack shop (Stedman’s Cafe). I was in a conversation with our IT Director about a year ago and we wondered together what it would be like to display compensation data in real-time. This was a low priority project so it took a while but we finally went live with it in November. When our servers process executive compensation data from a company in a list we maintain of the 1,500 largest companies by market cap we push the data to a url where the ticker software that powers our tape can grab it for display. Our display and the software are from Rise – I suspect most of you use the same system we use as they seem to have most of the market.

Here are a couple of shots of some compensation data as it is streaming across the display



When we decided to do this – I think I was intrigued by the challenge more than anything else.  However, I was sitting in the cafe the other day and it was interesting to hear students talk around me about the compensation.  While I doubt students talk about this every day – the brief time I was in there it was the subject of conversation with two student groups sitting relatively close.  Some seemed awed that people could make so much.  They were also speculating on the nature of the business and the specific nature of the jobs based on the titles.

We only display the Company Name, PERSON-NAME, PERSON-TITLE, SALARY and TOTAL for the current year.  However there are more fields available and I think after this proxy season is over we will play around with doing some changes in compensation and the like.

Here is an image of the JSON form of the data that gets pushed out for access:


Cabot Corp is a chemicals and materials company. According to their 10-K “(Our) principal products are rubber and specialty grade carbon blacks, specialty compounds, fumed metal oxides, activated carbons, inkjet colorants, and aerogel.” (Yes I got lost after rubber). They filed their proxy at 5:24 on Friday 1/24/2020 and this data was then available by 5:50 (not that speed is hyper critical here).

Enough rambling – we have done all of the hard work including working with Rise to sort out how to package the data for display.  If you would like to display this data on your ticker – let me know and we will add it to your license at NO ADDITIONAL CHARGE.  It is interesting how expensive some of the data is that is supplied for these terminals.  You can actually display any of the fields in the JSON above and we can also add fields (cash total versus non-cash total . . .).

I really was surprised that this was registering with students and so I am delighted to make this available to all of our clients.



Teaser-2 Version 4.6 Event Study Like Filtering

Our date filtering has taken a giant step forward.  Previously you could filter a search by dates but you had to apply the same date filtering to the entire set of CIKs that you were using in your search.  Version 4.6 adds the capability to set a discrete date filtering window around a particular CIK-DATE pair.

You need to supply a CSV file with a CIK column and a MATCHDATE column. The values in the CIK column need to be the CIK values for your sample – no left padding of zeros – just the integer form of the CIK. The values in the MATCHDATE column need to be dates in the form of MM/DD/YYYY or M/D/YYYY. (For our international users: if the standard date form in your locale is D/M/YYYY, our application will expect that form – whatever date form your version of Windows defines as ‘normal’.)

Here is an image of a valid input file – notice there can be additional data in the file – the columns do not need to be adjacent but they need to be clean in the sense that the CIK and MATCHDATE column headings need to match exactly (no spaces, upper case etc). You can have multiple CIK-DATE pairs – in the image below I have two different MATCHDATE values for CIK 1800.
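If your sample lives in a script rather than a spreadsheet, a file of this shape can be produced with a few lines. This is a minimal sketch: the dates are made up, the `build_cik_date_csv` helper is hypothetical, and it writes US-style MM/DD/YYYY dates, so adjust the format string if your Windows locale expects a different order.

```python
import csv
import io
from datetime import date


def build_cik_date_csv(pairs):
    """pairs: iterable of (cik, date). Returns CSV text with CIK and MATCHDATE."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "MATCHDATE"])
    for cik, match_date in pairs:
        # int() drops any left padding of zeros; dates are written as MM/DD/YYYY.
        writer.writerow([int(cik), match_date.strftime("%m/%d/%Y")])
    return buffer.getvalue()


# Illustrative pairs; CIK 1800 appears twice with different dates.
pairs = [("0000001800", date(2019, 2, 22)), (1800, date(2020, 2, 21))]
csv_text = build_cik_date_csv(pairs)
```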


Once you have a CIK-DATE file created – start the application, check the Use CIK button in the CIK/Date filter section and then select the Set CIK/DATE File button below the search box.


The user controls to manage the selection of the source file will become active.


Use the Browse button to navigate to and select the file to use for input.  Notice in the Range Days area of the control you can specify the number of days before the date separately from the number of days after the MATCHDATE.  Thus you can have a lop-sided window (0 to +180) or a symmetrical window (-45 to +45).  You also have the option to match to the RDATE or the CDATE.  Once you have selected the options the Okay button becomes active, select it to update the application with your input.

Once you have fully defined your search you can hit the Search button.  The application will return only those documents that matched your search criteria, matched by CIK and were filed within the date range you specified for the CIK-MATCHDATE pairs you specified in the input file.


I want to observe that the search time represents filtering through a base document collection of over one million documents to filter down to the 7,192 that matched my criteria.  Because this is research we understand how critical it is to be able to identify those CIK-DATE pairs that did not match any filings.  There is a View Misses button on the application that when selected will provide a list of those missing pairs.
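For those who want to see the logic spelled out, here is a minimal sketch of the window matching and the list of misses. It is not the application's code: the `filter_by_window` helper and the sample dates are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import date, timedelta


def filter_by_window(filings, pairs, days_before, days_after):
    """Return (hits, misses) for CIK-MATCHDATE pairs.

    filings: list of (cik, filing_date); pairs: list of (cik, matchdate).
    """
    hits, misses = [], []
    for cik, matchdate in pairs:
        start = matchdate - timedelta(days=days_before)
        end = matchdate + timedelta(days=days_after)
        matched = [f for f in filings if f[0] == cik and start <= f[1] <= end]
        if matched:
            hits.extend(matched)
        else:
            misses.append((cik, matchdate))  # pair that matched no filing
    return hits, misses


filings = [(1800, date(2020, 2, 21)), (1800, date(2020, 9, 1))]
pairs = [(1800, date(2020, 1, 15)), (1800, date(2019, 1, 15))]
hits, misses = filter_by_window(filings, pairs, days_before=0, days_after=180)
```

Note that overlapping windows for the same CIK would list a filing more than once in this sketch.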


Notice the Save as CSV button – selecting that will give you the chance to save these results to a file for manipulation, review or re-submission with a different span.  The CSV file will have the CIK and MATCHDATE columns.  The file will not contain any data values from your original submission file.






Teaser – 1 Version 4.6

We are finishing testing and working on the documentation for the next version of the ExtractionEngine.  We have added some new features that we hope will help with your research.  I will share these as we finish the testing and the documentation for each of the features.

The first item we have completed work on is that we added the ability for you to extract the text only from your search results.  While there is an increased interest in using text mining tools and performing sentiment analysis one blocker has been that most of the text mining software assumes/requires that the input text be in the form of plain text content.  Since most SEC filings are in html our clients have observed that it is painful to extract the text from a set of documents.

We added a new item to the Extraction Menu DocumentExtraction TextOnly.


If you select that option a new UI will start and give you the option to specify the destination directory where you would like the text form of the documents to be saved.  Once you have specified the destination directory and hit the Okay button text only copies of the original documents that were returned from the search will be placed in the specified directory.  Each document will be named in the CIK-RDATE-CDATE-~ form so you have an audit trail back to the source document.  Here is an image of a directory with a collection of filings that have been converted to text.



Here is a partial image of an 8-K filing in html format:


Here is an image of the same document after conversion to text:


Since we presume that these txt files will be used as inputs to some additional processing system we have not added line breaks.  The only line breaks in the file are those indicated by the html code so they are a bit painful to read.  However, all of the text is present.
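If you are curious what this kind of conversion involves, here is a minimal sketch using Python's standard library `html.parser`. It is not how our application does the conversion; the `TextOnly` class and the sample markup are hypothetical and only illustrate stripping the tags while keeping all of the text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an html string joined by single spaces."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = "<html><body><p>Item 5.02 <b>Departure</b> of Directors</p><style>p{}</style></body></html>"
text = html_to_text(sample)
```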