The Numbers Don’t Add Up!

Today at 10:59 ET Brown-Forman (CIK: 14693) filed their proxy.  I was particularly pleased because we had a call scheduled at 11:30 with a potential new client, and I find it keeps their attention when we can demonstrate our processes working with real-time filings from companies they are familiar with.  Given Brown-Forman’s status as a Fortune 1000 company I was chortling to myself that this was perfect.

Unfortunately it was not as perfect as I would have liked.  Soon after the filing was made I received a notification that there was a TOTAL error in the Executive Compensation table.  Here is the original table.


We run several validation tests on the data as we are parsing the document – one of those is of course that the sum of the components ties with the reported total.  There are a variety of edge cases where the total may not match because of a small mistake by the registrant.  One example we see is that a registrant may accidentally insert a period instead of a comma.  We flag all math errors for human intervention.

In this case we can’t identify the reason for the error.  The components of the 2017 compensation reported for Mr. McCallum sum to 2,518,874.  The reported total is 2,578,874 – a difference of exactly $60,000.
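As a minimal sketch of the kind of check described above (this is illustrative, not our production validation code), the comparison looks like this:

```python
def check_total(component_sum, reported_total):
    """Compare the sum of parsed compensation components to the reported total.

    Rows where the numbers do not tie are flagged for human intervention.
    """
    return {
        "component_sum": component_sum,
        "reported_total": reported_total,
        "difference": reported_total - component_sum,
        "ties": component_sum == reported_total,
    }

# Mr. McCallum's 2017 row: components sum to 2,518,874, the filing reports 2,578,874.
result = check_total(2_518_874, 2_578_874)
# result["difference"] is 60_000 and result["ties"] is False, so the row is flagged
```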


Our next step is to scrutinize the filing to see if there is some discussion of the $60,000 amount.  We couldn’t find anything so now we send off an email to their Investor Relations Department and wait for a response.  If we don’t get a response within a week we will push the table as is.

These kinds of addition errors happen infrequently – but then we discovered another one about half an hour later when ALJ Regional Holdings (CIK: 1438731) filed their proxy while I was on the call with the client.  In this case the error was present in two years of data (2016 and 2015).


The interesting thing is that when they filed their proxy last year, the reported total for Mr. Reisch was $785,250 – which is the sum of the reported components.  However, they are now reporting $804,000 as the total for 2015.  Which document is correct?  As with Brown-Forman, we sent off an email, and I hope we find out soon.

The potential client was impressed that we had the infrastructure to address these issues.

Where is that Comp Data?

One of the things we pride ourselves on is what we think is the fastest and most comprehensive delivery of Executive and Director Compensation data on the planet (a little bit of hyperbole never hurts).  To highlight that, we have been working to add a modal window to our website so that a visitor to our main site will see the most recent comp table we have processed.

I had an interesting question the other day from a visitor who wondered why we were displaying the compensation data from SPRINT in the middle of the day on 6/20 when four other issuers had reported EC data since Sprint filed their proxy at 5:00 pm on 6/19.  If we were so good where was that comp data?

Here is the sequence of filings:


So when I received the query early on the 20th I had to look.  We deliberately did not push out the DC/EC or Audit Fee data for those entities because of an interesting issue.  MITCHAM INDUSTRIES and DOCUMENT SECURITY SYSTEMS INC both filed a 10-K a couple of months earlier, and both reported that data in their 10-Ks, so our system flagged the new data as SAME CONTENT\DIFFERENT DOCUMENT.

Here is the EC table from MITCHAM INDUSTRIES as reported in their 10-K on 5/31:


Here is the EC table from their Proxy filed on 6/20:


Here is the data from their 10-K after we normalized it and made it available on 5/31:


Further, FUNDVANTAGE TRUST’s and CHROMADEX CORP’s DEF 14A filings related to Special Meetings and did not include summary compensation data.  It was not until SPEEDEMISSIONS INC filed their proxy at 10:20 (CT) that we had new data.

We ultimately do not replace the existing data with the new data if it matches the content from a previous filing.
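One illustrative way to detect that a later filing carries the same content (this is a sketch, not our actual implementation) is to fingerprint the normalized table and compare fingerprints across documents:

```python
import hashlib
import json

def content_fingerprint(table_rows):
    """Hash the normalized table so identical data in a later filing can be detected.

    Serializing with sort_keys makes the fingerprint independent of key order.
    """
    canonical = json.dumps(table_rows, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Fabricated rows standing in for normalized comp data from a 10-K and a later proxy.
tenk_table = [{"name": "Officer A", "year": 2016, "total": 1_000_000}]
proxy_table = [{"name": "Officer A", "year": 2016, "total": 1_000_000}]

# Same content, different document: keep the existing data, skip the new copy.
same_content = content_fingerprint(tenk_table) == content_fingerprint(proxy_table)
```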

While it would be silly of me to claim we are perfect, I did dodge that bullet as we did have the most timely compensation data available.


History Feature

When I was working on the last post I realized I have never shown our Search History feature.  I use this all the time as it saves me significant time and effort when working with a complex search that I am refining.

The Search History is available from the Search menu under the (wait for it) History tab.  It pulls up all of the search history for your local user id since the last time you cleared it.  When you select Search History the Search History control appears:


You can modify any of the fields in this control, or you can select the Open Search button and all of the relevant fields of the application will be populated with these values – you can then modify as needed.




Critical Benefit of directEDGAR Architecture

Anyone who has written code to work with SEC filings understands how messy the filings can actually be despite their apparent standardization.  The 10-K is a form that the filer is supposed to fill out.  But when you start writing regular expressions to find and parse sections of these filings for a large sample, it becomes clear very quickly how much variation there is in the presentation and language of what is seemingly very standardized.

I am working to improve our extraction of ITEM 7 Management’s Discussion and Analysis of Financial Condition and Results of Operations.  Beginning in 1999 the SEC amended the 10-K to include Quantitative and Qualitative Disclosures About Market Risk.  The specific requirement states clearly:


Since this follows Item 7 – if we want to isolate Item 7 we need to find the beginning of Item 7A as the cut-off for Item 7.

Based on the description above we should be able to write a straightforward regular expression to identify this line.  While there are different regex flavors – we use Python, and to keep it general we should write something like ‘\n[ \t]*?ITEM[ \t]*?7A\.[ \t]*?Quantitative[ \t]*?and[ \t]*?Qualitative’ with the re.I flag.  I am looking for all parts of the document that are new lines that may begin with zero or more spaces or tabs, followed by the word ITEM, another unknown number of spaces and tabs, the characters 7A and a period, and so on.  With re.I I am allowing the case of the words to vary.  With this expression I am giving some flexibility, even though I would not necessarily expect to have to account for much variability because we are working with a form, not a document that someone creates from scratch.
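As a minimal runnable sketch of that pattern (the 10-K fragment below is fabricated for illustration), the match behaves like this:

```python
import re

# The expression described above, compiled with re.I so case can vary.
item_7a = re.compile(
    r'\n[ \t]*?ITEM[ \t]*?7A\.[ \t]*?Quantitative[ \t]*?and[ \t]*?Qualitative',
    re.I,
)

# A fabricated fragment standing in for the text of a 10-K.
doc = ("...end of the Item 7 discussion.\n"
       "Item 7A. Quantitative and Qualitative Disclosures About Market Risk\n"
       "We are exposed to market risk from changes in interest rates...")

match = item_7a.search(doc)
# match.start() marks the cut-off point for the end of Item 7
```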

Using the regex from above I was able to successfully identify the beginning of the ITEM 7A section of the 10-K in 79% of the sample.  That sounds wonderful, except that for the one year I am working with I have 4,170 observations.  Thus, I still have to deal with another 750 observations.  So the question comes down to this – how am I going to determine the correct way to identify the beginning of ITEM 7A for those 750 companies?

To determine the actual data presentation I am going to have to review the documents for which I know there should be an ITEM 7A section of the 10-K but where my original expression could not find one.  It doesn’t matter how I expect the data to be reported – it only matters how the data is actually reported.  I have to look at the missing filings and sort out what words they used to identify that section.

That is easy enough with directEDGAR.  Since we localize all the filings, I am actually working with a local file version of the 10-K, and from the code I can easily view the path to a missing observation.  Here is the first interesting variant I discovered:


So instead of QUANTITATIVE AND QUALITATIVE this registrant decided to reverse the language in the form.  An interesting question is – well, how many did that?  It turns out 95 did (roughly two percent of the total and 1/7th of the missing values).  This discovery leads me to account for it in my regex, so it gets a bit more complicated – ‘\n[ \t]*?ITEM[ \t]*?7A\.[ \t]*?(Qualitative|Quantitative)[ \t]*?and[ \t]*?(Qualitative|Quantitative)’,re.I.  In this version I am asking to inspect for lines with QUANTITATIVE or QUALITATIVE followed by AND and then again QUANTITATIVE or QUALITATIVE.
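The revised alternation can be exercised the same way; both heading lines below are fabricated examples:

```python
import re

# The revised pattern: either word may come first.
item_7a = re.compile(
    r'\n[ \t]*?ITEM[ \t]*?7A\.[ \t]*?(Qualitative|Quantitative)'
    r'[ \t]*?and[ \t]*?(Qualitative|Quantitative)',
    re.I,
)

usual = "\nItem 7A. Quantitative and Qualitative Disclosures About Market Risk"
reversed_order = "\nITEM 7A. QUALITATIVE AND QUANTITATIVE DISCLOSURES ABOUT MARKET RISK"
# Both orderings now match.
```

One trade-off worth noting: the alternation would also accept a repeated word (QUANTITATIVE AND QUANTITATIVE), which is acceptable here because the goal is finding the section boundary, not validating the heading.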

Because I can trivially look at the 10-K documents with directEDGAR I can very easily identify the missing observations and modify my code as needed until the point of diminishing returns.  What I mean here is that I am interested in collecting data, not writing the perfect regular expression (I will leave that to others more skilled).  In one of the missing cases I saw this:


So Small Business Issuers are exempt.  This tells me that I should do a search using the Search, Extraction and Normalization Engine to find those filings with the phrase not provided quantitative.  Because not is a search operator, it has to be qualified to use it as a search word.  This is accomplished by appending a tilde (~) to the word, so the search phrase I am going to initially use to identify those 10-K filings where the issuer claims an exemption is not~ provided quantitative.  Not a very fruitful search – only four filings (2 different CIKs).

My ultimate regular expression got very complicated (I even had to account for six different spelling mistakes).  However, the process to create it was not complicated, because every time I had missing values I could look at the search results on my screen and immediately zero in on the more challenging documents to learn how their disclosure differed.


If I were using some other tool it would have been more tedious, because I would have to visit the SEC website, type in the CIK of the company I am missing and then locate the troublesome 10-K.  Certainly this is doable, but it takes my focus off my goal – collecting data for my research.



Questionable Value of an SGML Tag in SEC Filings

I have posted before about our decision to use the RDATE rather than the ACCEPTANCE-DATETIME value from the accession-number.txt or accession-number.hdr.sgml files.  (RDATE is a term we assigned to the date associated with the <SEC-DOCUMENT> tag in those same files.)  The difference is critical because academic researchers need to identify the best event date for event studies.  Selecting the wrong date could at best introduce excessive noise into the calculation of abnormal returns and at worst could lead to a bias in the results.

While we have the ACCEPTANCE-DATETIME stamp for all the filings, we use the date associated with the SEC-DOCUMENT tag because it is a better measure of when the filing was actually made available through the EDGAR filing system – when a user could first read and then act on the information.
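As a sketch of how the two dates can be pulled from those header files (the accession number below is fabricated; the tag layout follows the EDGAR header files mentioned above), consider a filing accepted Friday evening but revealed Monday morning:

```python
import re

# Fabricated header fragment in the layout of an accession-number.hdr.sgml file.
header = (
    "<SEC-DOCUMENT>0000000000-17-000000.txt : 20170410\n"
    "<SEC-HEADER>0000000000-17-000000.hdr.sgml : 20170410\n"
    "<ACCEPTANCE-DATETIME>20170407191200\n"
)

# Date on the SEC-DOCUMENT line: when the filing was disseminated through EDGAR.
sec_document_date = re.search(r'<SEC-DOCUMENT>[^:]*:\s*(\d{8})', header).group(1)

# The acceptance stamp: when the SEC accepted the submission.
acceptance = re.search(r'<ACCEPTANCE-DATETIME>(\d{14})', header).group(1)

rdate = "R" + sec_document_date
# Accepted 19:12 on 4/7 but revealed 4/10 – rdate is "R20170410",
# while acceptance[:8] is "20170407"
```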

To illustrate this I took a screen-shot of the Latest Filings Page with 10-K filings listed at about noon on Saturday April 8.  Here is the screenshot:


If I had tried to collect all 10-K filings made through EDGAR at noon on 4/8/2017 the most recent filing I would be able to access would be the 10-K filed by Plastic2Oil.  According to the SEC this was accepted at 17:30 on 4/7/2017 and has a filing date of 4/7/2017.  When this filing was added to directEDGAR about 4:45 (CDT) on 4/7/2017 we assigned the RDATE value R20170407.

I checked in again Sunday 4/9/2017 about noon and the same list was available.  Then I checked midday Monday 4/10/2017 and the list had been updated to reflect all the filings that were made after the SEC cut-off (17:30 M-F excepting Federal Holidays) (as well as filings made Monday morning).  Here is the updated list:


There are seven filings that were not visible or accessible to EDGAR users until probably the first push of the RSS index at about 5:00 AM on 4/10/2017.  I checked our logs and see that we pulled those particular filings at about 5:15 AM on 4/10.  They were not available for our final pull Friday at about 9:00 pm, and they were not accessible during our weekly clean-up run on Saturday, where we validate everything that was filed during the week.

Our competitors insist on using the ACCEPTANCE-DATETIME value as the critical event date.  That has never made sense to us because of all of the issues that can affect the length of time between when the filing is submitted (ACCEPTANCE-DATETIME) and when it is pushed to EDGAR users (the SEC-DOCUMENT date).  In this example the lag is caused by the SEC enforcing their cut-off rule.  However, the lag can also be affected by problems with the composition or organization of the filing.  That is, a registrant can submit a filing and have it fail a validation test.  The SEC may allow the registrant to retain the initial ACCEPTANCE-DATETIME value, since that has regulatory consequences, but the filing is still not available to the public through EDGAR until sometime after the filing has been corrected.

For the 10-K filed by Robin Street Acquisition Corp from the screenshot above the header reports that it was made at 19:12 on 4/7.  We assigned it an RDATE of R20170410.


Internally we refer to the RDATE as the Reveal Date.  We think it is a better value to use in an event study.  So then the question is: how do you get the RDATE for a collection of filings?  Since it is automatically included in the summary result set from a search, that is not too difficult.  The following image shows the summary results file after searching for the phrase “poison pill” in 8-K filings.


There is quite a bit more meta-data about each of the results but this shot lets you see that the RDATE is automatically parsed and ready (as well as the CONFORMED DATE – CDATE).

Using directEDGAR to Enhance my Teaching

I teach Intermediate I.  It is a tough class in which to keep students focused on the issues.  I believe one of the problems is that the students have little business experience, so they have a hard time connecting most of what we are covering to real-life business circumstances.  For example, last week we covered the measurement of inventories.  One important learning objective in this topic is for students to be able to demonstrate an understanding of what items should be included in inventory at the end of the period.  Inventory (purchases or sales) in transit as well as on consignment has to be considered.  When I talk about consignment most students start tuning out, because to them it doesn’t seem relevant.

I decided to look for some concrete examples of disclosures relating to consignment issues in recent 10-K filings using the Search, Extraction & Normalization Engine.  The search illustrated below is for all documents that are 10-K filings and contain some form of the word root consign.


We require the TI-BAII calculator, so all of the students are at least familiar with Texas Instruments.  Their disclosure reports that 55% of their revenue comes from sales of inventory that is held on consignment by their customers.  The nice thing about this is that it provides a natural segue into a discussion about how critical integrity is to successful business relationships.

Another interesting disclosure that students could relate to was made by Sirius XM Holdings.  Here is a screen shot of their disclosure:


Many students have satellite-capable radios in their cars and, like most people (including me), have probably not thought about the supply line from the auto manufacturer to Sirius and how that radio was sourced.  Again – this disclosure provides some substance for a more interesting class discussion about what students have historically seemed to think is a real non-issue.  I could tell students were paying more attention when I related this to the problem we were working.

Finally, probably the most interesting disclosure I found about consignment sales was from Calavo Growers Inc (CIK: 1133470).  Here is their disclosure:


This was interesting for me as I would never have imagined that kind of relationship with respect to the sale of a perishable product.  Does this mean that payment to the grower for that bag of avocados I bought at Costco was dependent on my purchase?

While our platform is designed for intense data collection for research and analysis – one of the clear side-benefits is the opportunity to bring timely disclosures to class to link the issues covered to real world business problems.

Interesting Results from Director Voting

I was testing some of our preliminary results of the extraction and normalization of Director Votes.  I struggled for a bit about how to present this.  Initially I was going to identify some voting results that were strongly negative, with the names of the directors and the companies, but then I thought about it a bit.  If John Smith is having some professional issue that is causing him to receive a high percentage of negative votes, I don’t think I want his curious kids Googling his name and finding our website recounting their dad’s struggles.  Thus the analysis below leaves out the names of the individual directors.  However, when you download the actual director vote data, we include their name as reported in the filing, their name as reported in their personal ownership filings and their personal CIK, as illustrated in the image below:


(While names are included above, they are embedded in an image, so I feel we are not imposing on anyone’s privacy.)

The above image illustrates many of the details of our vote data, so let’s at least superficially dive into some interesting sorts on the votes.  I should note that this was an initial sample of good results from 2,959 companies in the Russell 3000 that reported voting results in an 8-K from 1/1/2015 to 12/31/2015.

Most negative votes.  I decided to sort the vote results by the proportion that were negative: the sum of AGAINST, WITHHELD and ABSTAIN (AWA) divided by the sum of AGAINST, WITHHELD and ABSTAIN plus FOR.  I was somewhat amused by this initial sort, as there were two candidates for election to the board of General Motors who received a minuscule number of affirmative votes and more than 1.2 billion votes against.  Here are the results:
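The sort key just described can be written as a small function (the column names here are illustrative, not necessarily the ones in our download):

```python
def awa_proportion(for_votes, against, withheld, abstain):
    """Negative-vote proportion: (AGAINST + WITHHELD + ABSTAIN) / (AWA + FOR)."""
    awa = against + withheld + abstain
    return awa / (awa + for_votes)

# A candidate with a minuscule FOR count and over a billion AGAINST votes
# sorts to the very top (the vote counts below are made up for illustration).
awa_proportion(1_000, 1_200_000_000, 0, 0)  # very close to 1.0
```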


Neither of the gentlemen appeared in GM’s proxy.  It took a bit of research to sort out that these folks offer themselves as candidates at GM’s Annual Meeting each year.  My guess, from looking at the votes and the BROKER_NON_VOTES (BNV), is that these gentlemen only vote for themselves, as the difference in the BNV equals the difference in their FOR votes.

Putting that anomaly aside, there were 396 companies that had one or more directors whose proportion of AWA votes was 20% or more.  182 companies had more than one director with a negative vote proportion greater than 20%.  News Corp led the pack, as each of the 12 directors put forth by the company received a negative vote of 20% or more.  Here is a list of companies that had 5 or more directors up for election who received a 20% or greater negative vote.


Because we can sort on gender, I discovered that 90 of the 851 recipients of 20% or greater AWA votes were women.  This approximates their representation in the overall collection of directors (18,193 total directors for the 2,959 companies, of whom 2,576 were women candidates).