Institutional Trading – the “Whales”

We are often asked to provide access to the 13F-HR reports filed by institutional managers.  While we have provided these filings optimized for our platform – ultimately what users want is the data organized in a way that is meaningful.  This is something I personally have wanted to do for a long time.  However, the problem has always been matching the name of the issuer to some more useful identifier.  The actual 13F filings list the issuer name and the identifier assigned by CUSIP (Committee on Uniform Security Identification Procedures).  We needed a way to map the CUSIP back to the Central-Index-Key (CIK) assigned by the SEC.  Early in our life I approached CUSIP Global Services (a division of S&P) about licensing the CUSIP data so we could link filings to CUSIPs and CUSIPs to CIKs.  The cost was prohibitive so we have been stuck there.

Recently we discovered another way to map the CUSIP to CIK and after extensive testing are confident in the mappings we generate.  Because of this we have started processing the 13F-HR reports.

What we are doing is aggregating all of the data by report quarter and issuer.  The SEC requires institutional investors with more than $100 million in securities under management to disclose their holdings in the 13F within 45 calendar days after the end of the quarter.  We are parsing all filings made within each window and then combining the data for each individual issuer from all of the filings into one single report.  So a report for say Conagra for the 4th quarter of 2018 will be available soon after February 15, 2019 (the deadline for the report).  Of course we expect to have to periodically update these summary reports when amendments are filed by the managers.
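The aggregation described above can be sketched roughly as follows. This is only an illustration, not our production code: the field names (`issuer_cik`, `quarter`, `manager_cik`, `shares`) and all of the numbers are assumptions made up for the example, and it does not handle amendments.

```python
from collections import defaultdict


def summarize_by_issuer_quarter(holdings):
    """Combine holdings rows from many 13F filings into one record per issuer and quarter."""
    shares = defaultdict(int)
    managers = defaultdict(set)
    for row in holdings:
        key = (row["issuer_cik"], row["quarter"])
        shares[key] += row["shares"]
        managers[key].add(row["manager_cik"])
    return {
        key: {"total_shares": shares[key], "manager_count": len(managers[key])}
        for key in shares
    }


# Illustrative rows only: the CIKs and share counts are made up.
holdings = [
    {"issuer_cik": 1001, "quarter": "2018Q4", "manager_cik": 501, "shares": 40_000},
    {"issuer_cik": 1001, "quarter": "2018Q4", "manager_cik": 502, "shares": 34_000},
    {"issuer_cik": 1001, "quarter": "2018Q4", "manager_cik": 501, "shares": 1_000},
    {"issuer_cik": 2002, "quarter": "2018Q4", "manager_cik": 501, "shares": 5_000},
]
summary = summarize_by_issuer_quarter(holdings)
print(summary[(1001, "2018Q4")])
```

A manager can list the same issuer more than once in a filing, so the sketch counts distinct managers separately from summing shares.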

Our platform already provides access to the beneficial ownership data as reported in the DEF 14A/10-K.  The institutional ownership data complements the beneficial ownership table because the beneficial ownership table only contains details about owners of more than 10% of the equity and the ownership of directors and officers.  The large beneficial owners named in the proxy are a subset of the institutional owners that report on the 13F.  For example, CAG’s DEF 14A reports only two beneficial owners (other than management), Blackrock and Vanguard, with a total of 74 million shares.  Our analysis of the 13F filings shows total holdings by institutions to be approximately 305 million shares for the 6/30 quarter.  This is roughly four times the amount shown in the proxy and it also represents more than 75% of the approximately 390 million shares outstanding as of the end of June 2018.

By combining the data from our beneficial ownership tables for directors and officers with the institutional ownership reports we are starting to generate, our users will have better measures of these important characteristics of the distribution of equity.

I think we should have a pilot of the institutional ownership data available before the middle of October.  You will know this data is available by starting the application and using the Extraction\Preprocessed feature.  When the pilot data is available there will be a new entry 13F_PILOT in the Data Tables listing.

If you would like to dive into one of the files while we are completing our final testing please send me an email and I will make one available (for our clients only).

 

Pilot Test of Ownership Data Over – The Real ‘Stuff’ Now Loading

On August 3rd we loaded a test run of ownership data.  Our initial focus was roughly the S&P 500 for the period from 2013 to 2017.  We had some feedback on the initial data load and made some changes to the way the columns are displayed and ordered.  We also added some additional fields (including the ACCESSION-NUMBER of the filings) so that you can more easily explore the original source file if you have questions about the data.

Friday we took down the initial data and started loading replacement data with new columns and more CIKs.  We have approximately 8,000 CIKs and data going back to 2008 in the loading process that is running right now.  This is massive (it represents normalized data from more than 1,500,000 ownership filings).  Again – the data is organized by ISSUER – YEAR.  The data in each ISSUER – YEAR file is sorted by RPTOWNERCIK – FILING DATE.

We have preserved all of the footnotes and their association with individual data items.  Each row represents either a NONDERIVATIVE TRANSACTION, NONDERIVATIVE HOLDING, DERIVATIVE TRANSACTION, DERIVATIVE HOLDING or a REMARK.  The datatype for the row is indicated by the value reported in the datatype column.

The full load may take the balance of this week.  We will then identify any missing CIKs and fill in the data back to 2003.

Form 3, 4 & 5 Filing Data on Line!

One of the key drivers of our new architecture was to allow us to more easily expand the range of data values we deliver through the Search, Extraction & Normalization Engine.  We are experiencing that benefit right now – we just started delivering  normalized data from Form 3, 4 & 5 filings.

The SEC lays out the obligations of officers, directors and other Section 16 persons to report their holdings and transactions in their company’s securities and also in derivative instruments where some security of the company is the underlying value determinant.  The reporting mechanism used today is Form 3 (initial report of holdings), Form 4 (transactions and other events that affect holdings) and Form 5 (annual statement of holdings).  In about May 2003 these forms were made machine readable when the SEC began requiring them to be filed in both an HTML and an XML format.  As an aside, prior to the introduction of the XML form these forms were only available through EDGAR for a limited number of companies – most filers chose to file a paper copy.

So this morning we released a pilot test of normalized data from these forms.  This has been a big undertaking (today there are more than five million ownership forms available through EDGAR).  Our initial pilot is focused on the S&P 500 for the period from 2013 to 2017.

One of the big struggles about this data was deciding how to organize it for you to call.  What we decided to do was prepare the data by COMPANY-CIK /YEAR.  So if you submit a request for Abbott Labs (CIK 1800) for 2017 we will deliver back the data extracted from every Form 3/4 and 5 filed by any person who filed one of those forms during the period from 1/1/2017 through 12/31/2017.

Each line of the results represents one reporting event for one person.  So if a reporting person describes 3 non-derivative and 2 derivative transactions in one filing the result file will have 5 lines – each line reports all of the values included in the form.
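To make the one-row-per-reporting-event layout concrete, here is a rough sketch. The dictionary structure and the `shares` field are hypothetical stand-ins (the real forms carry many more values), and the owner CIK is made up; only the idea that each transaction becomes its own row, with the owner details repeated, comes from the description above.

```python
def filing_to_rows(filing):
    """Emit one row per transaction, repeating the reporting person's details on each row."""
    rows = []
    for kind in ("nonderivative", "derivative"):
        for transaction in filing.get(kind, []):
            row = dict(filing["owner"])
            row["datatype"] = f"{kind.upper()} TRANSACTION"
            row.update(transaction)
            rows.append(row)
    return rows


# Hypothetical filing: 3 non-derivative and 2 derivative transactions.
filing = {
    "owner": {"RPTOWNERCIK": 9001, "isofficer": 1},
    "nonderivative": [{"shares": 100}, {"shares": 250}, {"shares": 75}],
    "derivative": [{"shares": 1_000}, {"shares": 500}],
}
rows = filing_to_rows(filing)
print(len(rows))
```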

To access this data, create a request file (three columns: CIK, YEAR and PF) and from the Extraction menu select ExtractionPreprocessed.  Once the request file has been validated and you select the Read Input button the Pull section will populate with the latest list of available tables.  The parsed ownership data is delivered when you select SECTION_16_ANNUAL_SUMMARY.

owner_1

There can be as many as 150 unique column headings in the result file – depending on the number of footnotes that are included for the transactions.  It is critical that we attach any relevant footnote to the transaction the footnote elaborates on.

I will be honest – our labeling for the footnotes is a bit tedious – but we think necessary to provide clarity as to what part of the form the footnote should be considered with.  For a little hint at these complications consider this image from a partial Form 4 filed by John Klinck an EVP for State Street (link to full form).

footnote

There are two footnotes to explain/elaborate on the value reported for the Amount of Shares Beneficially Owned Following Reported Transaction.  Note that this entry describes a non-derivative transaction.  These footnotes are indicated in the following manner – the text that is associated with the footnote with the (2) indicator is in a column labeled transactionshares.footnote.  The footnote indicated by the (3) value is labeled transactionshares.footnote_1.  In short, we add an index value to every footnote after the first one for a particular data entry.  If an entry in a form has 4 footnotes then the last one listed will be indexed with a _3.  Footnotes are associated with a specific data entry, so they are keyed to that data value and reported in the same row as that data value.
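The labeling rule above is easy to express in a few lines. This is a minimal sketch of the naming convention only; the function name and the footnote text are made up for the example.

```python
def footnote_columns(field, footnotes):
    """Label the first footnote '<field>.footnote' and later ones with _1, _2, ..."""
    columns = {}
    for position, text in enumerate(footnotes):
        suffix = "" if position == 0 else f"_{position}"
        columns[f"{field}.footnote{suffix}"] = text
    return columns


# Placeholder footnote text for an entry that carries four footnotes.
labels = footnote_columns("transactionshares", ["first", "second", "third", "fourth"])
print(list(labels))
```

So an entry with four footnotes ends with a column suffixed `_3`, as described above.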

These forms allow the respondent to include REMARKS.  A remark applies to the entire form and so what we ended up doing is including these in a separate row.  Initially we thought to include them next to each transaction but decided that these might be more useful if they could be easily  isolated.  We include a column that describes the nature of the content for each row (datatype).  

ownership_datatype

This allows you to very quickly isolate and review any particular type of data.  All of the identifying information for each person associated with each form is included in each row.  There are indicators for the relationship between the reporting person and the issuer (isdirector/isofficer/isother).  The reporting person’s CIK is included so you can match back to our compensation data.
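If you work with the result file outside of a spreadsheet, isolating one type of row might look something like the sketch below. The column names `datatype`, `RPTOWNERCIK`, `isdirector` and `isofficer` are the ones mentioned in this post; the sample rows, the 0/1 coding of the indicators and the exact spelling of the datatype values are assumptions for the example, so check them against an actual file.

```python
import csv
import io

# Stand-in for a delivered result file; the rows are made up.
sample = io.StringIO(
    "RPTOWNERCIK,datatype,isdirector,isofficer\n"
    "9001,NONDERIVATIVE TRANSACTION,0,1\n"
    "9001,REMARK,0,1\n"
    "9002,DERIVATIVE HOLDING,1,0\n"
)


def rows_of_type(handle, wanted):
    """Return only the rows whose datatype column matches the requested value."""
    return [row for row in csv.DictReader(handle) if row["datatype"] == wanted]


remarks = rows_of_type(sample, "REMARK")
print(remarks)
```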

I am really excited about this update.  If you work with this data and have observations that would help us improve the utility please send me an email.  (burch [yada] directedgar.com).

 

 

FORM D & More About RDATE

Dropbox went public on March 23, 2018.  However – their first public filing was made almost a decade earlier – on July 7, 2009 when they reported raising more than seven million dollars from eight investors.  Their initial SEC filing was made on a Form D – their trading name at the time was Evenflow.  They filed a second Form D in 2014 describing an offering of securities totaling 450 million – by the time they made the filing they had raised 325 million of that 450 million total expected.

These were the only EDGAR filings from Dropbox until their draft registration statements were made public on 2/23/2018 (the same date as their S-1).

We have had several requests for Form D filing data. So I am delighted to report that we are in the final stages of delivering all the data extracted from the Form D filings to our distribution server.  When the Form D data goes live a new entry will appear in the ExtractionPreprocessed Data Pull List.  To access the data all you will need is a standard request file with the CIK, YEAR and PF focus values.

Traditionally the YEAR value should be the year the EDGAR filing was made available.  Of course this raises an unexpected complication for our users because we would not expect you to have any prior knowledge of a particular filing year for a Form D.  Therefore – to make this easier we are initially going to load all Form D filings with an RDATE of 6/30/2018.  We will add an additional field in the data, RDATE_2 – this will be the actual date the filing was made available through EDGAR.  So when you build the request file for Form D data the YEAR column will only need to have 2018.  You will then receive all Form D data for each CIK in your request list.

The column headings are going to reflect the location of the data item in the original form.  Here is an image from one of Dropbox’s Form D.  The section in the image is the section of the filing that reports on the issuer.

formd_1.PNG

Here is an image of the data captured from this original form.  Please note that the data has been transposed for this post – when you access the data the column headings are the values in column A:

formd_issuer

Since each data item in this section corresponds to details about the Primary Issuer the phrase primaryissuer is concatenated to other parts of the name with a period (.).  Since there are other addresses in the form the label issueraddress is used to make clear that the address information relates to the primary issuer rather than another address detail in the form (for example an address component of a related person).
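One way to picture the naming scheme is as a flattening of the nested sections of the form into dotted column headings. The sketch below is only an illustration of that idea: the element names (`entityname`, `city`, `stateorcountry`) and the values are hypothetical, and the exact labels in the delivered data may differ.

```python
def flatten(section, prefix):
    """Build dotted column headings from a nested section of the form."""
    columns = {}
    for name, value in section.items():
        label = f"{prefix}.{name}"
        if isinstance(value, dict):
            columns.update(flatten(value, label))
        else:
            columns[label] = value
    return columns


# Hypothetical issuer section; names and values are placeholders.
issuer = {
    "entityname": "Example Co",
    "issueraddress": {"city": "Omaha", "stateorcountry": "NE"},
}
columns = flatten(issuer, "primaryissuer")
print(sorted(columns))
```

Carrying the section name in every heading is what keeps the issuer's address apart from, say, a related person's address in the same filing.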

Our initial push of Form D data was more focused than we would have liked.  Honestly I am trying to set up the automated processes.  Thus, rather than filling in all of the more than 300,000 Form D filings we identified those filers that have ever filed a 10-K.  This significantly reduced the initial testing load to a few more than 14,000 original documents.  As of today (8/2/2018)  the parsed data from the issuer section and the offering details from these filings has been pushed to our distribution system.  Sometime in the next 12 or so hours your client will update with the new option:

formd_ep

As I noted above – we are artificially setting the year value for these files as 2018.  The objective is to prevent you from having to guess when your particular registrant made a Form D filing.  So your request file should look something like:

formd_request
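If you prefer to generate the request file from a list of CIKs, a sketch along these lines may help. It assumes the request file is a plain CSV with the three columns in the order CIK, YEAR, PF. The PF value shown is a placeholder (use whatever value your other request files use), and the second CIK is made up.

```python
import csv
import io


def build_form_d_request(ciks, pf, year=2018):
    """Return request-file text with one CIK, YEAR, PF row per CIK (YEAR fixed at 2018)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["CIK", "YEAR", "PF"])
    for cik in ciks:
        writer.writerow([cik, year, pf])
    return buffer.getvalue()


# 1800 is Abbott Labs; 1234567 and PF_PLACEHOLDER are illustrative only.
request_text = build_form_d_request([1800, 1234567], pf="PF_PLACEHOLDER")
print(request_text)
```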

Frankly, I was surprised by some of the filings.  Note that Abbott Labs (CIK:1800) is in the above list.  They have filed 3 Form Ds – but the amount of capital raised seems minuscule compared to their market capitalization or book value (one offering of $70,000).  I have noticed that some filers are using the capital to fund executive pension plans.  All of these details and others are available in the way we have made this data available for your research.

As I have described in our help materials and elsewhere we use the term RDATE to indicate when a filing was made available to users of EDGAR.  Note that I did not describe the date as a filing date – this was purposeful.  The filing date reflects the date a filing was either received by EDGAR or at an SEC filing office.  For most filings the RDATE will generally also be the filing date but there are enough discrepancies to pay attention and manage dates by the RDATE rather than the filing date.  While setting up the code for processing the Form D filings I observed instances where the difference between the filing date and the RDATE was substantial.  For example – here is an image from the  header page for a filing made by Diffusion Pharmaceuticals:

formd_rdate

Notice the Filing Date of 7/3/2006 but also notice the date next to the SEC-HEADER tag – 6/15/2015.  This is the date the filing was made public through EDGAR, despite the much earlier filing date.  If you wanted to do an event study (imagine wanting to determine how public company market values were affected by news of a successful funding round for a competitor) then the 6/15/2015 date is probably the correct date to use, or perhaps it provides evidence that this observation should be considered for deletion given the gap between the filing date and the RDATE.  We use the SEC-HEADER date to assign the RDATE to each filing we process.  If there is a value for PERIOD in the header we use that for the CDATE, as the header tag PERIOD maps into the index file CONFORMED PERIOD OF REPORT.  If there is no PERIOD value we use the filing date for the CDATE.  This is particularly important for the comment letters (form type UPLOAD).
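The date rules just described can be written down as a small sketch. The dictionary keys (`sec_header_date`, `filing_date`, `period`) are hypothetical names for values parsed from the header, and the dates echo the example above purely for illustration.

```python
from datetime import date


def assign_dates(header):
    """RDATE comes from the SEC-HEADER date; CDATE is PERIOD if present, else the filing date."""
    rdate = header["sec_header_date"]
    cdate = header.get("period") or header["filing_date"]
    lag_days = (rdate - header["filing_date"]).days
    return {"RDATE": rdate, "CDATE": cdate, "lag_days": lag_days}


# A Form D style header with no PERIOD value.
form_d = assign_dates(
    {"sec_header_date": date(2015, 6, 15), "filing_date": date(2006, 7, 3), "period": None}
)
# A periodic report style header where PERIOD is present.
periodic = assign_dates(
    {"sec_header_date": date(2018, 3, 1), "filing_date": date(2018, 3, 1), "period": date(2017, 12, 31)}
)
print(form_d["lag_days"], periodic["CDATE"])
```

The `lag_days` value is not part of our data; it is included here only to show how you might screen for observations with a large gap between the filing date and the RDATE.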

 

Augmentation of Compensation Data – Executive and Director Tenure, Age and Affiliations

Now that we are through the hurdle of our last software release we have shifted our focus to data enhancements.   This has been fascinating as we have worked on developing processes to add additional values to our existing data collection tables.

Our initial effort has been focused on adding AGE, TENURE and OTHER_DIRECTORSHIPS to the director compensation data.  Our focus for OTHER_DIRECTORSHIPS is on other public companies.  Specifically we are using our deep collection of director compensation data to match directors across registrants – when a director is included in another entity’s compensation table we are adding the CIK of the other entity as a field in their compensation data.

Today was the first chance we have had to take a look at the data and it is interesting.  Our initial universe was limited to the 4,045 registrants that have reported director compensation since January 1, 2018.

Out of 4,045 in the initial group – 2,920 registrants have at least one director that is also a director for another company in that group of 4,045.  1,874 registrants have at least one director that has at least two other directorships.  807 of our 4,045 have at least one director that has at least three other directorships.  287 of our sample have at least one director with four or more other directorships.
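For anyone who wants to reproduce counts like these from their own extract, here is a rough sketch. It assumes the directors have already been matched across registrants and are available as (director identifier, CIK) pairs; the identifiers and CIKs below are made up.

```python
from collections import defaultdict


def other_directorship_counts(pairs):
    """Map each director to the number of boards they sit on beyond one."""
    boards = defaultdict(set)
    for director, cik in pairs:
        boards[director].add(cik)
    return {director: len(ciks) - 1 for director, ciks in boards.items()}


def registrants_with_busy_director(pairs, min_other):
    """Registrants with at least one director holding min_other or more other directorships."""
    others = other_directorship_counts(pairs)
    return {cik for director, cik in pairs if others[director] >= min_other}


# Made-up sample: director A sits on three boards, B and C on one each.
pairs = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("C", 4)]
print(len(registrants_with_busy_director(pairs, 1)))
```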

There are some interesting research questions that should be considered about these companies that have busier directors than others.   By the same token – what about those companies that do not have any directors with current positions at other public companies?

With respect to the directors – the breakdown is as follows:

directorships

I of course had to do a Google search for details about the person who has nine total directorships (that we have found so far) in 2017.  He is involved with an interesting group of companies – his total reported compensation for these 9 registrants was $1,383,896.

But of course it is never that easy.  When I sorted the data to align this mystery director’s compensation I noticed that he had two rows of identical data – that is each component of compensation matched across these two different CIKs. With a little more research I learned that these two CIKs had a parent-sub relationship.  One of the entities has traded common stock and the other has public debt.  So for practical purposes our mystery director only has eight unique directorships – not the nine indicated by the CIK distribution.
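A simple screen for this situation is to count a director's distinct compensation rows rather than distinct CIKs. The sketch below is only a screening idea, not a rule we apply automatically: two unrelated boards could in principle pay identical amounts, so a match is a prompt for review. The field names and amounts are made up.

```python
def unique_directorships(rows):
    """Count one director's seats, treating CIKs with identical pay components as one seat."""
    distinct = {tuple(sorted(row["comp"].items())) for row in rows}
    return len(distinct)


# Made-up rows for one director; CIKs 11 and 12 report identical components.
rows = [
    {"cik": 11, "comp": {"FEES": 90_000, "STOCK": 120_000}},
    {"cik": 12, "comp": {"FEES": 90_000, "STOCK": 120_000}},
    {"cik": 13, "comp": {"FEES": 75_000, "STOCK": 100_000}},
]
print(unique_directorships(rows))
```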

To give you a teaser for the presentation of this new data I have included an image from part of Abbott Labs (CIK:1800) data with these enhancements.  The columns circled in red are new.

abbott

 

Finally 4.04 Release in Sight

When I set out to accomplish some home improvement project my wife and my amazing journeyman helper (my son) ask me how long I think it will take.  I set some number of hours, days or weeks and invariably it takes two to three times longer than planned.  I, of course, can keep my cheer during the process, but my lovely bride sometimes gets annoyed because the disruption and mess are lasting longer than planned and my helper gets frustrated because after the initial rush of excitement he would probably prefer hanging out with other 13 year old boys (maybe girls too – not sure) rather than grinding away with his doting elderly dad.

The same happens when we plan a new release for directEDGAR.  We can easily see the start and the finish lines.  However, we never really understand how complex the terrain is until we are in the middle of the project.  I was hoping to release Version 4.04 by July and then it got pushed back until January and we are finally near the end.  It should be released in the next two weeks.

So what is new?

  • Improved the speed of access to our artifacts by a factor of 50 or more.  Previously, if you tried to download say 5,000 or so Executive Compensation tables or MDA Snippets it took upwards of 30 minutes to an hour.  Now the same task will take less than one minute!  To get the stats for this blog post I did a test to extract MDA for 5,000 CIK-YEAR pairs – using the old interface it took 42 minutes, the new interface took 23 seconds!!
  • We tweaked the Context Extraction to ignore the fields that were used in the search – fields are used only to focus the search and all fields are automatically included in every context or summary output.  This reduces the bulk of the Context Extraction so you don’t have to filter the csv file after the search is extracted to remove the irrelevant context lines.
  • We added a feature so you can generate a CSV file that contains all the meta-data about directEDGAR files and artifacts in a specific folder.  This is for those cases when you access some artifact or copy files from the main repository to another location and you need details about the files and the filers.  The new feature allows you to select a directory; all the files in the directory will be listed and we also parse the file names to give you the CIK, RDATE etc. of the artifacts.  If a file is not a directEDGAR artifact we only provide the file name.
  • We made the DateFilter persistent.  This was probably a bad design choice in the beginning – because the date filter settings are ‘hidden’ after set we decided in the previous version to always reset the date filter to the default (no dates selected) after each search.  Some users have expressed a preference to have the filter persist across searches so we sorted out how to let you know the date filter is set for each search – you can clear it if needed.
  • We improved the way you set values for some of the meta-data filters for your searches.  Now those can be set as you select the filter item from the list, rather than forcing you to go back into the search box, find the open parenthesis, set your cursor and then type.
  • We made some bug fixes including
    • Making sure the SmartBrowser opens in the right directory when you finish some process that causes the SmartBrowser to open.
    • When there is a problem with an input file (missing column or perhaps the file is still open and in-use) we let you know and give you the opportunity to close the file before moving forward rather than reporting an error that causes you to shut-down.
    • Making sure you can stop a search or other process that is running when you hit the Stop button.

The best thing about this release is the improvement to the artifact access.  When we started making director and executive compensation tables available directly we had imagined several hundred thousand artifacts and now we are over 2.5 million and adding more than one hundred thousand per month.  Our system was not designed for this.  We had a lot more to learn about how to manage the delivery than we imagined when we started.  However, there are some amazing folks working behind the scenes and they developed an infrastructure that will allow for a considerable amount of growth.  This is great because we are going to add new artifacts with the new infrastructure.  We have a large number of ideas we are waiting to implement until this release is pushed out to our customers.

Yes, we sometimes modify reported compensation!

Our system is designed to validate many characteristics of the director and executive compensation tables we extract and normalize.  One of the validation steps is to confirm that the reported total matches the sum of the components with an acceptable difference set at the absolute value of $10,000.

While it is easy to identify the reason for many of the exceptions (table reports 123.456 – value should be 123,456) others require more analysis and then ultimately some judgement.  I was working today and our system reported a $33,538 difference between the sum of the components and the reported total for CIK 1074902.  Here is the original table:

sumerror

 

The reported total for Mr. Meilstrup in 2016 is $215,199, the sum of the components is $248,737, a difference of $33,538.

We reviewed the document and the table and wondered if the difference could be explained by the repetition of $45,793 in the NONEQUITY and NQDEFCOMP columns.  This seemed particularly likely since the amount reported for NONEQUITY in 2016 was more than three times the amount reported for the other officers of similar rank.  Thus we substituted the $12,255 amount reported in the 2016 column for the other Executive Vice Presidents and the difference between the reported total and the sum of the components dropped to $0.
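The arithmetic behind this check and the correction can be sketched as follows. The totals and the two $45,793 entries come from the table discussed above; the `ALL_OTHER_COLUMNS` figure is simply derived by subtraction for the illustration, and treating NONEQUITY as the column that received the $12,255 substitution is an assumption based on the reasoning above. Whether a difference of exactly $10,000 is flagged is also an assumption here.

```python
TOLERANCE = 10_000  # dollars; the acceptable absolute difference


def total_gap(components, reported_total):
    """Sum of the components minus the reported total."""
    return sum(components.values()) - reported_total


def needs_review(components, reported_total, tolerance=TOLERANCE):
    """Flag a row when the absolute gap exceeds the tolerance."""
    return abs(total_gap(components, reported_total)) > tolerance


reported_total = 215_199
as_filed = {
    "ALL_OTHER_COLUMNS": 157_151,  # derived: 248,737 less the two 45,793 entries
    "NONEQUITY": 45_793,
    "NQDEFCOMP": 45_793,
}
corrected = dict(as_filed, NONEQUITY=12_255)

print(total_gap(as_filed, reported_total), needs_review(as_filed, reported_total))
print(total_gap(corrected, reported_total), needs_review(corrected, reported_total))
```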

Our final data push for this bank reflected the change we made:

sumerror_fixed