Director Relationship Data

I’ve always thought those relationship sketches that show connections between companies and people are really cool. One of the challenges of using that information in an academic study has been that matching the names of people and companies to actual concrete data can be daunting. I hinted back in July that we would start addressing that problem. As usual, we ran into more roadblocks than expected, and our efforts were complicated by some unexpected problems in rolling out to Appstream. I am happy to report that this data went live this morning – there is a new Data Table available from our application named DIRECTOR_RELATIONSHIP. To access the data, create a request file in the usual manner and look for the new entry at the bottom of the Data Tables user control, as illustrated in the image below:

ExtractionPreprocessed User Interface with DIRECTOR_RELATIONSHIP Available

One of the complications in creating this artifact was deciding how to pick a span of time to use for reporting. The final span we decided on is not perfect but is at least a reasonable starting point – this data is organized by reporting calendar year. During calendar year 2020 (1/1/2020 – 12/31/2020) we identified all DC tables that were filed and isolated those that reported DC data for FYE 2019 or 2020. We identified all directors reported in the DC tables where we were able to match a PERSON-CIK to the director. We then looked for all other DC tables filed in the 2020 calendar year that included that director.
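The selection logic above can be sketched in a few lines of Python. The row layout and key names here (issuer_cik, person_cik, fye_year) are hypothetical stand-ins for our internal data, not the actual production code:

```python
from collections import defaultdict

def build_relationships(dc_rows, calendar_year):
    """Sketch of the reporting-window logic described above.

    dc_rows: DC-table rows filed during `calendar_year`, as dicts with
    hypothetical keys 'issuer_cik', 'person_cik', 'fye_year'.
    """
    # Keep DC tables reporting data for the two eligible fiscal years,
    # and only directors we could match to a PERSON-CIK.
    rows = [r for r in dc_rows
            if r["fye_year"] in (calendar_year - 1, calendar_year)
            and r["person_cik"] is not None]

    # Map each director to every issuer whose DC table names them.
    issuers_by_person = defaultdict(set)
    for r in rows:
        issuers_by_person[r["person_cik"]].add(r["issuer_cik"])

    # For each issuer/director pair, list the *other* issuers.
    return [{"issuer_cik": r["issuer_cik"],
             "person_cik": r["person_cik"],
             "other_issuer_ciks": sorted(
                 issuers_by_person[r["person_cik"]] - {r["issuer_cik"]})}
            for r in rows]
```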

The data is organized by ISSUER CIK/Calendar Year. So, for example, if you want to explore the 2020 relationship data for Apple Inc (CIK 320193) your request file should have the value 2020 in the YEAR column. The results will list all of the directors reported in the DC data that Apple filed in 2020, along with their PERSON-CIK and SEC-NAME (if available), their GENDER and AGE, and their tenure (SINCE) as reported in the Apple filing. The remaining columns will list the CIK (OTHER_ISSUER_CIK_#) of any other issuer(s) that included the director in their DC data as well as their tenure with the other registrant (OTHER_ISSUER_SINCE_#). Here is a screenshot of this data for APPLE INC:

DIRECTOR_RELATIONSHIP data view with focus on Apple Inc (CIK 320193)
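If you want the OTHER_ISSUER_CIK_# columns in long form for network analysis, they are easy to unpivot. The header names below follow the description above, but the sample row itself is made up:

```python
import csv
import io

# Hypothetical one-row extract shaped like the DIRECTOR_RELATIONSHIP output.
sample = io.StringIO(
    "CIK,YEAR,PERSON_CIK,OTHER_ISSUER_CIK_1,OTHER_ISSUER_CIK_2\n"
    "320193,2020,1234,5678,\n"
)

# One (issuer, person, other_issuer) edge per populated OTHER_ISSUER column.
edges = []
for row in csv.DictReader(sample):
    for column, value in row.items():
        if column.startswith("OTHER_ISSUER_CIK_") and value:
            edges.append((row["CIK"], row["PERSON_CIK"], value))

print(edges)  # [('320193', '1234', '5678')]
```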

If you look carefully at the table you will see that Mr. Bell is affiliated with four other registrants – note that those affiliations reflect the fact that he was included in the DC table for those registrants during the 2020 calendar year. It turns out that he retired from two of those boards in 2019/2020. The interesting observation here is that director biographies are forward looking – our data is concurrent.

We are still looking at ways to improve this data. For example, you might note the missing data for BRANDON PILOT (in the first row of the image). He evidently does not have any SEC reporting responsibility, and the registrant he is affiliated with is a Smaller Reporting Company – it takes longer to update data on these registrants. I will note that the DIRECTOR_RELATIONSHIP data for 2020 covers 31,386 PEOPLE-COMPANY observations. We have not been able to identify a PERSON-CIK for only 675 of those observations. Some of those will come later – but others will never show up because the director might never have an SEC filing obligation. Usually that occurs when the director is non-US domiciled, the issuer never offers equity to directors, or the issuer is newly registered and the director steps down around the time of the IPO.

I am going to extend this post to provide a concrete example for our decision to anchor on PERSON-CIK as a critical identifier. The image below has data for a Mr. Hollis who serves as Chairman of Hain Celestial (CIK: 910406) and as an independent director for SunOpta Inc. (CIK: 351834).

R Dean Hollis data from directEDGAR’s relationship tables.

As you can see, Hain Celestial reports his name in the director compensation table as R. Dean Hollis, SunOpta reports his name as Dean Hollis. If we were trying to match across these firms – this small variation would require some contortions and more likely than not some review. By providing the PERSON-CIK we are able to provide more certainty to this process.
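To see why the identifier matters, consider how far even a crude normalization has to go to reconcile the two spellings. This is an illustration, not our matching code:

```python
def normalize(name):
    """Crude normalization: lowercase, strip punctuation, drop initials."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return " ".join(t for t in tokens if len(t) > 1)

hain, sunopta = "R. Dean Hollis", "Dean Hollis"

assert hain != sunopta                        # exact matching fails outright
assert normalize(hain) == normalize(sunopta)  # the crude fix works here, but
# dropping initials would wrongly merge e.g. "J. Smith" and "T. Smith",
# so at scale this still needs manual review; PERSON-CIK does not.
```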

You might also notice the age difference in the two rows. SunOpta is a 1/30 filer and their proxy was filed on 5/1/2020. Sometime between then and 10/13/2020 (the filing date of Hain’s DEF 14A) Mr. Hollis celebrated his 60th birthday.

Finally, our initial push of this data covers calendar years 2009 to 2020.


A number of public companies have made announcements about adjustments to compensation because of the effect of the pandemic on revenues and/or income. In the EC and DC tables that we have seen, the adjustments have generally been reflected in the Compensation Discussion and Analysis section of the DEF 14A or the 10-K. However, we have had one registrant report the adjustment in the body of a table. Below is an image of the DC table reported in NET 1 UEPS’s proxy (CIK 104514):

Director Compensation Table as Reported by NET 1 UEPS

Because the adjustment has been so clearly labeled we have decided to include a new column in the DC/EC data when the registrant makes this disclosure. If you download DC data for this company you will see this column in the output.

Interestingly – they did not use the same practice when they prepared their EC table. Rather than reporting the adjustment in the table, they explained that “The 2020 amounts presented for Messrs. Kotzé, Pillay and Smith are after the agreed COVID-19 salary reduction under which we donated 30% of their fourth quarter base salary to initiatives fighting against the COVID-19 pandemic in South Africa.” In these cases we are not making any additional disclosure or adjustment to the data.

Rave Restaurant Group (CIK 718332), meanwhile, reported that their directors waived fees for the 3rd and 4th quarters of 2020. Since they provided no direct explanation of the motivation for the waiver, we made the tough decision to net the fees waived against OTHER (RAVE DEF 14A).

Rave Restaurant Group 2020 DEF 14A

We are just a month or so away from the crush of filings from the 12/31 registrants. It will be interesting to see how they choose to reflect any compensation changes made as a result of the pandemic. As long as they explicitly label adjustments as COVID related, we will normalize them into the COVIDADJ column.

Updates: Y2021 Filings Added, LibreOffice Added to Platform, Beginning to wonder if we should reorganize our indexes.

If you do an Index Update on the platform you will see that we have added 2021 filings. At the moment our updates to 2021 will take place early Saturday morning (around 4:00 AM UTC-6). While I suppose it would give us bragging rights to accelerate that, I am not sure there is a significant benefit to more frequent updates. If anyone has a strong belief that more frequent updates would increase their productivity, please reach out so we can discuss.

We installed LibreOffice so you can open and manipulate CSV files on the instance rather than having to transfer them back to your desktop just to open them. I did a Search and then a Summary Extraction and played around some with the file – it works. The interface/menus are similar to most Windows applications, so I found it easy to use, if a little bit different. When I did the install I added German, French, and Chinese language packages, so if you prefer to work with the menus in one of those languages you have that ability. If you need another language pack, please let me know.

Finally, I am starting to wonder about reorganizing the indexes. I don’t think I want to have that conversation in this forum – once we get some time to take a breath I will be sending out an email to users sharing my thoughts and asking for your feedback. I think there are some cool things we can do that would improve your productivity and the nature of your searches.

Data/File transfer to/from directEDGAR

We have had several questions about using files and CIK lists with directEDGAR. I prepared a short video that illustrates the process of moving data that is in the Clipboard, as well as files, to and from your session. The video illustrates the transfer of a list of CIKs to use in a search. When the search is finished we copy the missing CIKs from the search back to our local computer, and then we extract the actual documents and save them locally. All of this is illustrated in the video.

As a side note – we use a service from a company named Rev to create closed captions for our videos. There can sometimes be a lag of as long as 24 hours between our video release and the addition of the captions. They do a much better job than the free services, so we appreciate your patience during the caption process.

Spreadsheet Tool and Python?

I was resting on my laurels this morning, watching activity through the new APPSTREAM instances of directEDGAR, when I received two separate emails about improving the experience. The first asked whether we could install Python and the second requested a spreadsheet program to more readily review CSV artifacts created using the platform.

I should have thought of adding a program to more naturally open CSV files. I apologize – I had tunnel vision the last few weeks as I was much more worried about the config files and managing disk permissions to give you fuller access to the archive. I will be adding an open source spreadsheet program before the week is out. We can’t add Office unless we buy a license for each user, and I would not know how to budget for that.

The notion of adding Python is very intriguing and it also seems possible. You have read access to all of the SEC filings in directEDGAR.

Appstream session with directEDGAR archive 10-K filing directory for CIK 1750 FYE 5/31/2019 open in session.

The basic workflow with our platform is to first Search for relevant documents – and while we have great tools to assist your Extraction and Normalization of content, there are plenty of use cases where you might want to use Python to achieve a finer-grained Extraction and Normalization process than our tools offer.
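As a taste of what that might look like, here is a sketch that scans filing documents for a pattern. The glob path in the comment is a placeholder, since we have not settled on where (or whether) the archive would be mounted for Python:

```python
import glob
import re

# Pattern to look for; a risk-factors heading is just an example.
PATTERN = re.compile(r"item\s+1a\.?\s+risk\s+factors", re.IGNORECASE)

def scan(paths):
    """Return the files whose text matches PATTERN."""
    hits = []
    for path in paths:
        with open(path, errors="ignore") as fh:
            if PATTERN.search(fh.read()):
                hits.append(path)
    return hits

# Placeholder path layout -- adjust to the real archive location:
# scan(glob.glob("/archive/10-K/**/*.txt", recursive=True))
```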

I was initially imagining you would use the DocumentExtraction feature to access specific documents – compress them – move them locally and then run your own code. But the more I think about the argument/suggestion/comment I received this morning the more I understand the value of this. We are going to look into this and see what is necessary. I believe we can do this – the challenge will be to find the right compromise on the version and installed libraries.

Finally Available – Cloud Release!

I had hoped to push our latest version on Appstream out before Christmas (and we hope you had a Merry Christmas) but there was an unexpected technical problem that took some time to sort out. However – it is set to go live tomorrow 1/4/2021.

The most important feature enhancement is the development work we did to allow you to interact with new Fields that we add to document metadata. All of the tools that allowed you to manipulate or use field values in our previous generation had the field names hard coded into the application. The application infrastructure was redesigned so that the fields available in a document set are identified when the indexes are recognized by the application. In a practical sense, this means that as we learn about the value or possibility of new fields to help filter documents, we can add those fields without having to re-code the application. To see the difference, compare the two images below – the first one shows the fields list for SC13D filings accessed using the previous version:

SC13D fields accessed using the previous version

The next image shows the fields that are available when using the latest version of the application:

SC13D fields now available using the latest version

You can search/filter on field values – but even if you don’t, the field values will be included in your SummaryExtraction and/or ContextExtraction CSV output files. It would be hard to create a meaningful search pattern for ACCEPTANCE, but if we wanted to identify all SC 13D filings made by Jana Partners it is as straightforward as typing JANA* in the Value to Search box, as illustrated in the next image.

Setting search of SC13D filings for those associated with Jana Partners as the investor.
Search results for SC13D filings where Jana Partners is listed as Investor and words rooted on undervalu*

Even though we did not use the ACCEPTANCE field in our search, the output will include that value (suppose you wanted to do an event study using the initial filing of an SC 13D as the event date).
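For that event-study use case, EDGAR acceptance stamps are commonly recorded in YYYYMMDDHHMMSS form. Assuming that layout (verify the format against your own extract), turning ACCEPTANCE into an event date is a one-liner plus a market-close check:

```python
from datetime import datetime

def event_date(acceptance):
    """Assuming a YYYYMMDDHHMMSS acceptance stamp, return the acceptance
    date and whether the filing arrived after the 4:00 PM Eastern close
    (in which case the market reaction starts the next trading day)."""
    ts = datetime.strptime(acceptance, "%Y%m%d%H%M%S")
    return ts.date(), ts.hour >= 16

day, after_close = event_date("20201006170512")
print(day, after_close)  # 2020-10-06 True
```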

While the ability to provide additional metadata access to you is probably the most consequential feature of this update there are some other enhancements that I hope improve your experience. One is that we have added a new Zoom button that allows you to more easily navigate, identify and select the documents that you might want to search.

Enhanced view of Index Library

The new version is set for release through our APPSTREAM instances at 8:00 AM on 1/4. We will stop all running instances and change the application association that morning. Those of you who have been beta testers – thank you – you should receive an email describing your new license validation step. If you have not been part of the beta testing group, one or more of our contacts at your university will be receiving an email with access instructions. If you need help please contact us at

SC 13D Filings

Today we are releasing SC 13D and SC 13D/A filings on the new platform. The SC 13D is required when a person acquires 5% or more of the shares of an entity that has an SEC filing obligation and they have some intention of using their holdings to influence the issuer in one of the ways listed in the SC 13D form.

One of the features that distinguish SC 13D (and G) filings from most other filings is that there are SUBJECT-COMPANY and FILED-BY tags to identify the Investee and Investor entities.

We were asked to make these filings available across the platform and it seemed only reasonable to sort out a way to help you identify both entities involved. Thus we created new tags to apply to these filings: INVESTOR_CIK and INVESTOR_NAME. We organized these filings by issuer CIK (the Investee company CIK) and then tagged them with the CIK and NAME of the Investor who filed the SC 13D. The way our system works, if you do any extraction of text or a summary extraction, the results will automatically include these values, as illustrated in the next image.

Investor Metadata Retrieved through directEDGAR

While that is exciting – what happens if we have a list of Activist shareholders and we need to identify the Investees/Targets of their activity? Specifically, we want to limit all results to a specific set of Investors! We could just build a search based on the INVESTOR_CIK. I have created an example below – there were 122 activists in my list and I simply wanted to identify the initial SC 13D (not amendments). (Note: to make the search construction less of a focus I only included the first 22 INVESTOR_CIK parameters.)

 (INVESTOR_CIK contains(807249)) or
 (INVESTOR_CIK contains(70858)) or
 (INVESTOR_CIK contains(1048703)) or
 (INVESTOR_CIK contains(921669)) or
 (INVESTOR_CIK contains(1055951)) or
 (INVESTOR_CIK contains(1517137)) or
 (INVESTOR_CIK contains(1113303)) or
 (INVESTOR_CIK contains(1504304)) or
 (INVESTOR_CIK contains(1312548)) or
 (INVESTOR_CIK contains(904495)) or
 (INVESTOR_CIK contains(1510281)) or
 (INVESTOR_CIK contains(1418812)) or
 (INVESTOR_CIK contains(923666)) or
 (INVESTOR_CIK contains(1495741)) or
 (INVESTOR_CIK contains(1462180)) or
 (INVESTOR_CIK contains(1352851)) or
 (INVESTOR_CIK contains(72971)) or
 (INVESTOR_CIK contains(1444376)) or
 (INVESTOR_CIK contains(1520631)) or
 (INVESTOR_CIK contains(1159159)) or
 (INVESTOR_CIK contains(1001085)) or
 (INVESTOR_CIK contains(1346543))
and undervalu*
and (DOCTYPE contains (SC13D))
Results of Search of 13D with focus on Specific Investors
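Typing 122 of those clauses by hand invites typos, so it is easier to generate the search string from the CIK list. This sketch simply mirrors the syntax shown above:

```python
def investor_search(ciks):
    """Build the OR-chained INVESTOR_CIK search shown above, restricted
    to initial SC 13D filings containing words rooted on undervalu*."""
    clauses = " or\n".join(f"(INVESTOR_CIK contains({cik}))" for cik in ciks)
    return clauses + "\nand undervalu*\nand (DOCTYPE contains (SC13D))"

# Paste the result into the search box (first three CIKs shown here).
query = investor_search([807249, 70858, 1048703])
print(query)
```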

The SC13D filings will be available through our platform later today (12/6/2020). Remember to use the OPTIONS/Index Update feature to get access. The metadata is included in the indexed filings already but will not be available in the application until we update the APPSTREAM version sometime in the next week or ten days. Our target is 12/14 but this has been an unusual year so please be patient. I will announce the rollout here.

SpecifiedTables Extraction Feature

We are working on updating our archive of Director Votes data and as I was training our newest intern (Patrick Kealey) on the process I realized I had never posted here about the SpecifiedTables Extraction tool.

Those of you who have used our Table Snipper know that it relies on the existence of uniform but unique language in a table. For example, more than 90% of Executive Compensation tables have the words Salary, Year, Position and Total. While there are other tables that have some of those words, there are very few other tables that have all of them. Thus identification and extraction of that table is pretty straightforward.
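The marker-word idea is easy to express in code. Here is a toy version of the test (the real Table Snipper is more involved):

```python
# Marker words that jointly identify an Executive Compensation table.
REQUIRED = {"salary", "year", "position", "total"}

def looks_like_ec_table(table_text):
    """True when the table text contains ALL of the marker words."""
    return REQUIRED <= set(table_text.lower().split())

assert looks_like_ec_table("Name and Principal Position Year Salary Bonus Total")
assert not looks_like_ec_table("Year Total Revenue")  # some words, not all
```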

The results of shareholder votes are reported in 8-K filings with the ITEM_CODE 5.07, so it is relatively easy to identify the actual source documents that summarize the votes. However, the tables that describe the results of the votes cast for director elections tend to fall into two categories. In our testing we have learned that about 30% of the tables include some form of the word NOMINEE and – in more than 99% of the tables that contain NOMINEE – the table is reporting the results of the election of directors. Here is an example from Pepsico (CIK 77476).

Pepsico Election Results

If the word Nominee is used in the table, the base TableSnipper makes it a trivial exercise to extract those tables. The other 70% are more problematic because they do not have any language that identifies the column with the names of the candidates. For example, here is the summary Apple reported in 2019:

Apple Election Results

We can’t use any of the other column headings to identify this table because every other proposal that was submitted to a vote uses those same column headings as illustrated in this image:

Apple Shareholder Vote Results

Since we can’t use specific language to identify a large number of tables, the only alternative is to take a modified approach using the SpecifiedTables feature of the Search Extraction & Normalization Engine. Step 1 is to identify the relevant documents that have the data we need to collect using the Search feature. Step 2 is to do a SummaryExtraction, which generates a CSV file that has the metadata about the documents returned from the search. One of the columns in the CSV file is FILENAME – this column has the full path to the actual source document. We generally retain the CIK, CNAME and FILENAME columns and delete all other columns. We add a new column – our practice is to name it DataValue – but it can be named anything you like.

Example Data Collection Worksheet
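The column cleanup in Step 2 can be scripted rather than done by hand in a spreadsheet. This sketch assumes the SummaryExtraction header names CIK, CNAME, and FILENAME described above:

```python
import csv

def prepare_request(summary_csv, request_csv):
    """Reduce a SummaryExtraction file to the request layout described
    above: CIK, CNAME, FILENAME plus an empty DataValue column that is
    filled in by hand during review."""
    keep = ["CIK", "CNAME", "FILENAME"]
    with open(summary_csv, newline="") as src, \
         open(request_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=keep + ["DataValue"])
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({**{k: row.get(k, "") for k in keep},
                             "DataValue": ""})
```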

Remember – the listing in the SummaryExtraction file matches the order of the search results in the application. There is a one-to-one correspondence between the items listed in the SummaryExtraction file and the items listed in the application.

Search Results Listing

Now we need to review the documents and identify one value from each document that is contained in the director votes table. While this might seem tedious, the alternative is not so pretty. The fact that the SummaryExtraction file aligns with the listing of search results makes this task easy to describe to others, and the requirement that they only have to capture one value from the table and transcribe it into the CSV file makes it very manageable. Some of the search results do not report on director vote results – we leave those rows blank, to be deleted at the end. Here are the results of the transcription.

Transcription of Data from Search to Summary

Once the transcription is complete – delete all of the rows that report search results that did not have relevant data and save the file. Then use the Extraction\ExtractionPreprocessed tool on the application to select the csv file and specify the column that contains the data value that you want the application to use to identify the relevant table.

SpecifiedTables Feature of Application

When all of the parameters have been specified, select the Okay button. The application will use the FILENAME to access the relevant documents and then find and extract the relevant tables based on the values specified in this request-like file. If there are multiple tables with the same specified value, each of the tables will be extracted and labeled uniquely. The labeling follows the same pattern as all artifact labeling we use in directEDGAR, to create an audit trail back to the source document. Here is Apple’s director votes table after we completed the process.

Apple’s Director Votes Table

Once the tables have been extracted we can then use the Dehydrator/Rehydration process to normalize the votes. Here is part of the output after Patrick finished normalizing the output:

Apple Director Votes after Dehydration/Rehydration

Obviously you are probably not going to want to invest effort into collecting Director Vote data since we are making that accessible through the platform. (Our data will include PERSON-CIK, TENURE and GENDER!) However, this process is the same one you should use when you are trying to collect data from tables that cannot be uniquely identified by some common words. We are often asked about collecting non-GAAP earnings reconciliations. That project is not on our list at the moment, but this process would make it very approachable and manageable.

Using the date search capability

I received an email this morning asking a really interesting question – how can I use directEDGAR to identify the auditor and the location of the auditor’s office in 10-Ks filed before 2000? This data object is not readily available from any source that I am aware of.

Step 1 of the process was to go look at some 10-K filings – I used the search (CNAME contains(CONAGRA)) and (DOCTYPE contains(10K*)) – I simply wanted to review how this disclosure was made in Conagra’s 10-K. I selected Conagra because many of my students have taken internships there, so their name came to mind first. Here is what the disclosure looked like:

Conagra 10-K Deloitte signature.

So our client is looking to capture the name of the auditor and the location of their office. My immediate thought was that I could search for auditors by name and for the names of the states, and require that the auditor, state name and a date be in close proximity to one another. If the search was successful I would then extract the context. I had to do some research to identify the names of the audit firms that existed during the span of time they are trying to cover. This list is not meant to be exhaustive, but as a starting point I came up with these auditors: (anderson or ernst or kpmg or waterhouse or pricewaterhousecoopers or coopers or pwc or deloitte or bdo or mcgladrey or grant or baker or crowe). Given that the audit reports span 1/1/1995 to 12/31/2001, I need a date search parameter – date(1/1/1995 to 12/31/2001). The magic here is that our index parser recognizes dates in US form. I also need to set the search to focus on 10-K filings as well as Exhibit 13 filings, because sometimes the audit report is included in the Exhibit 13 rather than the body of the 10-K.

Here is the search string I put together for my first stab at collecting this data:

date(1/1/1995 to 12/31/2001)
and
(anderson or
ernst or
kpmg or (. . . more auditor names))
and
(Alabama or
Alaska or
Arizona or (. . . more state/location names))
and
((DOCTYPE contains(10k*)) or
(DOCTYPE contains(EX13)))


There are basically four parts to this search: the date span, the auditor names, the state names, and the document restrictions. The first three parts need to be grouped together since we set proximity parameters for those particular items. I need the document restrictions so I don’t find the content in a consent filing.
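Because the grouping is where mistakes creep in, I prefer to assemble the four parts from plain lists. This sketch emits the structure above; note that I have joined the parts with simple and/or here, and the exact connectors and proximity operators you actually need depend on the engine syntax:

```python
def or_group(terms):
    """Parenthesized OR group in the style of the search above."""
    return "(" + " or\n ".join(terms) + ")"

auditors = ["anderson", "ernst", "kpmg"]   # extend with the full list
states = ["Alabama", "Alaska", "Arizona"]  # extend with the full list

search = "\nand\n".join([
    "date(1/1/1995 to 12/31/2001)",
    or_group(auditors),
    or_group(states),
    or_group(["(DOCTYPE contains(10k*))", "(DOCTYPE contains(EX13))"]),
])
print(search)
```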

I want to keep a fairly tight context for the extraction, so I set the Context option span to 5 words (before and after). In my initial naive pass the search identified 41,567 documents. An example of the extracted context is here:

accepted accounting principles. /s/ Arthur Anderson LLP – —————————— Boise, Idaho. February 2, 1998 <PAGE> UNAUDITED RESULTS OF QUARTERLY
ContextExtraction from Date/Auditor Search

Clearly I need to make some improvements. I need to identify other names for auditors. Further, some state names may be abbreviated in some filings, or the auditor may be domiciled in another country, so I need to experiment with adding state abbreviations and the names of countries or large international cities where I expect to find results. But the hard work is done – now we just need to experiment: identify auditors and places, and set the right distance between the date/auditor/location parameters.

The context extraction has an identifier for the critical values (name of state and the auditor name) and it has the actual context. I deleted a lot of the columns to focus on the relevant context for this particular example.

accepted accounting principles. /s/ Arthur Anderson LLP – —————————— Boise, Idaho. February 2, 1998 <PAGE> UNAUDITED RESULTS OF QUARTERLY 11

The next step is to parse out Boise – this should not be difficult in Excel, and it would be even easier in Python.
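In Python the city and state fall out with one regular expression. This pattern is only a starting point tuned to the sample context above; abbreviated states, foreign cities, and multi-word city names will need more work:

```python
import re

context = ("accepted accounting principles. /s/ Arthur Anderson LLP - "
           "------------------------------ Boise, Idaho. February 2, 1998 "
           "<PAGE> UNAUDITED RESULTS OF QUARTERLY")

# Capture "City, State" where the state is a single capitalized word
# followed by a period; dates like "February 2, 1998" are skipped
# because "1998" is not a capitalized word.
m = re.search(r"([A-Z][A-Za-z .]+?),\s+([A-Z][a-z]+)\.", context)
print(m.groups())  # ('Boise', 'Idaho')
```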

I told the client this was one of the most interesting searches I have performed in some time. One final note – I built the search in Notepad++; in my first draft I had the parentheses wrong, and Notepad++ made it easy to keep track of the grouping.

Two New Filings Types Now Available

One of the critical benefits of our new platform is the ease with which we can distribute/add new filings to the search repository. Historically we have supported research projects when one of our clients has asked for a special directEDGAR repository to be created for a specific project. Of all of the filing types available on EDGAR, the one that has been asked for most often is S-1 filings. When we have created these repositories I have been reluctant to push them out to everyone because of the level of coordination it would have required with your IT support. All of those coordination issues go away with our new platform.

A client reached out to me last week and asked if we could make S-1 filings and DRS (Draft Registration Statement) filings available. We already had an archive of S-1 filings so I just needed to update the archive and transfer it to our new platform. The DRS filings were ones I was not familiar with, but they look particularly interesting for research into IPOs as well as governance. A simple description of the DRS filings is that they are S-1s confidentially filed while the registrant is sorting out the IPO process. We are including letters from the registrants in response to SEC comments in the DRS folder with the DOCTYPE tag DRSLETTER. Note – the original SEC Comment letters that prompt the response from the registrant have been available in the UPLOAD folder as they have been released. Since DRS filings have only been possible since Q4 2012, there are not that many of them. Both sets of filings are now available on our new platform.

If you are using our new platform, remember that your instance is loaded with your preferences and we maintain your configuration settings in the cloud. Thus you have to update the index library to have access. The process is pretty straightforward and is described in the Help under Indexing; the specific topic is labeled Index Updates.

Help Content Indexing\Index Updates

Once you have updated the index library, the S1MASTER and DRSMASTER indexes are available for use. You can then use our amazing search as well as the full range of Extraction and Normalization tools on these filings. In the image below I ran a search to identify all DRSLETTER (DRSLTR) type documents.

Search for DRS LETTER documents using the DOCTYPE search field