Using directEDGAR in the classroom – Pivot Tables and Gender Diversity

(Note – there is a sample data file at the end so you can use it for class)

Our director of data products (Manish Pokhrel) shared this NY Times article with me this morning: "Diversity Push Barely Budges Boards to 12.5%, Survey Finds." Manish pushed this to me for three reasons. First, we have discussed adding some measure of ethnic diversity to our director compensation data. However, we have hesitated because of concerns that we could make a mistake by relying strictly on pictures. Second, the study was completed by one of our clients – though I am not making any claims about how much of our data was actually used in this study.

The third reason is that I have mentioned before that directEDGAR can be useful in the classroom to teach skills and to provoke discussion about important business issues. Most business schools have classes where students learn how to use some of the more advanced features of Excel – including the use of pivot tables.

When I read the article, it seemed to me that, given the attention to issues of diversity, creating an opportunity for students to muck around with at least one dimension of this issue while improving their data skills would be a winner.

I decided to create a data file for this analysis – but rather than focus on the largest 3,000 US companies I decided to start with the largest 500 as of June 30, 2020. There is such a diversity in size from the largest to the 3,000th largest that I am not sure the comparisons are meaningful. To provide some context – the largest company by market cap was/is Apple, which had a market cap of over one trillion dollars, while the 3,000th largest had a market cap of $75 million. The 500th largest had a market cap of about $8 billion.

I started with the 500 largest and created a request file to pull the most recent DC (director compensation) data. For most of the firms the most recent year was 2019 – however, a number have already reported FY 2020. The most recent data was from Cintas, which filed today. Then I pulled the data for these firms from five years earlier. So if their most recent DC data was 2020, I pulled 2015 data. If their most recent data was 2019, I pulled 2014 data.

I lost 37 registrants because they did not have DC data available five years earlier (GoDaddy reported 2019 DC data in 2020 but their first year of DC data was reported for their 2015 FY).
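For anyone who wants to reproduce the year pairing outside of Excel, here is a minimal sketch of the logic described above. The function name and the data layout are my own assumptions for illustration, and "Hypothetical Corp" is made up. Only the GoDaddy years come from the example above.

```python
LOOKBACK_YEARS = 5  # the five-year span used in this post


def build_year_pairs(sample):
    """Pair each registrant's most recent DC year with the year five years earlier.

    `sample` maps a registrant name to (most_recent_fy, first_fy_with_dc_data).
    Registrants whose DC data does not reach back far enough are dropped.
    """
    kept, dropped = {}, []
    for name, (latest, first_available) in sample.items():
        base_year = latest - LOOKBACK_YEARS
        if base_year >= first_available:
            kept[name] = (base_year, latest)
        else:
            dropped.append(name)
    return kept, dropped


sample = {
    "GoDaddy": (2019, 2015),            # years taken from the post
    "Hypothetical Corp": (2020, 2007),  # made-up registrant for illustration
}
kept, dropped = build_year_pairs(sample)
print(kept)
print(dropped)
```

GoDaddy drops out because 2019 less five years is 2014, which is before its first year of DC data.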

The results are interesting. The highlights include the fact that 39 did not have any female directors in the first year of this analysis. Only one did not have any for the final year – but there is a caveat. The lone holdout was Liberty Broadband Corp – they appointed or nominated a woman who joined the board in early 2020. However, there were no female directors for 2019.

There were 905 women out of a total of 3,721 non-executive directors in the first year of analysis (2014/2015). However, this distribution changed to 1,415 out of 3,458 in the final year.
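The counts are easier to compare as proportions. A quick sketch of the arithmetic, using only the four numbers above (the labels and variable names are mine):

```python
# (women, total non-executive directors) for each year of the analysis
counts = {
    "first year": (905, 3721),
    "final year": (1415, 3458),
}

shares = {label: women / total for label, (women, total) in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%} female")

# Change expressed in percentage points
change_pp = (shares["final year"] - shares["first year"]) * 100
print(f"change: {change_pp:.1f} percentage points")
```

This is the same calculation students would do with a calculated field in a pivot table.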

There were 56 registrants where the proportion of female directors declined across the five-year span. However, I hesitate to draw any inferences about the declines without a close review of the facts. For example – Howmet Aerospace (CIK 4281) had 4/11 female directors in 2014. Only 1/12 in 2019. However – they were involved in a complicated corporate restructuring (Alcoa became Arconic became Howmet Aerospace). The primary aluminum business was spun off to a new entity (named Alcoa) and many of the women leaders became directors of the new entity. Finally, Howmet shareholders elected 3 women to the board at the 2020 annual meeting.
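A pivot table is really just a group-and-count, so the decline check can be sketched in a few lines. The row layout (CIK, fiscal year, gender code) is an assumption on my part and is not necessarily how the xlsx file is organized. The Howmet counts are the ones quoted above.

```python
from collections import Counter

# Director-level rows: (CIK, fiscal year, gender). Assumed layout for illustration.
rows = (
    [(4281, 2014, "F")] * 4 + [(4281, 2014, "M")] * 7      # 4 of 11 in 2014
    + [(4281, 2019, "F")] * 1 + [(4281, 2019, "M")] * 11   # 1 of 12 in 2019
)

# "Pivot" the rows: one count per (CIK, year) cell
totals = Counter((cik, year) for cik, year, _ in rows)
women = Counter((cik, year) for cik, year, gender in rows if gender == "F")


def female_share(cik, year):
    """Proportion of female directors for one registrant-year."""
    return women[(cik, year)] / totals[(cik, year)]


def declined(cik, first_year, final_year):
    """True when the female share fell between the two years."""
    return female_share(cik, final_year) < female_share(cik, first_year)


print(female_share(4281, 2014), female_share(4281, 2019))
print(declined(4281, 2014, 2019))
```

A flag like this only identifies candidates for review. As the Howmet case shows, the story behind a decline can be more complicated than the numbers suggest.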

I apologize, I got a little deeper into the weeds with this post than I intended. The bottom line is that this is an interesting issue and I intend to show my students how to organize this data using a pivot table. We can then have a discussion about what this means for them. Should students use evidence of a company’s diversity when they evaluate whether they should accept a position with the company? It certainly gives them more interesting questions to ask in the interview – can you explain why your board composition does not reflect the community?

If you are a directEDGAR client – you can pull the data yourself. However, if you don’t want to or you are just interested in mucking around with this data – I have made available an xlsx file with this data summarized. I included the Summary worksheet. I will, however, remove it before I pass the file to my students to create the pivot tables. Since the data includes the compensation as well as the SIC of the companies, it is possible to create some different cuts/tables of the data.

Legal stuff – this data may only be used for non-commercial purposes. We make no warranty regarding its fitness for any particular purpose. We hold a copyright on this data and hope you will respect it. Here is the link to the Excel file (Director Compensation file)

Testing new field in DC data

As I have described before we have been working to identify those cases where registrants have gone through some type of reorganization that has led to the creation of a new entity that has the successor filing obligations of the original entity.

You have had the ability to use the mapping we have created when you use CIK filtering for tasks with directEDGAR. So if you have a file that has Alphabet in the sample (CIK 1652044) but you are requesting data for a time period when Google was the reporting entity (prior to 2016) our application would return data for both CIKs.

The problem was that you would not know why data for CIK 1288776 was included in the results (of course you would if your sample had one CIK but not if you had hundreds or thousands of CIKs).

To address this problem we are going to add a new field (or fields) to our data: ALT_CIK_#. In English that would read as Alternate CIK #. In most cases there is only one alternate CIK (Google -> Alphabet, Oracle -> Oracle). There are cases, though, where there are several (CIK 23498 -> CIK 1636023 -> CIK 1732845).
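To make the idea concrete, here is a rough sketch of how a predecessor/successor mapping could be expanded into ALT_CIK_# fields. This is not our production code. The function names, the simple one-to-one chain, and the oldest-first ordering of the alternates are all assumptions for illustration. The CIKs are the ones mentioned in this post.

```python
# predecessor CIK -> successor CIK (pairs taken from the examples in this post)
SUCCESSOR = {
    1288776: 1652044,   # Google -> Alphabet
    23498: 1636023,
    1636023: 1732845,
}


def linked_ciks(cik):
    """Return every CIK in the same chain, oldest first.

    Assumes each entity has at most one predecessor and one successor
    and that the mapping contains no cycles.
    """
    predecessor = {succ: pred for pred, succ in SUCCESSOR.items()}
    start = cik
    while start in predecessor:          # walk back to the oldest entity
        start = predecessor[start]
    chain = [start]
    while chain[-1] in SUCCESSOR:        # walk forward to the newest entity
        chain.append(SUCCESSOR[chain[-1]])
    return chain


def alt_cik_fields(cik):
    """Build ALT_CIK_1, ALT_CIK_2, ... for the CIK attached to a filing."""
    others = [c for c in linked_ciks(cik) if c != cik]
    return {f"ALT_CIK_{i}": c for i, c in enumerate(others, start=1)}


print(alt_cik_fields(1652044))   # a filing made under Alphabet's CIK
print(alt_cik_fields(1636023))   # the middle entity of a three-CIK chain
```

Note that the filing's own CIK is left untouched. The alternates are carried alongside it rather than replacing it.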

We are testing this now and will roll out comprehensively when we have finished the cloud shift. However, at times you will see this new field in some data you extract from our preprocessed core data. You will also see ALT_CIK as a new metadata field in the DEF14 search extractions on the cloud platform.

To illustrate this – suppose I am trying to access director compensation data and I create a request file using the CIK for Alphabet – I know that director compensation data was available beginning in 2007 and so the request file looks like the following:

Request file using just Alphabet’s CIK

So after I have created the request file I use the application to select it and to select the Include Historical CIKs checkbox as illustrated in the next image:

Using directEDGAR to extract Director Compensation data

Once the inputs have been selected hit the Okay button and the application will work the magic. The results file will have all of the usual data as well as this new field – in the image below I hid the data (CASH/STOCK etc) to make it easier to see the CIK (the value associated with the filing) and the ALT_CIK_1 field.

Results for Director Compensation with ALT_CIK field

I will observe that one thought was to replace the CIK with the successor CIK. I don’t want to do that because you could not audit/trace back to the source document the data came from. Further, I can imagine there will be cases where it is significant to control for this shift. We will be pushing this out throughout the platform as soon as we can.

We’re getting closer

Using directEDGAR in my browser

I’m a bit giddy with excitement – the video above illustrated identifying all 8-Ks filed as earnings announcements for a sample of approximately 2,800 registrants. An interesting number of them (555) did not have an earnings announcement filed in the span I was looking at. My output included summary details about the filings and I was able to save those that did not have an 8-K filing as a separate list for review.

Those of you who use directEDGAR already are familiar with the ease of this task. I was talking to a client yesterday who recounted the misery of trying to identify these by using code and the SEC website when he was a PhD student. There have been more than 315,000 8-K filings made since 1/1/2016 (my time period) and for their study they had to inspect each one to determine its relevance. I need to ask him how long that took.
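Once you have the list of earnings-announcement 8-Ks, finding the registrants with none is a set difference. A minimal sketch follows. The CIKs, dates, and record layout are hypothetical placeholders rather than output from our platform. Only the 1/1/2016 start date comes from the description above.

```python
from datetime import date

WINDOW_START = date(2016, 1, 1)  # start of the time period used in this post

# Hypothetical sample and filing records for illustration only
sample_ciks = {1001, 1002, 1003, 1004}
earnings_8ks = [
    {"cik": 1001, "filed": date(2016, 2, 3)},
    {"cik": 1001, "filed": date(2016, 5, 4)},
    {"cik": 1002, "filed": date(2015, 11, 2)},  # before the window, so it does not count
    {"cik": 1004, "filed": date(2017, 1, 30)},
]

# Registrants with at least one earnings 8-K inside the window
with_announcement = {f["cik"] for f in earnings_8ks if f["filed"] >= WINDOW_START}

# Everyone in the sample who is not in that set gets saved for review
without_announcement = sorted(sample_ciks - with_announcement)
print(without_announcement)
```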

In addition – sometimes you need access that is more native to your work environment rather than dealing with the limitations of a cloud experience. So we are also providing a native application mode that will more closely match the experience of working off your desktop. We’ve got that handled as well. The following video illustrates using our application to collect the data where the registrant uses language such as "Our audit committee met N times blah blah blah."

Native Application Mode – Context Normalization

Remember – while it looks like directEDGAR is running on my desktop in the above video – it is actually running in Oregon or Virginia – but it brings all of the features to your local environment.
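If you are curious what pulling N out of the audit committee language might look like in code, here is a simple regular-expression sketch. To be clear, this is not how our Context Normalization feature is implemented. The pattern, the word-to-number table, and the function name are my own assumptions, and real proxy language varies far more than this handles.

```python
import re

# Spelled-out counts that commonly appear in this kind of sentence (assumed range)
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

PATTERN = re.compile(r"audit\s+committee\s+met\s+(\w+)\s+times", re.IGNORECASE)


def meeting_count(text):
    """Return the number of audit committee meetings, or None if not found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    token = match.group(1).lower()
    if token.isdigit():
        return int(token)
    return WORD_NUMBERS.get(token)  # None when the word is not in the table


print(meeting_count("Our Audit Committee met nine times during fiscal 2019."))
print(meeting_count("The audit committee met 5 times."))
print(meeting_count("The board held regular meetings."))
```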