Update to Historic CIK Mapping File

This post is a bit wonkish – but update instructions are near the end of this post. We made an update to our CIK mapping file late last week. This is the file the application uses to retrieve filings by companies that have completed some reorganization that triggers filings under a new CIK. Our file maps between the new and the old CIK so if you are trying to match data based on an old CIK but the registrant is filing under a new CIK the application will retrieve the data you requested even if you are using the new (or old) CIK.

For example – the entity now known as The Walt Disney Company files under the current CIK 1744489. Prior to their acquisition/merger with subsidiaries of The Fox Corporation they (Disney) made filings under CIK 1001039 (from late 1995 until late 2019). Prior to 1995 they filed under CIK 29082.

Our mapping file was developed to anticipate you collecting some data from another source that might have one or more of these CIK as the key and you wanting to match that data to some data you would anticipate finding in their filings. If you use the CIK filtering feature or if you retrieve any of our preprocessed data based on CIK – the application interface will have a box to check asking whether you want to use historical CIKs in your search/retrieval. The image below shows the check box to select if you were to run a search and included a CIK filtering file.

Include Historical CIKs Check Box

It is your option to determine if you want only data for the CIKs in your file or if you want us to augment your CIK list with the values from our mapping file. If you select the Include Historical CIKs option the application will augment your CIK list with all of the additional CIKs that have been associated with the entity. So for example, if you have CIK 1744489 in your sample the application will automatically add CIK 29082 and CIK 1001039 to the in-memory version of your list as it processes the list for the task. If you have a request file that has CIK 1744489 and the YEAR values of 2021, 2020, 2019 and 2018 – the application will extend your list to include each of the additional CIKs paired with the YEAR values you identified for your original CIK. To make this clear – the image below has the original request in black and the extended list as determined by the application in red.

CIK/YEAR request augmented by the application
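For readers who prefer code to a picture, here is a minimal sketch of the augmentation described above. It is not the application's code. The layout of the mapping data (`CIK_GROUPS`) and the function names are assumptions made for illustration; only the three Disney CIKs and the YEAR values come from the example in this post.

```python
# Hypothetical excerpt of a mapping file: each set groups the CIKs that
# belong to one entity. The real file layout is not described in the post.
CIK_GROUPS = [
    {1744489, 1001039, 29082},  # Disney CIKs taken from the example above
]


def build_lookup(groups):
    """Map every CIK to the full set of CIKs recorded for the same entity."""
    lookup = {}
    for group in groups:
        for cik in group:
            lookup[cik] = set(group)
    return lookup


def augment_request(pairs, lookup):
    """Return the original (CIK, YEAR) pairs followed by the added pairs."""
    augmented = list(pairs)
    seen = set(pairs)
    for cik, year in pairs:
        # A CIK with no mapping entry is simply kept as-is.
        for related in sorted(lookup.get(cik, {cik})):
            if (related, year) not in seen:
                seen.add((related, year))
                augmented.append((related, year))
    return augmented


request = [(1744489, year) for year in (2021, 2020, 2019, 2018)]
augmented = augment_request(request, build_lookup(CIK_GROUPS))
for cik, year in augmented:
    print(cik, year)
```

The original four pairs stay at the top of the list and the historical CIKs are appended for each requested year, which mirrors the black/red split in the image.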

The values in red are not added to your version of the list – but they are used by the application. However, any missing values (in red or black) will be reported in the missing_cik_year_pairs.csv file at the end of the search/extraction. Sorry for getting lost in the details – but they are important. The real reason for this post is to make sure you remember to periodically update the mapping file on your version of directEDGAR and since we just updated the file this is a perfect time for you to update yours – the process is simple.

From the File menu select Options and then select Update Historical CIK. Press the Perform Update button.

Options panel – Update Historical CIK

The application will call home, license validity will be established and our server will return a copy of the latest mapping file which will be saved for use by the application. When the process is complete (usually a second or two) a confirmation message will appear.

Update Successful response message

We are still working/struggling to communicate with you about the results – specifically which current CIKs mapped to an older CIK, so you are not fully surprised by the fact that you asked for, say, the MDA from CIK 1744489 from 2018 but instead you received the MDA from CIK 1001039. The challenge is how to do this in real time – for instance APA (CIK 1841666) became the successor registrant to Apache Corp (CIK 6769) on March 1, 2021. While adding the CIK matching to the mapping file is trivial, it is much more complicated to go back and embed a new CIK in all of the prior documents.

Interesting Issue – We need to start thinking about our GENDER indicator.

I was introducing a new intern to one of our internal tools that we use for data processing. We have a number of dashboards that are populated with data when there is a missing value or if the system populates a field with an unexpected value. One of those dashboards is triggered when we are processing compensation data. If a new director or executive has not been included in any prior filing we are not likely to have a GENDER value. In this case the data shows up in a queue for someone to review and code.

When one of the team has to review a filing to identify the right code to use we provide them a link to the document in a dashboard with a place to enter the value they determine is correct. We have developed some proprietary tools to scan the document to identify the use of some specific person titles (Mr, Mrs, Ms) and the names that follow those title words. In addition – the parts of sentences that contain personal pronouns are included in the dashboard with the referent names. If there are multiple titles or pronouns associated with one name (MR. KEALEY is the husband of MRS. KEALEY) these are flagged.
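To give a feel for the kind of scan described above, here is a rough sketch. It is not our proprietary tool. The regular expression, the function names and the sample sentence about Ms. Jones are assumptions for illustration, and a real filing needs far more care (multi-word surnames, titles such as Dr., pronoun handling and so on).

```python
import re
from collections import defaultdict

# Titles named in the post, followed by a single capitalized surname.
# This pattern is a simplification chosen for the example.
TITLE_RE = re.compile(r"\b(Mrs|Mr|Ms|MRS|MR|MS)\.?\s+([A-Z][A-Za-z'\-]+)")


def titles_by_name(text):
    """Collect the set of titles that appear in front of each surname."""
    found = defaultdict(set)
    for title, name in TITLE_RE.findall(text):
        found[name.upper()].add(title.upper())
    return found


def flag_for_review(text):
    """Surnames that appear with more than one title need a human reviewer."""
    return sorted(
        name for name, titles in titles_by_name(text).items() if len(titles) > 1
    )


sample = (
    "MR. KEALEY is the husband of MRS. KEALEY. "
    "Ms. Jones joined the board in 2019."
)
print(flag_for_review(sample))
```

In this sample only KEALEY is flagged because the same surname appears with two different titles; JONES appears with a single title and would not be queued for that reason.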

My intern wondered if we were making a mistake by limiting our search to those honorifics/titles. He became even more concerned when I explained the process we followed when we were not able to identify the GENDER using the dashboard. In those cases (when there is no gender-related title or gender-explicit personal pronoun (he, she, her, him, his) associated with a person in a filing, and yes it happens more than you would believe) we Google for a reference or image of the person. We make a determination based on a search result that contains the name of the person and the name of the entity that made the filing. Historically we have really thought it made our job easier if one of the search results has a picture of the person (presumably we did not find a picture in the proxy).

If you haven’t sorted out the problem yet, his question was – is this appropriate in a world where people are willing/prefer to identify as something other than the binary classification we use?

We try to add an indicator of GENDER to allow our research clients to test hypotheses on the association between various dimensions of firms and the characteristics of their executives and board. If we don’t sort out how to identify those executives and directors who believe that the binary classification of their personage does not reflect some dimension of their identity we are failing to provide the right data. That was an awkward sentence but it reflects the truth and the problem.

While this is important I am stymied at the moment about how to move forward. I did a search in proxy (DEF 14A) filings made since 2005 for words that I thought would help identify these cases (gay, queer, lesbian, non-binary, LGBTQ). This seemed like a reasonable start to me – however a recent article in the New York Times made me less than confident that I had enough knowledge to create the right search (more on that later).

My first search was limited to filings made from 2005 to 2015. I found only 324 documents from 169 companies. There were only 135 documents from 69 companies where the context indicated that the finding was something other than a Mr. or Mrs. Gay or an address (Gay Avenue, Gay St.). What was even more interesting was that there was no mention of directors with one or more of these attributes. In all of the filings where these words appeared the context was generally a statement about the registrant’s view of human rights and the value they place on a diverse workforce. The only other context for these words was when the words were included in shareholder proposals. I found no language used to indicate a person might prefer a non-binary classification.

However, when we move forward to the filings made in 2016 – 2020 some filers started indicating that some of their directors were LGBTQ. Usually this was disclosed in a Diversity matrix. One interesting thing about this disclosure is that it is not consistent across filers, even if they have a common board member. Specifically I have found examples where one board has a category to indicate which members of the board identify as members of the LGBTQ community – another filer with the same board member does not provide any indication about this characteristic of their board members. I will also observe that 2 registrants began providing a diversity matrix in 2019 that includes a non-binary classification related to gender. So far in 2021 (through 3/25) there are 4 registrants that have included this dimension in their diversity matrix. Despite the addition of this dimension to these matrices there is no indication of a director using this classification in any of the examples I could find.

This is something we are going to have to think about. Note – the only information I have found so far is just an indication that some board member identifies as LGBTQ. I have not yet identified information that would indicate a board member would classify themselves as other than F/M.

In an earlier paragraph I observed that I ran a search using words that I consider relevant. There was a really interesting New York Times article in the paper on 3/21 (the article first appeared online on 3/15), “Who Is Making Sure the A.I. Machines Aren’t Racist?” I think I have read this article four times now. It is relevant to this problem – how can I/we authoritatively use language to classify a person who identifies as non-binary? This is kind of tricky. I keep telling myself that our first step is to focus on the personal pronouns used in text that describes the person. We can code for that and bring it up for review whenever there is more than one gender indicator or there are none that we are currently familiar with. The article though reminds me that we can get this wrong and need to be humble about the steps we intend to follow.

My current plan is to anticipate more nuanced disclosures about this attribute of board members and executives. Once we start finding evidence that is relatively unambiguous about diversity I think I am going to try to reach out and communicate with the filer to confirm our interpretation of their language. If we receive some positive confirmation then we will begin expanding the values used in the GENDER field.

I thought about putting up some example images of the disclosures we have found. I have decided that the people making these disclosures are intentionally making them to readers of the proxy – they are not necessarily making an announcement to the world. Thus I have decided not to provide examples at least until I can have a conversation with one or more of the directors who have made/supported the disclosure choices made by their companies.

A related problem is whether or not we propagate a disclosure choice made by one director to all of the registrants they are associated with. Imagine director FIRST_NAME LAST_NAME is identified as a member of the LGBTQ community and prefers the use of XIE as disclosed or apparent in proxy filings by FIRST_COMPANY in 2021. They are also a director of SECOND_COMPANY in 2018 – 2021 but the proxies for SECOND_COMPANY make no mention of this attribute. In the proxies for SECOND_COMPANY all references to FIRST_NAME LAST_NAME are made using either male or female pronouns. Do we change the GENDER from M/F in the data for SECOND_COMPANY or do we honor the disclosure choices they made in SECOND_COMPANY filings?

Bottom line is – this is now on our radar. I will indicate once we start needing to expand the values we report in the GENDER field. Right now I can’t imagine pointing to specific entities or people or even the documents where we drew the data to make our inferences. However, I am always happy to engage with you and take feedback on the classification decisions we ultimately make.

Missing PERSON-CIK and related metadata

During our normalization processes for Director Compensation we try to match the named directors with existing data to add the PERSON-CIK, SEC-NAME, AGE and SINCE (a measure of tenure). If we cannot match the data using our automatic processes the data table is shunted aside so a person can review and attempt to match manually. If we are not successful after reviewing the filing, prior filings and other source documents then we add some signal to the field to indicate the values we could not identify.

Certara Inc (CIK 1827090) filed their first 10-K on March 15, 2021. This filing included DC and EC data. We are missing values for some of the people listed in their DC table. Certara filed a draft registration statement in October 2020 and an S-1 in November. We have learned that many pre-IPO directors do not follow the registrant into the public market. In Certara’s case two directors who received compensation in 2020 resigned before the filing of their initial S-1 and so there is limited information about some attributes of the directors other than their names in the S-1 filing and in their initial 10-K. If directors resign before the S-1 filing and their holdings are less than 5% they have no reporting obligation under Section 16. In these cases we try to discover if the individuals had some reporting obligation because of their relationship with another entity. This requires more than just a name match though.

In the case of Certara we can identify data for William E Klitgaard since he has a history on EDGAR. He has served as a director of Syneos Health since 2017. In his biography Syneos makes an explicit mention of his position on Certara’s board: “Mr. Klitgaard currently serves on the board of directors and Audit Committee at Certara, a leading drug development consultancy . . .” This explicit mention will allow us to update the DC data of Certara with Mr. Klitgaard’s CIK.

However, Certara’s other director that resigned before their IPO was Edmundo Muniz. Their S-1 filing indicates that Dr. Muniz was their CEO from 2014 until early 2020. There is not enough additional biographical information about him to allow us to link him to another SEC registrant. In cases like this we would then search EDGAR for his name and then try to find some additional evidence to link him to the reporting company (Certara). But in this case there are no Muniz’s in EDGAR whose profile comes close to matching the profile that we would expect for Dr. Muniz. Therefore – in this case we will be able to add his GENDER and his tenure (SINCE). We will not be able to add values for PERSON-CIK, SEC-NAME or AGE.

Periodically we will inspect EDGAR for cases like Dr. Muniz. Specifically we will look for name matches on EDGAR and if we find one we will then review the filings for the registrant to try to determine if the Dr. Muniz associated with that registrant is the same as the Dr. Muniz mentioned in Certara’s filings. It is much easier when they make an explicit mention of the other company (Certara in this case) in the biography of Dr. Muniz. Because he was not a director when Certara was a public company they are not obligated to make mention of that fact. If there is a close name match and the other company operates in a related industry we will use Google to try to determine if the individuals are the same or not. Until we can do that though the fields will continue to indicate that data is not available.

There are cases where a director might be affiliated with one or more public companies for long periods of time but never trigger a Section 16 filing obligation. We see this most often with banks and with foreign entities that are cross-listed in the US. I’ve been meaning to research the legal basis for the limited reporting by foreign executives and board members to learn why they are exempt. I’ll save that for another time. We also see it when the director is the board representative/proxy for some security holder. In that case the filings usually indicate that all compensation is passed through to the named director’s employer (security holder).

New MetaData Testing – Available Now

With our filings now hosted in the cloud we can add new metadata to filings without having to ship out 6.0 TB hard drives to our clients. We are running a test right now with the addition of a number of new fields. This initial test is limited to 10-K filings made between 1/1/2016 and 12/31/2020 that are not amended. This test index is now available (10KTAGTEST Y2016-Y2020).

The fields we are adding in this initial test and their descriptions are listed in the schedule below:

Field Name | Description
ACCEPTANCE | DATE/TIME of EDGAR system recording the ACCEPTANCE (but not the dissemination) of the filing. This value is in the form YYYYMMDDHHMMSS where hour is based on a 24 hour time representation. You will note that we historically have used the dissemination date so any filings made after YYYYMMDD1730 have an associated RDATE one business day later.
FILERCATEGORY | The entity reported filer category as defined by the SEC (see table below for codes).
COMMONSTOCKSHARESOUTSTANDING | The number of common shares outstanding as of the latest practical date. This is the value reported on the cover sheet and is reported in shares.
COMMONSTOCK_DATE | The date as reported by the registrant that the number of common shares were calculated.
PUBLICFLOAT | The label perhaps is not precise. This is the market value of all shares held by non-affiliates as of the last business day of the registrant’s second quarter.
FLOAT_DATE | This is the date the public float was measured.
ICFRAUDIT | An indication as to whether or not the auditor of the financial statements also issued an opinion on the internal controls over financial reporting. The values are TRUE/FALSE. This flag is initially only available for registrants who adopted the new 10-K form in 2020. While compliance was required for all 10-K filings made after 4/27/2020 we have noticed a number of registrants who did not conform to the requirements. This tag will initially be included only in filings where the registrant met their reporting obligation though we expect to backfill once our testing is complete.
Table Describing New Metadata Tags added to 10-K filings
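As a worked illustration of the acceptance timestamp format and the 17:30 cutoff noted in the table, here is a small sketch. The function names are made up for this example and the next-business-day logic only skips weekends; it does not know about holidays, so treat the result as an approximation of RDATE rather than the value the application reports.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(17, 30)  # the 17:30 cutoff mentioned in the field description


def parse_acceptance(value: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS acceptance value (24 hour clock)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def next_business_day(day: date) -> date:
    """Next weekday after `day`. Holidays are NOT handled (simplification)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def approximate_rdate(value: str) -> date:
    """Acceptance after 17:30 rolls to the following business day."""
    accepted = parse_acceptance(value)
    if accepted.time() > CUTOFF:
        return next_business_day(accepted.date())
    return accepted.date()


print(approximate_rdate("20200304160512"))  # accepted Wednesday afternoon
print(approximate_rdate("20200306181500"))  # accepted Friday after the cutoff
```

The first value keeps its own date, while the second, accepted on a Friday evening, rolls forward to the following Monday.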

The values for FILERCATEGORY and their meaning are:

CODE | MEANING
LAF | Large Accelerated Filer
SRC | Smaller Reporting Company
AF | Accelerated Filer
NAF | Non-Accelerated Filer
SRAF | Smaller Reporting Accelerated Filer
Explanation of Codes used in NEW FILERCATEGORY Tag

Having these codes embedded in the filings now allows you to use these in your searches when appropriate. For example – suppose you want to find all 10-K filings made by Accelerated Filers who reported that they included an ICFR-Audit from their auditor. To include metadata in a search we use the fields button on the application to select the appropriate field. We need to specify two fields for the search described above. First, we want to select the FILERCATEGORY field and specify AF for Accelerated Filer.

Selecting a field to use in a search.

Once we type in AF for Accelerated Filer and hit the OK button the search pane in the application will populate with (FILERCATEGORY contains(AF)). To also limit the search to those filings that included an ICFR audit indication on the face of their 10-K we need to add the AND operator. Then use the fields button to select the ICFRAUDIT field and enter the word TRUE (valid values for this field are TRUE/FALSE).

Selecting the ICFRAUDIT field to use in a search.


After entering TRUE and then hitting the OK button our search is now (FILERCATEGORY contains(AF)) and (ICFRAUDIT contains(TRUE)). So we are looking for any 10-K filing made by an accelerated filer that indicated their auditor performed an audit of internal controls.
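If you build many of these searches it can help to generate the search text rather than type it. The sketch below only assembles strings in the form shown above, ready to paste into the search pane; the helper names are hypothetical and nothing here calls the application.

```python
# Codes from the FILERCATEGORY table in this post.
FILER_CODES = {"LAF", "SRC", "AF", "NAF", "SRAF"}


def field_clause(field, value):
    """Format one metadata clause the way the search pane displays it."""
    return f"({field} contains({value}))"


def filer_category_clause(code):
    """Reject codes that are not in the table before building the clause."""
    if code not in FILER_CODES:
        raise ValueError(f"unknown FILERCATEGORY code: {code}")
    return field_clause("FILERCATEGORY", code)


def icfr_audit_clause(flag):
    """ICFRAUDIT accepts only TRUE or FALSE."""
    return field_clause("ICFRAUDIT", "TRUE" if flag else "FALSE")


query = " and ".join([filer_category_clause("AF"), icfr_audit_clause(True)])
print(query)
# (FILERCATEGORY contains(AF)) and (ICFRAUDIT contains(TRUE))
```

Checking the code against the table up front catches a mistyped category before it turns into a search that silently returns nothing.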

Whenever you perform a search and create a SummaryExtraction or a ContextExtraction the application always includes all of the available metadata from the filings that were returned in your search. Therefore you don’t have to specify the metadata in the search – all of the available metadata is included as columns in the output file. My search returned 16 documents. I created a SummaryExtraction from this search. If you want to review those results follow this link (Example SummaryExtraction)

Our processor code for the filings has historically added the same metadata to every document that is included in a filing. So all of the exhibits associated with a 10-K filing had the same tags as the 10-K filing. In this initial test we are only tagging the parent document (the 10-K) with the additional metadata. The exhibits will continue to include the existing metadata. For testing purposes the only documents in the 10KTAGTEST index folder are 10-K filings (no amendments or exhibits). The original 10-K filings and exhibits are still in the usual place. Your feedback will help us determine whether we should add additional details to the individual exhibits.

To access this index from your instance – follow the steps described in the help to update your index libraries (File\Options, Index Library, Generate Index Library). Remember – you don’t have to do any browsing – when the Main Index Options is visible the Main Index Library Folder path should be visible – hit the Generate Library button.

Index Library Selection Tool – once visible – confirm the path exists in the Main Index Library Folder hit Generate Library

Unfortunately there is a catch. We have 3,300 or so 10-Ks filed in this window for which we cannot yet authoritatively confirm one or another item of metadata. This seems to be mostly related to those cases where the registrant has two or more classes of common stock. Our challenge is matching the name of the common stock class with the shares outstanding. Right now we have identified some cases where we are matching the wrong name to the count. This is a critical issue and one we are still working on. This problem is why we are still running in test mode rather than in production. I am hoping we do not have to resort to manual matching. In those cases we have decided not to add metadata for those particular attributes yet. So you will see cases where the public float is reported but there is no data about the shares outstanding.

Finally – please remember that these values are reported by the registrant. There are bound to be registrant errors. This is a new area for us and our immediate focus is on capturing the values as reported. I have a very significant degree of confidence that when the registrant reports it is an ACCELERATED FILER we have captured that correctly. Whether or not they are is another issue. Also remember that the meaning of some metadata values will change across time. The authoritative source for an explanation of any specific terms/fields is always the SEC’s rule making disclosures.

Your feedback on additional metadata to include will be appreciated. As I was working on this I had a conversation with a client who expressed an interest in having a hyperlink to the document on EDGAR. This is the second time I have heard that this could be beneficial so we will add this field in the next iteration (less than a month I hope). So if you think some other fields would be useful please let us know.

Additional Proxy Forms Available

We have made five additional proxy type filings available through the platform. The specific filing types are DFAN14A, DEFC14A, DEFC14C, DEFM14A, DEFM14C. In some cases these filings are made by investor groups to communicate with the existing shareholders about their activities. Because these filings might be made by these third parties we have added the INVESTOR_CIK and INVESTOR_NAME tags to the filings.

The addition of these tags provides an opportunity to close the loop between an investor taking a position that triggers a SC 13D filing obligation and efforts by the investor to exert influence over the management/strategy/board of the issuer.

A brute force matching of these is not difficult. Step 1 would require that you search the SC 13D archive using xfirstword and (DOCTYPE contains(SC*)). This search identifies all SC 13 filings and amendments and leaves out the exhibits. The exhibits are not needed to identify the issuer/investor relationship. Use the SummaryExtraction feature to pull a listing of the filings – this listing will include all of the metadata attached to each filing.

Create a new column in the file (I named my new column ISSUER_INVESTOR) and use Excel’s CONCATENATE function to concatenate the value in the CIK column with the value in the INVESTOR_CIK column – note include an underscore or dash between the two values.

Full dump of SC 13D filings with ISSUER_INVESTOR identifier created to match across filings.

To match these to particular filings run an analogous search in one of these other filing types. For example, I could run the following search over the DFAN index collection xfirstword and (DOCTYPE contains(DFAN*)) to identify all DFAN filings made – create the same ISSUER_INVESTOR identifier and then use VLOOKUP to match the ISSUER_INVESTOR combinations in the two filing types. So for instance – this strategy allowed me to match the SC 13D filings STARBOARD VALUE LLP filed related to their investment in DARDEN and then identify the associated DFAN filings that were also filed by STARBOARD.
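If you would rather not do the CONCATENATE/VLOOKUP work in Excel, the same brute force match can be sketched in a few lines of Python. This is only an illustration: the two small CSV snippets stand in for SummaryExtraction output, every CIK and RDATE value in them is invented, and I am assuming the columns are named CIK, INVESTOR_CIK and RDATE.

```python
import csv
import io

# Made-up stand-ins for SummaryExtraction listings from the two searches.
SC13D_CSV = """CIK,INVESTOR_CIK,RDATE
1111111,9000001,20131223
2222222,9000002,20140210
"""

DFAN_CSV = """CIK,INVESTOR_CIK,RDATE
1111111,9000001,20140522
1111111,9000001,20140911
3333333,9000003,20150105
"""


def load_rows(text):
    """Read CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def make_key(row):
    """ISSUER_INVESTOR identifier: issuer CIK, underscore, investor CIK."""
    return f"{row['CIK'].strip()}_{row['INVESTOR_CIK'].strip()}"


# The set of keys plays the role of the VLOOKUP table.
sc13d_keys = {make_key(row) for row in load_rows(SC13D_CSV)}
matched = [row for row in load_rows(DFAN_CSV) if make_key(row) in sc13d_keys]

# Issuer CIK and RDATE of each matched DFAN filing.
cik_date_pairs = [(row["CIK"], row["RDATE"]) for row in matched]
print(cik_date_pairs)
```

The underscore in the key matters for the same reason it does in Excel: without a separator two different CIK combinations could run together into the same string.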

Presumably we are doing this at scale and not looking at individual (one-off) documents. So if we have identified for example all DFAN filings that are associated with an SC 13D INVESTOR filer we can isolate those by using the CIK-DATE feature of the application to run a new search with the issuer CIK and the RDATE of the filings that were identified as a match (there were 3,884 DFAN filings associated with an INVESTOR who filed an SC 13D).

The new filings are available now – to access them please use the File Options Index Library Generate Library command to access the latest filings. Further – if you have not used the Zoom feature when trying to select an index – try it.

Index Library listing using the Zoom feature.

Director Relationship Data

I’ve always thought those relationship sketches that show connections between companies and people are really cool. One of the challenges using that information in an academic study has been that matching the names of people and companies to actual concrete data can be daunting. I hinted back in July that we would start addressing that problem. As usual – we ran into more roadblocks than expected and our efforts were complicated by some unexpected problems in rolling out to Appstream. I am happy to report that this data went live this morning – there is a new Data Table available from our application named DIRECTOR_RELATIONSHIP. To access the data create a request file in the usual manner and look for the new entry at the bottom of the Data Tables user control on our application as illustrated in the image below:

ExtractionPreprocessed User Interface with DIRECTOR_RELATIONSHIP Available

One of the complications in creating this artifact was to decide how to pick a span of time to use for reporting. The final span we decided on is not perfect but is at least a reasonable starting point – this data is organized by reporting calendar year. During calendar year 2020 (1/1/2020 – 12/31/2020) we identified all DC tables that were filed and isolated those that reported DC data for FYE 2019 or 2020. We identified all directors reported in the DC tables where we were able to match a PERSON-CIK to the directors. We then looked for all other DC tables filed in the 2020 calendar year that included that director.

The data is organized by ISSUER CIK/Calendar Year. So for example if you want to explore the 2020 relationship data for Apple Inc (CIK 320193) your request file should have the value 2020 in the YEAR column. The results will list all of the directors reported in the DC data that Apple filed in 2020 along with their PERSON-CIK and SEC-NAME (if available), their GENDER, and their AGE and tenure (SINCE) as reported in the Apple filing. The remaining columns will list the CIK (OTHER_ISSUER_CIK_#) of any other issuer(s) that included the director in their DC data as well as their tenure with the other registrant (OTHER_ISSUER_SINCE_#). Here is a screenshot of this data for APPLE INC:

DIRECTOR_RELATIONSHIP data view with focus on Apple Inc (CIK 320193)

If you look carefully at the table you will see that Mr. Bell is affiliated with four other registrants – note – those affiliations reflect the fact that he was included in the DC table for those registrants during the 2020 calendar year. It turns out that he retired from 2 of those boards in 2019/2020. The interesting observation in this is that the biography of the directors is forward looking – our data is concurrent.
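Because the other issuers arrive as numbered columns, it is often easier to work with one row per director/other-issuer pair. Here is a minimal sketch of that reshaping. The sample row is invented (only the Apple CIK and the column naming pattern come from this post), and I am assuming the issuer column is named CIK.

```python
import re

OTHER_CIK_RE = re.compile(r"^OTHER_ISSUER_CIK_(\d+)$")


def to_edges(row, issuer_col="CIK", person_col="PERSON-CIK"):
    """Turn one wide row into one record per non-empty OTHER_ISSUER_CIK_#."""
    edges = []
    for col, value in row.items():
        match = OTHER_CIK_RE.match(col)
        if match and value not in ("", None):
            n = match.group(1)
            edges.append({
                "PERSON-CIK": row[person_col],
                "ISSUER_CIK": row[issuer_col],
                "OTHER_ISSUER_CIK": value,
                "OTHER_ISSUER_SINCE": row.get(f"OTHER_ISSUER_SINCE_{n}", ""),
            })
    return edges


# Invented example row; the PERSON-CIK and other issuer values are placeholders.
row = {
    "CIK": "320193", "YEAR": "2020", "PERSON-CIK": "PLACEHOLDER-1",
    "OTHER_ISSUER_CIK_1": "1111111", "OTHER_ISSUER_SINCE_1": "2015",
    "OTHER_ISSUER_CIK_2": "2222222", "OTHER_ISSUER_SINCE_2": "2018",
    "OTHER_ISSUER_CIK_3": "", "OTHER_ISSUER_SINCE_3": "",
}
edges = to_edges(row)
for edge in edges:
    print(edge)
```

Each resulting record is an edge between two issuers through one director, which is the shape most network tools expect.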

We are still looking at ways to improve this data. For example – you might note the missing data for BRANDON PILOT (in the first row of the image). He evidently does not have any SEC reporting responsibility and the registrant he is affiliated with is a Smaller Reporting Company – it takes longer to update data on these registrants. I will note that the DIRECTOR_RELATIONSHIP data for 2020 covers 31,386 PEOPLE-COMPANY observations. We have been unable to identify a PERSON-CIK for only 675 of those observations. Some of those will come later – but others will never show up because the director might never have an SEC filing obligation. Usually that occurs when they are non-US domiciled, the issuer never offers equity to directors or the issuer is newly registered and the director steps down around the time of the IPO.

I am going to extend this post to provide a concrete example for our decision to anchor on PERSON-CIK as a critical identifier. The image below has data for a Mr. Hollis who serves as Chairman of Hain Celestial (CIK: 910406) and as an independent director for SunOpta Inc. (CIK: 351834).

R Dean Hollis data from directEDGAR’s relationship tables.

As you can see, Hain Celestial reports his name in the director compensation table as R. Dean Hollis, SunOpta reports his name as Dean Hollis. If we were trying to match across these firms – this small variation would require some contortions and more likely than not some review. By providing the PERSON-CIK we are able to provide more certainty to this process.
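A tiny sketch makes the point. The two issuer CIKs and the two name spellings below come from the example above; the PERSON-CIK value is a placeholder and the NAME column label is an assumption.

```python
from collections import defaultdict

rows = [
    {"CIK": "910406", "NAME": "R. Dean Hollis", "PERSON-CIK": "PLACEHOLDER-1"},
    {"CIK": "351834", "NAME": "Dean Hollis", "PERSON-CIK": "PLACEHOLDER-1"},
]


def group_by_person(records):
    """Collect the name spellings and issuers seen for each PERSON-CIK."""
    grouped = defaultdict(lambda: {"names": set(), "issuers": set()})
    for rec in records:
        entry = grouped[rec["PERSON-CIK"]]
        entry["names"].add(rec["NAME"])
        entry["issuers"].add(rec["CIK"])
    return dict(grouped)


people = group_by_person(rows)

# An exact name comparison misses the match; the identifier does not.
names_match = rows[0]["NAME"] == rows[1]["NAME"]
print(names_match, people)
```

The name comparison fails while the grouping on the identifier puts both issuers under one person, with no fuzzy matching or manual review needed.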

You might also notice the age difference in the two rows. SunOpta is a 1/30 filer and their proxy was filed on 5/1/2020. Sometime between then and 10/13/2020 (the filing date of Hain’s DEF 14A) Mr. Hollis celebrated his 60th birthday.

Finally, our initial push of this data covers calendar years 2009 to 2020.

COVIDADJ

A number of public companies have made announcements about adjustments to compensation because of the effect of the pandemic on revenues and/or income. In the EC and DC tables that we have seen the adjustments have generally been reflected in the Compensation Discussion and Analysis section of the DEF 14A or the 10-K. However we have had one registrant report the adjustment in the body of a table. Below is an image of the DC table reported in NET 1 UEPS’s proxy (CIK 104514):

Director Compensation Table as Reported by NET 1 UEPS

Because the adjustment has been so clearly labeled we have decided to include a new column in the DC/EC data when the registrant makes this disclosure. If you download DC data for this company you will see this column in the output.

Interestingly – they did not use the same practice when they prepared their EC table. Rather than reporting the adjustment in the table they described that “The 2020 amounts presented for Messrs. Kotzé, Pillay and Smith are after the agreed COVID-19 salary reduction under which we donated 30% of their fourth quarter base salary to initiatives fighting against the COVID-19 pandemic in South Africa.” In these cases we are not making any additional disclosure or adjustment to the data.

Rave Restaurant Group (CIK 718332) took yet another approach and reported that their directors waived fees for the 3rd and 4th quarters of 2020. Since they provided no direct explanation of the motivation for the waiver we made the tough decision to net the fees waived against OTHER (RAVE DEF 14A).

Rave Restaurant Group 2020 DEF 14A

We are just a month or so away from seeing a crush of filings from the 12/31 registrants. It will be interesting to see how they choose to reflect any compensation changes made as a result of the pandemic. As long as they explicitly label adjustments as COVID related we will normalize them into the COVIDADJ column.
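As a rough illustration of the labeling rule described above, the sketch below maps any column header that explicitly mentions COVID to COVIDADJ and leaves every other header alone. The sample headers are invented for the example and are not taken from any registrant's table, and this is not a description of our production code.

```python
import re

# Assumption for this sketch: a header counts as "explicitly labeled"
# when the word COVID appears in it, in any letter case.
COVID_LABEL = re.compile(r"covid", re.IGNORECASE)


def normalize_headers(headers):
    """Rename explicitly COVID-labeled headers to COVIDADJ; keep the rest."""
    return ["COVIDADJ" if COVID_LABEL.search(h) else h for h in headers]


# Illustrative headers only, not copied from an actual filing.
reported = ["Name", "Fees Earned or Paid in Cash", "COVID-19 Reduction", "Total"]
print(normalize_headers(reported))
```

A waiver with no stated motivation, like the Rave example, would not match this rule because nothing in the header mentions COVID.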

Updates: Y2021 Filings Added, LibreOffice Added to Platform, Beginning to wonder if we should reorganize our indexes.

If you do an Index Update on the platform you will see that we have added 2021 filings. At the moment our updates to the 2021 filings are going to take place early Saturday morning (around 4:00 AM, UTC-6). While I suppose it would give us bragging rights to accelerate that, I am not sure there is a significant benefit to moving to more frequent updates. If anyone has a strong belief that more frequent updates would increase their productivity please reach out so we can discuss.

We installed LibreOffice so you can open and manipulate CSV files on the instance rather than having to transfer them back to your desktop just to open them. I did a Search and then Summary Extraction and played around some with the file – it works. The interface and menus are very similar to those of most Windows applications, so I found it useful, though a little bit different. When I did the install I added German, French, and Chinese language packages so if you prefer to work with the menu using one of those languages you should have that ability. If we need another language pack please let me know.

Finally, I am starting to wonder about reorganizing the indexes. I don’t think I want to have that conversation in this forum – once we get some time to take a breath I will be sending out an email to users sharing my thoughts and asking for your feedback. I think there are some cool things we can do that would improve your productivity and the nature of your searches.

Data/File transfer to/from directEDGAR

We have had several questions about using files and CIK lists with directEDGAR. I prepared a short video that illustrates the process of moving data that is in the Clipboard as well as files to/from your session. The video illustrates the transfer of a list of CIKs to use in a search. When the search is finished we need to copy the missing CIKs from the search back to our local computer and then we want to extract the actual documents and save those locally. All of this is illustrated in this video.
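If you want to double check the missing CIKs once both lists are back on your local computer, a comparison along these lines would work. This is a minimal sketch that assumes you have your request list and the list of CIKs with results as plain strings; the function names are my own and are not part of the application.

```python
def normalize_cik(value):
    """Strip whitespace and leading zeros so '0001750' and '1750' compare equal."""
    return str(int(str(value).strip()))


def missing_ciks(requested, found):
    """Return requested CIKs that are absent from `found`, in request order."""
    found_set = {normalize_cik(c) for c in found}
    seen = set()
    missing = []
    for cik in requested:
        n = normalize_cik(cik)
        # Report each missing CIK once, even if the request list repeats it.
        if n not in found_set and n not in seen:
            missing.append(n)
        seen.add(n)
    return missing


# Sample values: CIKs mentioned elsewhere on this blog, in mixed formats.
requested = ["1744489", "0000718332", "1750"]
found = ["1744489", "1750"]
print(missing_ciks(requested, found))
```

Normalizing before comparing matters because CIKs copied from different sources often differ only in leading zeros.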

As a side note – we use a service from a company named Rev to create closed captions for our videos. There can sometimes be a lag as long as 24 hours between our video release and the addition of the captions. They do a much better job with the captions than the free services so we appreciate your patience during the caption process.

Spreadsheet Tool and Python?

I was resting on my laurels this morning watching activity through the new APPSTREAM instances of directEDGAR when I received two separate emails about improving the experience. The first one was wondering if we could install Python and the second was a request for a spreadsheet program to more readily review CSV artifacts that are created using the platform.

I should have thought of adding a program to more naturally open csv files. I apologize – I had tunnel vision the last few weeks as I was much more worried about the config files and managing disk permissions to give you fuller access to the archive. I will be adding an open source spreadsheet program before the week is out. We can’t add Office unless we buy a use license for each user and I would not know how to budget for that.

The notion of adding Python is very intriguing and it also seems possible. You have read access to all of the SEC filings in directEDGAR.

Appstream session with directEDGAR archive 10-K filing directory for CIK 1750 FYE 5/31/2019 open in session.

The basic workflow with our platform is to first Search for relevant documents – and while we have great tools to assist your Extraction and Normalization of content there are plenty of use cases where you might want to use Python to achieve a finer-grained Extraction and Normalization process than our tools offer.

I was initially imagining you would use the DocumentExtraction feature to access specific documents – compress them – move them locally and then run your own code. But the more I think about the argument/suggestion/comment I received this morning the more I understand the value of this. We are going to look into this and see what is necessary. I believe we can do this – the challenge will be to find the right compromise on the version and installed libraries.
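To give a sense of what running your own code against the archive might look like, here is a minimal sketch that lists the documents in one filing directory and counts a term in each. The directory path is a placeholder, since the real location on the instance is not described in this post, and the list of file suffixes is an assumption; only the Python standard library is used.

```python
from pathlib import Path


def list_filing_documents(filing_dir, suffixes=(".htm", ".html", ".txt")):
    """Return the document files under one filing directory, sorted by path."""
    root = Path(filing_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def count_term(path, term):
    """Count case-insensitive occurrences of `term` in one document."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return text.lower().count(term.lower())


if __name__ == "__main__":
    # Placeholder path: substitute the filing directory shown in your session.
    example_dir = Path("path/to/filing/directory")
    if example_dir.is_dir():
        for doc in list_filing_documents(example_dir):
            print(doc.name, count_term(doc, "goodwill"))
```

Because the code only reads files, it fits the read-only access you already have to the archive.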