Extraction & Normalization of Board Meetings

I had an email from a faculty researcher who needed to capture the frequency of board meetings for a sample of companies and wanted some help. While I was setting up the system, I decided this would be a worthwhile post to illustrate why Search is not enough: you need Extraction and Normalization.

While the frequency of board meetings can be reported in many ways, I know from past review that one form of the expression is ‘The Board met N times in YYYY’. So that was the basis of my first search:
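Outside of our application, that first pass could be roughly approximated with a regular expression. The sketch below is only an illustration, not how ContextNormalization works internally: the pattern, the optional "of Directors" and "during" variants, and the function name find_board_meetings are all assumptions. N is captured as raw text so that both digits and number words survive to the normalization step.

```python
import re

# Assumed phrasing: "The Board (of Directors) met N times in/during YYYY".
# N is captured as raw text because filers write it as digits or as words.
BOARD_MET = re.compile(
    r"\bboard\s+(?:of\s+directors\s+)?met\s+(?P<count>[\w-]+)\s+times?\s+"
    r"(?:in|during)\s+(?P<year>(?:19|20)\d{2})\b",
    re.IGNORECASE,
)


def find_board_meetings(text):
    """Return a (raw_count, year) pair for every match in `text`."""
    return [(m.group("count"), int(m.group("year"))) for m in BOARD_MET.finditer(text)]


sample = "The Board met 7 times in 2016. The Board of Directors met eleven times during 2015."
print(find_board_meetings(sample))  # [('7', 2016), ('eleven', 2015)]
```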


We found 796 relevant documents – now to EXTRACT & NORMALIZE those findings.  Just select the ContextNormalization feature from the menu and specify the inputs:


After you press the Okay button, the results will soon be available in the Output Folder. The results include enough detail to create an audit trail back to the original document, and they also contain the data that is needed:
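To make the idea of an audit trail concrete, here is a minimal sketch, assuming the filings are already available as plain text keyed by some document identifier (doc_id is a hypothetical name). Each row records where the value came from (the document and character offset) alongside the matched context and the extracted values, and the rows are written out as CSV. The column names are illustrative; they are not the actual layout of our output files.

```python
import csv
import io
import re

# Hypothetical pattern for the phrasing "The Board met N times in YYYY".
PATTERN = re.compile(
    r"board\s+met\s+(?P<count>[\w-]+)\s+times?\s+in\s+(?P<year>\d{4})", re.IGNORECASE
)
FIELDS = ["doc_id", "offset", "context", "raw_count", "year"]


def extract_rows(documents):
    """documents maps a (hypothetical) document identifier to its plain text."""
    rows = []
    for doc_id, text in documents.items():
        for m in PATTERN.finditer(text):
            rows.append({
                "doc_id": doc_id,          # which filing the value came from
                "offset": m.start(),       # where in that filing the match starts
                "context": m.group(0),     # the exact words that were matched
                "raw_count": m.group("count"),
                "year": int(m.group("year")),
            })
    return rows


def to_csv(rows):
    """Serialize the rows, one audit-trail record per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the offset and the matched words next to the value is what lets a reviewer jump straight back to the sentence in the filing when a number looks wrong.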


I highlighted three of the rows to drive home the point that this is a versatile tool.  It can work with various forms of number expressions.
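Normalizing the count is the step that turns "seven", "Seven" or "7" into the same value. A rough sketch of the idea, assuming counts below 100 written either as digits or as English number words (the function name normalize_count and the word tables are hypothetical). Anything it cannot interpret comes back as None so it can be reviewed by hand rather than guessed.

```python
# Hypothetical word tables covering counts below 100 (assumed enough for meeting counts).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def normalize_count(raw):
    """Map '7', 'Seven' or 'twenty-one' to an int; return None if unrecognized."""
    token = raw.strip().lower()
    if token.isdecimal():
        return int(token)
    parts = token.replace("-", " ").split()
    if len(parts) == 1:
        return UNITS.get(parts[0], TENS.get(parts[0]))
    # Compound words such as "twenty-one": a tens word followed by a unit from 1 to 9.
    if len(parts) == 2 and parts[0] in TENS and 1 <= UNITS.get(parts[1], 0) <= 9:
        return TENS[parts[0]] + UNITS[parts[1]]
    return None


for raw in ["7", "Seven", "twenty-one", "several"]:
    print(raw, "->", normalize_count(raw))
```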

From start to finish this took me about three minutes. The hard part is to continue and find the other ways this concept is expressed. I tried another form, ‘Board held’, which returned more results:
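Each new form of expression becomes one more pattern run over the same documents. A sketch under the assumption that the ‘Board held’ results read like "The Board held N meetings in YYYY" (that exact wording, the optional "regular"/"special" qualifier and the name extract_all are assumptions for illustration). When two forms report the same year, this sketch simply keeps the one from the earlier pattern in the list.

```python
import re

# One compiled pattern per known form of expression; both wordings are assumptions.
PATTERNS = [
    re.compile(r"board\s+met\s+(?P<count>[\w-]+)\s+times?\s+in\s+(?P<year>\d{4})", re.I),
    re.compile(
        r"board\s+held\s+(?P<count>[\w-]+)\s+(?:regular\s+|special\s+)?"
        r"meetings?\s+(?:in|during)\s+(?P<year>\d{4})",
        re.I,
    ),
]


def extract_all(text):
    """Return {year: raw_count}, keeping the first pattern that reports each year."""
    found = {}
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.setdefault(int(m.group("year")), m.group("count"))
    return found


print(extract_all("The Board met 4 times in 2015. The Board held six meetings in 2016."))
# {2015: '4', 2016: 'six'}
```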


I would use the same strategy as before to Extract and Normalize – here is a peek at the results:


Again, a user intent on capturing the meeting frequency for a large sample will have to learn how the concept is expressed (clearly there are other ways to express it) and keep working through the alternatives until they have identified the forms of expression in their sample. However, once they apply that knowledge, our tools can help them convert those search results into data very rapidly.
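One way to see which forms of expression are still missing is to look for sentences that mention the board and meetings but match none of the forms already handled. The sketch below assumes a crude sentence split after periods and a hypothetical KNOWN_FORMS list; it is a screening aid for deciding what to search for next, not an extractor.

```python
import re

# Hypothetical list of forms already handled; extend it as new forms are found.
KNOWN_FORMS = [
    re.compile(r"board\s+met\s+[\w-]+\s+times?", re.I),
    re.compile(r"board\s+held\s+[\w-]+\s+(?:regular\s+|special\s+)?meetings?", re.I),
]
BOARD_WORD = re.compile(r"\bboard\b", re.I)
MEETING_WORD = re.compile(r"\b(?:met|meet|meetings?)\b", re.I)
SENTENCE_BREAK = re.compile(r"(?<=\.)\s+")  # crude: split on whitespace after a period


def unexplained_sentences(text):
    """Sentences mentioning the board and meetings that no known form matches."""
    leftovers = []
    for sentence in SENTENCE_BREAK.split(text):
        if not BOARD_WORD.search(sentence) or not MEETING_WORD.search(sentence):
            continue
        if not any(form.search(sentence) for form in KNOWN_FORMS):
            leftovers.append(sentence.strip())
    return leftovers
```

Running this over the search hits and reading the leftovers is one way to decide which phrasing to try next.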

Always Interesting Issues in Compensation

We Extract and Normalize the Executive and Director compensation data whether it is reported in the 10-K or the DEF 14A. Compensation is a required disclosure in the 10-K, but companies can take the relief offered by General Instruction G(3) to Form 10-K and choose to incorporate it by reference from the DEF 14A (proxy) if the proxy is filed no later than 120 days after the end of the fiscal year covered by the 10-K.

We have started seeing more and more discrepancies between what is filed in the 10-K and what is ultimately reported in the proxy. These discrepancies are usually not very large, but they are interesting. Argos Therapeutics Inc filed their proxy today; here is a link. On May 1, 2017 they filed an amendment to their 10-K (10-K/A) that appears to have been filed solely to include Items 10 through 14.

When the proxy was filed today, we captured the executive compensation (EC) data, and our system triggered an event: the table in the proxy covered the same reporting periods as the table in the 10-K/A, but the totals did not match. Here is the data that was reported in the 10-K/A:
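The check that triggered the event can be sketched as a comparison of totals over the periods both tables share. This assumes each table has already been normalized into a mapping from (executive, year) to the reported Total; the function name, the key layout and the zero default tolerance are hypothetical, and the figures in the example are made up rather than taken from the Argos filings.

```python
from decimal import Decimal


def total_mismatches(prior, current, tolerance=Decimal("0")):
    """Compare Total by (executive, year) over periods reported in both tables.

    prior and current map (executive, year) -> reported Total as a Decimal.
    Returns (key, prior_total, current_total) for each key whose totals differ
    by more than `tolerance`.
    """
    flagged = []
    for key in sorted(prior.keys() & current.keys()):
        if abs(prior[key] - current[key]) > tolerance:
            flagged.append((key, prior[key], current[key]))
    return flagged


# Made-up figures for illustration only.
ten_k_a = {("Executive A", 2016): Decimal("500000"), ("Executive A", 2015): Decimal("450000")}
proxy = {("Executive A", 2016): Decimal("503000"), ("Executive A", 2015): Decimal("450000"),
         ("Executive B", 2016): Decimal("410000")}
print(total_mismatches(ten_k_a, proxy))
```

Keys that appear in only one table (Executive B above) are ignored, since there is nothing to compare them against; Decimal is used so that dollar amounts compare exactly.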


Here is the EC data as it was reported in the proxy:


The total for each year has changed, and it appears that the differences can be explained by changes in the amounts reported as Other. Looking more closely at the description of the Other amount, it appears that they modified the description between the two filings. Here is the language used in the proxy:
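Localizing a discrepancy to a single column can also be automated. A minimal sketch, assuming each row is a dict mapping column name to Decimal, with a column missing from one filing treated as zero; changed_components and explained_by are hypothetical helpers. The second function returns True only when Other is the only component that moved and the Total moved by exactly the same amount.

```python
from decimal import Decimal

ZERO = Decimal("0")


def changed_components(prior_row, current_row):
    """Return {column: (prior, current)} for each column whose value changed.

    A column missing from one filing is treated as zero (an assumption).
    """
    columns = prior_row.keys() | current_row.keys()
    return {
        col: (prior_row.get(col, ZERO), current_row.get(col, ZERO))
        for col in columns
        if prior_row.get(col, ZERO) != current_row.get(col, ZERO)
    }


def explained_by(changed, column="Other", total="Total"):
    """True if `column` is the only component that moved and Total moved by the same amount."""
    if set(changed) - {total} != {column} or total not in changed:
        return False
    column_delta = changed[column][1] - changed[column][0]
    total_delta = changed[total][1] - changed[total][0]
    return column_delta == total_delta
```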

The description of Other in the 10-K/A does not mention 401(k) matching contributions; otherwise it matches the language used in the DEF 14A verbatim. Now that we can explain the discrepancy, we will remove the prior data and update it with the new table.
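Replacing the earlier data could look like the sketch below, assuming rows are stored as dicts carrying company and year fields (the field names and replace_periods are hypothetical). Every prior row for the company in a year covered by the new table is dropped before the new rows are added, so no period ends up reported twice.

```python
def replace_periods(store, company, new_rows):
    """Drop `company`'s rows for every year in `new_rows`, then append the new rows."""
    years = {row["year"] for row in new_rows}
    kept = [row for row in store
            if not (row["company"] == company and row["year"] in years)]
    # Tag each incoming row with the company so later replacements can find it.
    return kept + [dict(row, company=company) for row in new_rows]
```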