I had an email from a faculty researcher who needed to capture the frequency of board meetings for a sample of companies and wanted some help. While I was setting up the system, I decided this would be a worthwhile post because it illustrates why Search is not enough: you also need Extraction and Normalization.
There are many ways a filing can report how often the board met, but I know from past review that one form of the expression is ‘The Board met N times in YYYY’. That was the basis of my first search:
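To make the shape of that phrasing concrete, here is a minimal Python sketch of a regular expression for ‘The Board met N times in YYYY’. This is not how the search in our application is specified; the pattern and the function name `find_board_met` are hypothetical, for illustration only. N is captured as raw text because it may be written as digits or as words.

```python
import re

# Hypothetical pattern for one phrasing: "The Board met N times in YYYY".
# "of Directors" is optional; N is captured raw (digits or words such as
# "twenty-one"); YYYY must be a four-digit year.
BOARD_MET = re.compile(
    r"\bboard\s+(?:of\s+directors\s+)?met\s+(?P<count>[\w-]+)\s+times?\s+in\s+(?P<year>\d{4})\b",
    re.IGNORECASE,
)

def find_board_met(text):
    """Return a list of (raw count, year) pairs, one per match in the text."""
    return [(m.group("count"), int(m.group("year"))) for m in BOARD_MET.finditer(text)]
```

A pattern like this only finds the one phrasing it was written for; a sentence such as ‘the Board held six meetings’ slips straight past it.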
We found 796 relevant documents. Now it is time to EXTRACT & NORMALIZE those findings. Just select the ContextNormalization feature from the menu and specify the inputs:
After you press the Okay button, the results will soon be available in the Output Folder. They include enough detail to create an audit trail back to the original document, as well as the data that is needed:
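The idea behind an audit trail is that every extracted value travels with where it came from. The Python sketch below is a hypothetical illustration of that idea, not the actual layout of the ContextNormalization output: the column names (`doc_id`, `offset`, `matched_text`, `count_raw`, `year`) and the functions `extract_rows` and `rows_to_csv` are assumptions of mine.

```python
import csv
import io
import re

# Hypothetical pattern for "The Board met N times in YYYY" (N kept as raw text).
PATTERN = re.compile(
    r"\bboard\s+(?:of\s+directors\s+)?met\s+(?P<count>[\w-]+)\s+times?\s+in\s+(?P<year>\d{4})\b",
    re.IGNORECASE,
)
FIELDS = ["doc_id", "offset", "matched_text", "count_raw", "year"]

def extract_rows(documents):
    """documents: dict mapping a document id (e.g. a file name) to its text.

    Each row keeps the source document, the character offset of the match and
    the exact matched text, so any value can be checked against the filing.
    """
    rows = []
    for doc_id, text in documents.items():
        for m in PATTERN.finditer(text):
            rows.append({
                "doc_id": doc_id,
                "offset": m.start(),
                "matched_text": m.group(0),
                "count_raw": m.group("count"),
                "year": int(m.group("year")),
            })
    return rows

def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the matched text next to the extracted numbers is what lets a reviewer spot-check a row without reopening every filing.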
I highlighted three of the rows to drive home the point that this is a versatile tool: it can work with the various ways a number might be expressed.
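Normalization here means turning whatever form the count takes into a plain integer. The Python sketch below is a hypothetical, deliberately small normalizer (the function name `normalize_count` is mine, and it only covers 0 to 99 plus ‘once’ and ‘twice’). It is not the logic our tool uses; it just shows the kind of mapping involved.

```python
import re

# Hypothetical word tables; only 0-99 are covered.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
# Single words that can stand alone as a count.
SINGLE = {**UNITS, **TENS, "once": 1, "twice": 2}

def normalize_count(raw):
    """Convert a raw count expression to an int, or return None if unrecognized."""
    text = raw.strip().lower()
    # Leading digits, e.g. "7" or "7 (seven)".
    m = re.match(r"^(\d+)", text)
    if m:
        return int(m.group(1))
    # Drop a trailing parenthetical, e.g. "seven (7)" -> "seven".
    text = re.sub(r"\s*\(.*\)$", "", text)
    parts = re.split(r"[\s-]+", text)
    if len(parts) == 1:
        return SINGLE.get(parts[0])
    # Compound tens, e.g. "twenty-one" or "twenty one".
    if len(parts) == 2 and parts[0] in TENS and 1 <= UNITS.get(parts[1], 0) <= 9:
        return TENS[parts[0]] + UNITS[parts[1]]
    return None
```

Returning `None` instead of guessing leaves unrecognized expressions visible, so they can be reviewed by hand.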
From start to finish this took me about three minutes. The hard part is continuing on to find the other ways this concept is expressed. I tried another form, ‘Board held’. This returned more results:
I would use the same strategy as before to Extract and Normalize – here is a peek at the results:
Again, a user intent on capturing the meeting frequency for a large sample will have to learn how the concept is expressed (there are clearly other ways to express it) and keep working through the alternatives until they have identified the forms of expression in their sample. Once they apply that knowledge, however, our tools can help them convert those search results into data very rapidly.
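The iterative process described above amounts to maintaining a growing list of phrasings and applying all of them to each document. The Python sketch below illustrates that idea under stated assumptions: the two patterns, the `PATTERNS` list and the `extract_all` function are hypothetical, and the second pattern is only my guess at one ‘Board held’ phrasing, not a complete inventory.

```python
import re

# Hypothetical list of phrasings found so far; extend it as new forms turn up.
PATTERNS = [
    # "The Board met N times in/during YYYY"
    re.compile(
        r"\bboard\s+(?:of\s+directors\s+)?met\s+(?P<count>[\w-]+)\s+times?\s+(?:in|during)\s+(?P<year>\d{4})\b",
        re.IGNORECASE,
    ),
    # "The Board held N [regular] meetings in/during YYYY"
    re.compile(
        r"\bboard\s+(?:of\s+directors\s+)?held\s+(?P<count>[\w-]+)\s+(?:\w+\s+)?meetings?\s+(?:in|during)\s+(?P<year>\d{4})\b",
        re.IGNORECASE,
    ),
]

def extract_all(text):
    """Apply every pattern and return (start offset, raw count, year) tuples.

    Results are sorted by offset; if two patterns match at the same offset,
    only the first pattern's result is kept.
    """
    found = {}
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.setdefault(m.start(), (m.group("count"), int(m.group("year"))))
    return [(start, count, year) for start, (count, year) in sorted(found.items())]
```

Each new form of expression the researcher discovers becomes one more entry in the list, and everything downstream stays the same.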