Enhancing Search Focus with Numeric Ranges

I received an interesting email from a client this morning asking about including a dollar sign in a search to cut down on the noise, since they wanted disclosures only when a monetary amount was reported in proximity to the search phrase. Rather than share their search, I will describe a similar one. Suppose you want to find the amounts reported as expenditures for research and development. A natural starting point would be to search for research and~ development. Note that the ~ appended to a search operator causes the search engine to treat the word as a term, not an operator, so a search for research and~ development returns the phrase research and development. The search research and development, by contrast, returns any document containing both words (with no proximity constraint).

The problem (as indicated in the email) with the search research and~ development is that the phrase could exist in many places in a document without a disclosure of the amounts.

For example, Apple used the phrase nine times in its 2017 10-K, and most of the hits were noise, such as: The Company believes ongoing investment in research and development (“R&D”), marketing and advertising is critical to the development and sale of innovative products, services and technologies.

If we add a number range constraint to the search we can significantly reduce the noise. Our application does not index dots, dashes, commas, dollar signs and similar characters, but we do index number groups. We can search for a range of numbers by entering the lower and upper bounds of the range separated by two ~ symbols.
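To make the idea of number groups concrete, here is a minimal Python sketch of how a reported amount might break apart when punctuation is not indexed. It is only an illustration, not our application's actual indexing code: the function names index_tokens and in_range are hypothetical, and the sample sentence and its dollar figure are made up.

```python
import re

def index_tokens(text):
    """Split text into lowercase words and digit groups.

    Hypothetical stand-in for an indexer that drops dots, dashes,
    commas and dollar signs but keeps runs of digits.
    """
    return re.findall(r"[a-z]+|\d+", text.lower())

def in_range(token, low, high):
    """True when the token is a digit group whose value is in [low, high]."""
    return token.isdigit() and low <= int(token) <= high

# Made-up sentence; the amount is for illustration only.
tokens = index_tokens("R&D expense was $1,234 million")
print(tokens)
print([t for t in tokens if in_range(t, 1, 999)])
```

In this sketch the comma splits $1,234 into the groups 1 and 234, and both fall inside the range 1 to 999, which is why a small range is enough to catch a large reported amount.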

To identify disclosures that might describe the amount of expenditures for research and development, I proposed this search: (research and~ development) w/10 1~~999. This search will take longer because the search engine has to inspect every instance of the R&D phrase for proximity to any number in the range 1 to 999, but it significantly reduces the noise from the first search. The first search yielded 19,391 documents when applied to the 10-K filings filed in 2016-2020. The second search returned only 13,466 documents. The noise is of course not completely eliminated, but it is greatly reduced.
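The logic behind a search like (research and~ development) w/10 1~~999 can be sketched in a few lines of Python. This is a rough approximation under stated assumptions, not how our search engine is implemented: the function names tokenize and phrase_near_number are hypothetical, and I am assuming that w/10 means within ten indexed tokens on either side of the phrase. The second sample sentence and its dollar figure are made up.

```python
import re

def tokenize(text):
    """Lowercase words and digit groups; punctuation is dropped (assumption)."""
    return re.findall(r"[a-z]+|\d+", text.lower())

def phrase_near_number(text, phrase, distance=10, low=1, high=999):
    """True if the phrase occurs within `distance` tokens of a digit group
    whose value lies in [low, high]."""
    tokens = tokenize(text)
    words = tokenize(phrase)
    n = len(words)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != words:
            continue  # the phrase does not start at this position
        start = max(0, i - distance)
        end = min(len(tokens), i + n + distance)
        # Tokens on either side of the phrase, excluding the phrase itself.
        window = tokens[start:i] + tokens[i + n:end]
        if any(t.isdigit() and low <= int(t) <= high for t in window):
            return True
    return False

noise = ("The Company believes ongoing investment in research and development, "
         "marketing and advertising is critical to the development and sale of "
         "innovative products, services and technologies.")
# Made-up disclosure sentence; the amount is for illustration only.
hit = "Research and development expense was $1,234 million during the year."

print(phrase_near_number(noise, "research and development"))
print(phrase_near_number(hit, "research and development"))
```

The sketch also shows why the second search is slower: every occurrence of the phrase triggers a scan of the surrounding tokens, whereas the first search only has to locate the phrase.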

To use a number range in your search, remember that dollar signs, dots and commas are not indexed, but the digits are. A search for a number in the range of $1 to $999,999,999,999 can therefore be reduced to 1~~999, because the leading group of digits in any such amount falls between 1 and 999. If a number is found that meets the criteria, only the digits to the left of the decimal or comma will be highlighted (but the full amount can be extracted with the Context Extraction feature).
