Pablo Arce
Shedding Light on the Dark Side of Data
In November, I had the pleasure of speaking at OpenSearchCon 2023 in Seattle, Washington. The presentation was a case study on performance optimization for extremely large search indices, based on an actual project for a Pureinsights customer in the data security business.
We’re sharing the presentation with you as it illustrates best practices in diagnosing search application performance, all while reducing operational infrastructure costs. I tried to be entertaining as well as informative, so I hope you enjoy the video and the summary below.
The Customer: Startup SaaS Provider in the Data Security Space
The customer is a company in the data security space that collects personal information on the “dark web” and aggregates it so that the “good guys” – corporate security teams, intelligence analysts, law enforcement – can analyze what information has leaked out due to various data security breaches that have occurred over time.
The Problem: Extremely Slow Search Performance
The customer’s business model relies on a search application that can query an index of tens of billions of records, a number that is growing fast. Before launching the business, the customer built a search application that did not perform to expectations. A large number of queries would not execute at all, and those that did took 15 seconds on average.
The Solution: Iterative Tuning and Updating of Search Application Architecture
After only two design iterations, Pureinsights helped the customer create an application architecture with greatly improved search performance. Query errors were virtually eliminated. Average query response times improved from 15 seconds to under one second (at least a 15x improvement), and maximum response times dropped from 85 seconds to 15 seconds (about a 5.7x improvement).
Lessons Learned
During the performance analysis, we distilled the key lessons learned to the following:
- Follow the architecture guidelines recommended by the search engine provider, especially for:
  - Index topology
  - Shard size
  - Memory ratios
- Experiment with different configurations – not all application loads are the same
- Find your bottlenecks and attack them
- Keep an eye on performance-to-cost ratios – at a certain point, incremental performance gains may be cost-prohibitive.
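
To make the shard-size lesson a little more concrete, here is a minimal sketch of the kind of back-of-the-envelope estimate you might start from. It is not taken from the customer project: the function name `estimate_primary_shards`, the 30 GB default target, and the sample index sizes are all assumptions for illustration. Commonly cited OpenSearch guidance puts shards somewhere in the 10–50 GB range, but check your provider's current recommendations and then experiment, since not all application loads are the same.

```python
import math


def estimate_primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard at or below a target size.

    Both the helper and the 30 GB default are illustrative assumptions,
    not values from the project described in this post.
    """
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target; always use at least one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    # Hypothetical index sizes in GB, chosen only to show the arithmetic.
    for size_gb in (5, 3000, 3001):
        print(size_gb, "GB ->", estimate_primary_shards(size_gb), "primary shards")
```

An estimate like this is only a starting point for the "experiment with different configurations" step. Measured query latency and cost should decide the final topology.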
Wrapping Up
We hope you enjoyed the OpenSearchCon 2023 replay, based on a real project to optimize search application performance. Not only were the performance results satisfactory, but we also saved the customer a lot of money.
If you have any questions, please feel free to CONTACT US.
– Pablo