Matt Willsmore
Solr 9.0 has just been released, so we thought it would be useful to provide a brief roundup of what it includes to help you plan!
Search features
Solr 9.0 is built on Lucene 9 (the core search library that both Solr and Elasticsearch are built around), which brings some changes with it. First, Solr now requires Java 11 or higher. That may need a little planning to roll out if you have standardized on an earlier version, but it will be worth it: the release also brings a whole host of performance improvements in both indexing and search, as well as a smaller index footprint overall.
Second, Solr 9.0 introduces several new features found in Lucene. On the querying side, the big headline, and of particular interest to us here at Pureinsights, is the introduction of the Dense Vector field type and the K Nearest Neighbour Query Parser. Together these let Solr index the vectors produced by BERT-style language models and search them by similarity, taking advantage of the recent advances in NLP that those models have enabled.
Language support has also received some care and attention, with new stemmers (the algorithms that reduce different forms of a word to a common root) for Hindi, Indonesian, Nepali, Serbian, Tamil, and Yiddish, as well as some improvements to Norwegian.
Another helpful addition is the SQL screen, which is now part of the Admin UI.
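To give a feel for what vector search looks like in practice, here is a minimal Python sketch that builds (but does not send) the two pieces involved: a Schema API payload that defines a dense vector field type, and a query string for the KNN query parser. The field type name `knn_vector`, the field name `vector`, the four-dimensional example vector, and both helper functions are our own illustrative assumptions. In a real deployment the vector would come from a language model, and its length must match the dimension declared in the schema.

```python
import json


def dense_vector_field_type(name, dimension, similarity="cosine"):
    """Build a Schema API payload defining a DenseVectorField type.

    The type name is hypothetical; the dimension must match the model
    that produces the vectors.
    """
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.DenseVectorField",
            "vectorDimension": dimension,
            "similarityFunction": similarity,
        }
    }


def knn_query(field, vector, top_k=10):
    """Build the q parameter for the K Nearest Neighbour query parser."""
    # Render the vector as a bracketed list of floats, e.g. [1.0, 2.0].
    vector_literal = "[" + ", ".join(str(float(v)) for v in vector) + "]"
    return "{!knn f=%s topK=%d}%s" % (field, top_k, vector_literal)


if __name__ == "__main__":
    # Print the payload and query so they can be inspected or pasted
    # into a request made with the HTTP client of your choice.
    print(json.dumps(dense_vector_field_type("knn_vector", 4), indent=2))
    print(knn_query("vector", [1.0, 2.0, 3.0, 4.0], top_k=5))
```

The query string would be passed as the `q` parameter of a normal select request, so vector search can sit alongside the filters and other query features you already use.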
Security and stability
Much of the focus in Solr 9, though, has been on security. Solr now runs with the Java Security Manager enabled by default, there is a new certificate authentication plugin, and all request handlers now support security permissions for access. ZooKeeper and Jetty also bind to localhost by default for better security, and ZooKeeper has been upgraded to allow secure communications.
For stability and scalability, there is now a rate limiter to prevent the engine from being flooded with requests, as well as a task management interface to keep track of tasks running within the cluster. This release also introduces the idea of Node Roles, which allow responsibilities to be assigned to specific nodes in the search cluster. Initially the supported roles are ‘Data’, which means the node can host index shards and replicas, and ‘Overseer’, which makes the node responsible for cluster-wide collection management. This enables a scenario where a dedicated overseer node (with no data role) is used, so that it cannot become overwhelmed by data-related processing and is less prone to lockups and other problems.
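To make the dedicated overseer scenario a little more concrete, here is a small Python sketch. As far as we understand the reference guide, roles are set per node through a `solr.node.roles` system property made up of comma-separated `role:mode` pairs. Treat that property name and the mode values used here (`on`, `off`, `allowed`, `preferred`) as assumptions to verify against the documentation for your version. The helper functions are hypothetical and only parse such a string; they do not talk to Solr.

```python
def parse_node_roles(spec):
    """Parse a role string such as 'data:on,overseer:allowed' into a dict.

    The 'role:mode' format is an assumption based on our reading of the
    Solr reference guide.
    """
    roles = {}
    for part in spec.split(","):
        role, _, mode = part.strip().partition(":")
        roles[role] = mode
    return roles


def is_dedicated_overseer(spec):
    """True if the node prefers the overseer role and hosts no data."""
    roles = parse_node_roles(spec)
    return roles.get("overseer") == "preferred" and roles.get("data") == "off"


if __name__ == "__main__":
    # A node that only does overseer work versus a typical data node.
    for spec in ("overseer:preferred,data:off", "data:on,overseer:allowed"):
        print(spec, "->", is_dedicated_overseer(spec))
```

A check like this can be handy in deployment scripts, for example to confirm that only the intended nodes are started without the data role.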
Codebase management
Solr has flourished because of the dedication and contributions of its community, but that brings some overhead in code base complexity, which over the years can become a maintenance headache. This release sees quite a number of items removed, deprecated, or moved to modules, making way for a slimline distribution of Solr with optional packages that can be added as needed. We have already discussed the pending removal of the Data Import Handler. Also removed are the autoscaling framework, which is replaced by a pluggable replica assignment framework, and the Velocity template framework, which has become an installable package. Cross Data Centre Replication has been removed too, as have some legacy items such as the older SolrCache implementations and LegacyBM25SimilarityFactory.
SQL support, HDFS storage support, Hadoop authentication, JWT authentication, and the GCS backup repository have all been packaged as modules as part of this restructure.
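Because these features are now opt-in, a deployment needs to say which modules it wants. The Python sketch below turns the feature names above into a `SOLR_MODULES` line of the kind we believe can be placed in the Solr environment file. The variable name and the module names (`sql`, `hdfs`, `hadoop-auth`, `jwt-auth`, `gcs-repository`) are assumptions on our part, so check them against the modules directory and the reference guide for your distribution. The helper itself is hypothetical.

```python
# Assumed module names for the features listed above; verify them
# against the modules directory of your Solr 9 distribution.
FEATURE_TO_MODULE = {
    "SQL support": "sql",
    "HDFS storage support": "hdfs",
    "Hadoop authentication": "hadoop-auth",
    "JWT authentication": "jwt-auth",
    "GCS backup repository": "gcs-repository",
}


def solr_modules_setting(features):
    """Return a SOLR_MODULES line for the requested features."""
    unknown = [f for f in features if f not in FEATURE_TO_MODULE]
    if unknown:
        raise ValueError("No module mapping for: " + ", ".join(unknown))
    # Keep the caller's order so the output is predictable.
    modules = [FEATURE_TO_MODULE[f] for f in features]
    return "SOLR_MODULES=" + ",".join(modules)


if __name__ == "__main__":
    print(solr_modules_setting(["SQL support", "JWT authentication"]))
```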
And that is it in a nutshell. This is not an exhaustive summary, and there are many other changes and improvements listed in the detailed release notes.
If you want more information, including details on how to upgrade, please refer to the official documentation:
Major Changes in Solr 9 (Apache Solr Reference Guide)
And of course, if you need any support or advice with your Solr implementation (whatever the version), we would be happy to help!
Cheers
Matt