Enterprise Search with Apache Solr

We are going to look at some powerful Solr features and at how Solr differs from MySQL.

What Is Apache Solr?

Apache Solr is an open source enterprise search server. It stores information in a way that makes searching very fast. In a nutshell, it is also a storage system, like SQL and NoSQL databases.

Solr is written in Java and uses the Lucene search library for its core functionality. You don’t need to know Java to work with Solr.

How Is It Different from MySQL?

If you’re new to Solr, the best way to understand the internals of Solr is to compare it with MySQL.

  • MySQL stores information in tables and rows, whereas Solr stores it as documents whose structure is defined by a schema. Documents are commonly submitted for indexing as XML, though Solr also accepts other formats such as JSON and CSV.
  • You can have multiple tables in MySQL; similarly, a Solr server can host multiple cores, each with its own schema.
  • Columns define the structure of a table; similarly, in Solr, fields define the structure of the schema.
  • In MySQL you store data as rows, whereas in Solr you store it as documents.
  • In MySQL, an indexed column is typically organized in a tree-like structure (a B-tree), whereas in Solr an indexed field is organized in an inverted index.
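To make the row-to-document comparison concrete, here is a minimal Python sketch that turns table rows into Solr-style documents. The field names (id, title, body) and the sample rows are hypothetical; a real schema would define its own fields.

```python
import json

def row_to_solr_doc(row):
    """Convert a database row (as a dict) into a Solr-style document dict."""
    return {
        "id": str(row["id"]),   # hypothetical unique key field, sent as a string
        "title": row["title"],  # hypothetical field names; a real schema defines its own
        "body": row["body"],
    }

# Two example rows, as they might come back from a MySQL query.
rows = [
    {"id": 1, "title": "Hello Solr", "body": "First post"},
    {"id": 2, "title": "Indexing", "body": "Second post"},
]

# A JSON array of documents, one document per row.
payload = json.dumps([row_to_solr_doc(r) for r in rows])
print(payload)
```

A payload shaped like this could then be posted to a core's update handler; the sketch stops short of the HTTP call so that it runs without a Solr server.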

What Makes It Fast for Search?

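Solr uses an inverted index: for each word, it keeps a list of the documents that contain that word. To answer a query, it looks up the list for each search term and intersects those lists to get the final result, instead of scanning every document. This data structure is the foundation of full-text search engines in general, and it differs from the tree-based indexes that relational databases use by default.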
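The idea can be sketched in a few lines of Python. This is a toy illustration of an inverted index with AND semantics, not Solr's or Lucene's actual implementation; the function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # One lookup per term, then intersect the resulting id sets.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

docs = {
    1: "Apache Solr is an enterprise search server",
    2: "MySQL stores rows in tables",
    3: "Solr builds an inverted index for fast search",
}
index = build_inverted_index(docs)
print(search(index, "solr search"))
```

The work per query depends on the number of search terms and the size of their document lists, not on the total number of documents, which is why lookups stay fast as the collection grows.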

What Are Other Features of Solr?

Solr offers many other features, such as spell correction, faceting, highlighting, result grouping and auto-completion. Implementing these features on your website can help it stand out from the crowd. They provide a better user experience and new ways to access content on your website.
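Features such as faceting and highlighting are switched on through request parameters on a search query. The following Python sketch only builds such a query URL. The host, the core name "articles" and the field names "category" and "title" are assumptions for illustration; the parameters used (q, facet, facet.field, hl, hl.fl, wt) are standard Solr query parameters.

```python
from urllib.parse import urlencode

def build_select_url(base_url, core, text, facet_field, highlight_field):
    """Build a Solr select URL that requests faceting and highlighting."""
    params = [
        ("q", text),                 # the search query itself
        ("facet", "true"),           # enable faceting
        ("facet.field", facet_field),
        ("hl", "true"),              # enable highlighting
        ("hl.fl", highlight_field),
        ("wt", "json"),              # ask for a JSON response
    ]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

# Hypothetical host, core and field names.
url = build_select_url(
    "http://localhost:8983", "articles", "title:solr", "category", "title"
)
print(url)
```

Fetching this URL from a running Solr instance would return the matching documents together with facet counts for the chosen field and highlighted snippets.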

Why Should You Integrate Your Website or E-commerce Site with Solr?

As the number of articles or news items on your site grows, MySQL becomes slow when users search your site. Without a full-text index, a query that matches search terms with LIKE or regular expressions has to scan every article, which is very CPU-intensive. Users sometimes get request timeout errors because the PHP script exceeds its execution time limit. With 10,000 articles, news items or products, every search query has to examine all 10,000 rows, which is an expensive operation that will slow down your website.
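A rough Python sketch of that scan-everything behaviour is shown below. It is only an analogy for an unindexed pattern search, not how MySQL is implemented internally, and the article data is generated for the example.

```python
import re

def scan_search(articles, term):
    """Check every article against a regex, as an unindexed search would."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    examined = 0
    matches = []
    for article_id, text in articles.items():
        examined += 1  # every article is touched, whether it matches or not
        if pattern.search(text):
            matches.append(article_id)
    return matches, examined

# 10,000 generated articles, only one of which mentions Solr.
articles = {i: f"Article number {i}" for i in range(1, 10001)}
articles[42] = "An article about Solr"

matches, examined = scan_search(articles, "solr")
print(matches, examined)
```

Even though only one article matches, all 10,000 are examined, and that cost is paid again on every search.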

Solr, by contrast, can search 10,000 documents very quickly, typically in well under a second. If you have a medium-sized blog, a single Solr instance is enough to power search across all posts.