Researching Search Engines
Every company researches which tools to use in future projects, and we are no exception. This time we had to choose a search engine that could:
- handle literally tons of information,
- cope with even more requests,
- be fast,
- accept more than 10 priorities.
So we looked at these search engines:
Solr
Solr is written in Java and is based on Apache Lucene. It scored best on speed, reviews and data flow flexibility: it accepts JSON and XML (among other formats) for indexing and can return results as Python structures. Moreover, a number of libraries give an almost native experience in a Python project. Solr also ships with its own web admin interface for testing queries in a browser, but this means Solr needs a separate daemon to run.
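To illustrate the JSON indexing path mentioned above, here is a minimal sketch that uses only the Python standard library instead of one of the Solr client libraries. It assumes a Solr server at localhost:8983 with a core named `articles` and a `title` field; the host, core, field names and helper function names are all hypothetical. It builds a request for Solr's `/update` handler (JSON array of documents) and a query URL for the `/select` handler.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr location and core name; adjust to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_update_request(docs, base=SOLR_BASE, core=CORE):
    """Build a POST request sending documents to Solr's JSON update handler."""
    url = f"{base}/{core}/update?commit=true"  # commit so documents become searchable
    body = json.dumps(docs).encode("utf-8")    # a JSON array of documents
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(query, rows=10, base=SOLR_BASE, core=CORE):
    """Build a GET URL for Solr's /select search handler with JSON output."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def send(request):
    """Send a request to a running Solr instance and decode its JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the requests; call send() when a Solr server is actually running.
    req = build_update_request([{"id": "1", "title": "Researching search engines"}])
    print(req.full_url)
    print(build_select_url("title:search"))
```

With a running server, `send(build_update_request(docs))` indexes the documents and `send(urllib.request.Request(build_select_url("title:search")))` returns the matching hits. A client library would hide these details, which is part of the "almost native" experience described above.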
Sphinx
Sphinx is written in C++ and has a strong PHP API, aimed at PHP's huge community.
Xapian
The Xapian search engine is written in C++. Unfortunately, most of its documentation covers only C++ usage, so we would need a higher-level abstraction tool.
Elastic Search
Like Solr, Elastic Search is written in Java and based on the same Lucene library. However, it accepts only JSON for indexing, and it is a young project with a small community, so its stability is a bit of a concern in our situation.
HSearch
HSearch stores its data in HBase and is optimized around it, so it should stay fast even with tons of data. However, this means it requires a separate database and knowledge of HBase's potential problems.
RiakSearch
RiakSearch likewise relies on its own database, Riak, which is based on the Amazon Dynamo paper. It has the same issues as HSearch; in addition, it is labeled as beta software, and Erlang configuration is not our favorite.
Outcome
In the end we chose Solr because of its huge community, flexibility, speed and the wealth of information available about it.
Others
HSearch, RiakSearch and all the other NoSQL-based solutions dropped out because they require an additional database. Moreover, each of those databases is different and young, which means bigger risks and a longer development time.
Elastic Search is too fresh. It has serious potential, but for now it is just too young.
Xapian simply did not bring any unique abilities to the table. It is mediocre in every way, and we need more than that.
Sphinx was really the only strong competitor to Solr. But after weighing all the pros and cons, we found that Solr had more of the pros we needed and considered important. We were also afraid of hitting Sphinx's flexibility limits in some situations, with little or no way to extend them.