- suitable for indexing large amounts of data
- suitable for fast retrieval
- support layer for NLP
- supports incremental updates
- out of the box: facets, auto-suggest, spell check, etc.
By
Zaid Rabab'a, Technical Team Lead, ESKADENIA Software
In my opinion - Apache Solr
Solr™ Features
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
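The HTTP interface described above can be sketched roughly as follows. This is a minimal illustration, not Solr's own client: it only builds the requests and sends nothing. The base URL (`http://localhost:8983/solr`) assumes a default local install, and the core name `articles` and the helper names `build_index_request` and `build_query_url` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: a local Solr on its usual port and a hypothetical core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_index_request(docs, base=SOLR_BASE, core=CORE):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    url = f"{base}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(query, rows=10, base=SOLR_BASE, core=CORE):
    """Build the GET URL for a search that asks for JSON results."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


if __name__ == "__main__":
    req = build_index_request([{"id": "1", "title": "Hello Solr"}])
    print(req.get_method(), req.full_url)
    print(build_query_url("title:solr", rows=5))
```

To actually talk to a running server, the request object and the query URL could be passed to `urllib.request.urlopen`; field names such as `title` depend on the schema of your own core.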
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
For more information on this topic, see the following title:
A Comparison of Open Source Search Engines
Christian Middleton, Ricardo Baeza-Yates
By
THIKRALLAH SHREAH, Technical Team Leader, bayt.com
Well, there are many open source engines available nowadays, and each of them beats the others in one or more respects.
Solr/Lucene and Sphinx have both proven able to support large indexes. (Check their websites; you will see big names there.)
Based on my personal experimentation, Sphinx is a little faster than Apache Solr, but it really depends on the application. Solr in a clustered setup could beat Sphinx in some specific cases, and Sphinx could beat Solr on a single query. Again, your requirements decide which one is suitable for you.
As for NLP, Xapian rocks at this. Other search engines may provide some NLP support, but nothing beats Xapian.
Real-time indexing is supported by most search engines. Sphinx 2.0.x has excellent support, especially since Sphinx provides a MySQL interface, so it is really like working with a database.
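As a rough sketch of what "like working with a database" can look like, the helpers below issue SphinxQL statements through any Python DB-API cursor. Everything named here is an assumption for illustration: the real-time index `rt_articles` and its `title` and `content` fields are hypothetical and would have to be declared in your Sphinx configuration, and the `%s` placeholder style assumes a MySQL driver that substitutes parameters on the client side.

```python
# Hypothetical real-time index name; it must exist in your Sphinx configuration.
RT_INDEX = "rt_articles"


def add_document(cursor, doc_id, title, content, index=RT_INDEX):
    """Insert one document into a real-time index using a SphinxQL INSERT."""
    cursor.execute(
        f"INSERT INTO {index} (id, title, content) VALUES (%s, %s, %s)",
        (doc_id, title, content),
    )


def search(cursor, text, limit=10, index=RT_INDEX):
    """Run a full-text MATCH query and return the rows the cursor yields."""
    cursor.execute(
        f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}",
        (text,),
    )
    return cursor.fetchall()
```

The cursor would typically come from a MySQL client library connected to the port on which Sphinx listens for the MySQL protocol (commonly configured as 9306); the exact driver and connection settings depend on your setup.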
Out-of-the-box features are where Solr shines. Solr/Lucene is the open source search engine with the richest APIs.
In most standard usage, Sphinx and Solr should be sufficient for most requirements, but you need to fully understand how they run and how the driver APIs handle operations.