Solr and YaCy integration
YaCy supports storing document metadata and plain text in remote Solr indexes. This can be activated with a single click (see below). We are currently changing the YaCy architecture to include a Solr core as an embedded index by default.
The remote index schema is similar to the SolrCell schema, but extends it; see http://wiki.apache.org/solr/ExtractingRequestHandler
Because YaCy uses this default schema, the schema from the Solr example configuration can be used as the Solr configuration. It is also the schema that Solr uses when documents are imported with Apache Tika.
How to attach Solr
Federated Solr storage is switched off by default in YaCy, but you can switch it on at http://localhost:8090/IndexFederated_p.html
To attach a Solr server do the following:
- download Solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the Solr (3.1) package, change into its example directory with 'cd example' and start Solr with 'java -jar start.jar'
- start YaCy
- set the 'Remote Solr Index' flag on http://localhost:8090/IndexFederated_p.html
- normally you would also need to set the Solr server address, but because in this example Solr is installed locally with default settings, you can leave the YaCy configuration as it is.
- then start the YaCy crawler. The crawler will fill both the YaCy index and the Solr index.
- to check what is in Solr after indexing, open http://localhost:8983/solr/admin/
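Besides the admin page, the number of indexed documents can also be checked from a script. The following is a minimal sketch, assuming Solr runs locally with the example configuration (base URL http://localhost:8983/solr, the same host and port as the admin page above) and answers on the standard select handler with JSON output (wt=json). The helper names count_query_url, parse_num_found and count_documents are hypothetical and not part of YaCy or Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr started from the example directory with default settings.
SOLR_BASE = "http://localhost:8983/solr"


def count_query_url(base: str = SOLR_BASE, query: str = "*:*") -> str:
    """Build a select URL that asks only for the number of matches (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def parse_num_found(body: str) -> int:
    """Read the total hit count from a Solr JSON search response."""
    return int(json.loads(body)["response"]["numFound"])


def count_documents(base: str = SOLR_BASE, query: str = "*:*") -> int:
    """Ask a running Solr instance how many documents match the query."""
    # Needs network access to the Solr server; raises URLError if it is down.
    with urlopen(count_query_url(base, query), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

After a crawl, calling count_documents() should return a growing number while the YaCy crawler feeds the Solr index. Setting rows=0 keeps the response small, because only the hit count is needed and no documents are transferred.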
It is currently not possible to use YaCy to search the Solr index. That may become an option in the future.
Screenshot of the Solr integration servlet at http://localhost:8090/IndexFederated_p.html (only some of the attribute fields are shown; there are many more).
This functionality is provided for the following purposes:
- 1) to compare the functionality and the search speed of Solr and YaCy
- 2) to let people who want to use Solr instead of YaCy as their search engine still use YaCy as a harvesting front end, with its crawler and other source harvesting methods (such as Dublin Core reading, Wikimedia dump reading and the RSS feed reader)
- 3) to experiment with and explore future uses of Solr inside YaCy