En:FAQ

General

What is this?

YaCy is a distributed Web Search Engine, based on a peer-to-peer network.

What is a proxy?

A proxy is a program that relays internet traffic, e.g. between a local network of computers and the internet. Sometimes the host on which the proxy software is installed has no other function and is itself called a proxy. A caching proxy stores web pages, images and other content locally to save bandwidth when a computer "behind the proxy" requests content that has already been loaded.

What does indexing mean?

Indexing means that a web page is split into its individual words, and the URL of every page containing a word is stored in a database under a reference to that word. A search for one or more words can then be performed simply by fetching all URLs "belonging" to the search terms.
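As a rough illustration of the idea, here is a minimal sketch in Python. It is not YaCy's actual code; the function names `tokenize`, `build_index` and `lookup` and the example URLs are made up for this example, and the "database" is just an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it.

    `pages` is a dict of URL -> page text (hypothetical input format).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "http://a.example/": "YaCy is a search engine",
    "http://b.example/": "A peer-to-peer search network",
}
index = build_index(pages)
print(sorted(lookup(index, "search")))
print(sorted(lookup(index, "search engine")))
```

Searching for "search" returns both example URLs, while "search engine" returns only the first one, because only that page contains both words.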

What's the meaning of "to crawl"?

A so-called "crawler" fetches a web page and parses out all links on it; this is the first step, or "depth 0". It then fetches all web pages linked from the first document, which is called "depth 1", and does the same for all documents of each following step. The crawler can be limited to a specified depth or can crawl indefinitely, and so can crawl the whole "indexable web", including those parts that are censored by commercial search engines and are therefore normally not part of what most people are presented with as the visible web.
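The depth counting can be sketched as a breadth-first traversal. This is only an illustration under simple assumptions, not YaCy's crawler: `crawl` and `fetch_links` are hypothetical names, and the "web" here is a small in-memory link graph instead of real HTTP requests.

```python
from collections import deque


def crawl(start_url, fetch_links, max_depth=None):
    """Breadth-first crawl; returns a dict of URL -> depth.

    `fetch_links(url)` must return the links found on that page.
    Pages at the depth limit are recorded, but their links are not followed.
    With max_depth=None the crawl continues until no new links are found.
    """
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if max_depth is not None and depth >= max_depth:
            continue
        for link in fetch_links(url):
            if link not in depth_of:  # visit every page only once
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of


# A tiny made-up link graph standing in for the web.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"], "e": []}
print(crawl("a", lambda url: graph.get(url, []), max_depth=1))
```

With `max_depth=1` the sketch visits the start page and the pages it links to, and stops there.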

YaCy in general

I am not a technician. Can I install YaCy easily and use it to index my own web pages?

YaCy is very easy to install. You don't need any special knowledge or additional software, and you don't need to set up an extra database engine. Indexing your own website isn't hard either: simply crawl it and turn off DHT Distribution and DHT Receive to keep the index of your site on your own peer.

Can I crawl and index the web with YaCy?

Yes. You can start your own crawl, and you can also trigger distributed crawling, which means that your own YaCy peer asks other peers to perform specific crawl tasks. You can specify many parameters that restrict your crawl to a limited set of web pages.

Is there a central server? Does the search engine network need one?

No. The YaCy network architecture does not need a central server, and there is none. We distinguish three different classes of peers:

junior
peers that cannot be reached from the internet because of routing problems or firewall settings;
senior
peers that can be accessed by other peers;
principal
peers that are like senior peers but can also upload network bootstrap information to FTP/HTTP sites; this is necessary for bootstrapping the network.

Junior peers can contribute to the network by submitting index files to senior/principal peers without being asked.

Search Engines need a lot of terabytes of space, don't they? How much space do I need on my machine?

The global index is shared among the peers, but not copied to each of them. If you run YaCy, the index needs on average about as much disk space as the cache. The global index as a whole may reach terabytes, but not all of that is stored on your machine!

Do I need a fast machine? Search Engines need big server farms, don't they?

You don't need a fast machine to run YaCy. You also don't need a lot of space. You can configure how many megabytes you want to spend on the cache and the index. Any time-critical task is delayed automatically and takes place when you are not actively surfing (this works only if you use YaCy as an HTTP proxy).

How long does a search take?

Our architecture does not do peer-hopping, and we have no TTL (time to live). We expect search results to be returned to the requester immediately. This is done by asking the index-owning peer directly, which is possible by using a DHT (distributed hash table). Because we need some redundancy to compensate for missing peers, we ask several peers simultaneously. To collect their responses, we wait at most 6 seconds (by default; you can change that).
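The general pattern (hash the word, pick a few responsible peers, ask them all at once, wait a bounded time) might be sketched as follows. This is an assumption-laden illustration and not YaCy's real DHT or protocol: the SHA-1 hash ring, the function names `ring_position`, `responsible_peers` and `search_peers`, the `ask_peer` callback and the redundancy of 3 are all made up for this example. Only the 6-second default wait is taken from the answer above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, wait


def ring_position(name):
    """Map a string to a position on a hash ring (hypothetical scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def responsible_peers(word, peers, redundancy=3):
    """Pick the peers whose ring positions follow the word's position."""
    if not peers:
        return []
    ring = sorted(peers, key=ring_position)
    target = ring_position(word)
    # First peer at or after the target position, wrapping around the ring.
    start = next((i for i, p in enumerate(ring) if ring_position(p) >= target), 0)
    count = min(redundancy, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


def search_peers(word, peers, ask_peer, redundancy=3, timeout=6.0):
    """Ask the responsible peers directly and in parallel.

    `ask_peer(peer, word)` is a caller-supplied function returning URLs.
    Peers that fail or do not answer within `timeout` seconds are ignored.
    """
    targets = responsible_peers(word, peers, redundancy)
    if not targets:
        return set()
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = [pool.submit(ask_peer, peer, word) for peer in targets]
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on peers that are still silent
    urls = set()
    for future in done:
        if future.exception() is None:
            urls.update(future.result())
    return urls
```

Because each request goes straight to a peer that is expected to hold the word's index, there is no hop count to limit, and the redundancy covers peers that happen to be offline.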

Do I need to set up and run a separate database?

No. YaCy contains its own database engine, which does not need any extra set-up or configuration.

What kind of database do you use? Is it fast enough?

The database stores tables or property lists in files structured as AVL trees (height-balanced binary trees). Such a search tree guarantees logarithmic lookup time. For example, a search within an AVL tree with one million entries needs about 20 comparisons on average and at most 28 in the worst case. The database is therefore extremely fast. It lacks an API like SQL or the LDAP protocol, but it does not need one because it provides a highly specialized database structure. The missing interface pays off in very low organizational overhead, which improves speed further compared with databases that have SQL or LDAP APIs. This database is fast enough for millions of indexed web pages, perhaps even billions.
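The logarithmic growth can be checked with a short calculation. The sketch below is not part of YaCy; it uses the standard recurrence for the smallest AVL tree with a given number of levels, N(h) = N(h-1) + N(h-2) + 1, and the function names are chosen for this example. For one million entries it gives about 20 levels for a perfectly balanced tree and fewer than 30 even for the most lopsided tree that still satisfies the AVL balance rule.

```python
def min_nodes(levels):
    """Smallest number of nodes an AVL tree with `levels` levels can have."""
    a, b = 0, 1  # N(0) and N(1)
    for _ in range(levels):
        a, b = b, a + b + 1
    return a


def max_avl_levels(n):
    """Largest number of levels (worst-case comparisons) for n entries."""
    levels = 0
    while min_nodes(levels + 1) <= n:
        levels += 1
    return levels


def balanced_levels(n):
    """Number of levels of a perfectly balanced binary tree with n entries."""
    return n.bit_length()


for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, balanced_levels(n), max_avl_levels(n))
```

Going from a million to a billion entries adds only about ten levels in the balanced case, which is why such a tree remains usable for very large indexes.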

Why do you use your own database? Why not use MySQL or OpenLDAP?

The database structure we need is very special. One requirement is that entries can be retrieved in logarithmic time and can be enumerated in any order. Enumeration in a specific order is needed to compute conjunctions of tables very quickly, which is what happens when someone searches for several words. We implement the search-word conjunction by pairwise, simultaneous enumeration and comparison of index trees/sequences. This forces us to use binary trees as the data structure. Another requirement is the ability to have many index tables, maybe millions of them. The tables may not be big on average, but we need many of them. This is in contrast to the organization of relational databases, where the focus is on managing very large tables, but not many of them. A third requirement is ease of installation and maintenance: the user should not be forced to install an RDBMS first or to care about tablespaces and the like. The integrated database is completely maintenance-free.
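The pairwise, simultaneous enumeration can be illustrated with plain sorted lists in place of index trees. This is a simplified sketch, not the code YaCy uses; `intersect_sorted` and `conjunction` are hypothetical names, and the entries are short strings standing in for URL references.

```python
def intersect_sorted(a, b):
    """Walk two sorted sequences in step and keep the common entries."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind, advance it
        else:
            j += 1  # b is behind, advance it
    return common


def conjunction(sequences):
    """Intersect several sorted sequences pairwise, shortest first."""
    if not sequences:
        return []
    ordered = sorted(sequences, key=len)
    result = list(ordered[0])
    for seq in ordered[1:]:
        result = intersect_sorted(result, seq)
    return result


# Made-up sorted reference lists for three search words.
peer = ["u02", "u05", "u07", "u11"]
search = ["u01", "u02", "u07", "u09", "u11"]
engine = ["u02", "u03", "u11"]
print(conjunction([peer, search, engine]))  # ['u02', 'u11']
```

Each pairwise step touches every entry at most once, which only works because both sequences can be enumerated in the same sorted order.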

What does Senior Mode mean? What is Junior Mode?

Junior peers are peers that cannot be reached by other peers, while senior peers can be contacted. If your peer is reachable from the internet, it runs in Senior Mode. If it is hidden from others, it is in Junior Mode. A peer in Senior Mode is an access point for index sharing and distribution: it can be contacted for search requests, and it collects index files from other peers. A peer in Junior Mode collects index files from your browsing and distributes them only to senior peers, but does not collect index files from other peers.

Why should I run my peer in Senior Mode?

Some P2P-based file-sharing software assigns non-contributing peers very low priority. We think that this is not always fair, since sometimes the operator does not have the choice of opening the firewall or configuring the router accordingly. Our idea of 'information wares' and their exchange can also be applied to junior peers: they must contribute to the global index by submitting their index actively, while senior peers contribute passively. Therefore we don't need to give junior peers low priority: they contribute equally, so they may participate equally. But enough senior peers are needed to make this architecture work. Since every peer contributes almost equally, either actively or passively, you should run in Senior Mode if you can.

My peer says it runs in 'Junior Mode'. How can I run it in Senior Mode?

Open your firewall for port 8080 (or the port you configured), or configure your router to forward this port to your computer.

How can I help?

First of all: run YaCy in Senior Mode. This helps to enrich the global index and makes YaCy more attractive. If you want to contribute your own code, you are welcome; but please contact the author first and discuss your idea to see how it may fit into the overall architecture. You can help a lot simply by giving us feedback or telling us about new ideas. You can also help by telling other people about this software. And if you find an error or see an exception, we welcome your bug report. Any feedback is welcome.


Rejected Queue


A German version of this page also exists.

TODO: The English and German FAQs have too many differences.
