Host Analyzer. The Host Analyzer interacts with the Fetcher in the Worker (also part of Step 4). The Fetcher initially downloads the blog content from each unknown URL in order to identify which blog host it belongs to. Once the blog host is identified, the Analyzer can identify the URL of the blog host's RSS feed and, from this feed, the list of URLs of all blog posts. The blog host URLs are then added to the source database of the System Manager, together with a link filter describing the structure of this blog host. Each related blog post, however, is sent back to the Worker and Fetcher for content download. The main components involved in this step are as follows.

• The Scheduler is the unit that manages when the spider needs to check certain URLs for new updates. For most URLs the ping server delivers all updates automatically. There are three cases in which the spider must instead poll and download frequently to look for updates: first, URLs inserted manually from the input application that are not covered by ping servers; second, blog comments for most URLs, as long as the ping server does not push updates for this blog element; and third, some blog hosts that should be updated by a ping server, which are checked for omissions. The frequency of checking the URLs is rule-based.

• The Fetcher downloads the RSS and the entire HTML, matches them, and analyzes which rules to apply so that the right URLs enter the source database and the right content, including all blog elements, is captured.
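
The Host Analyzer step described above (find the RSS URL of a blog host from a downloaded page, then list the blog post URLs from that RSS) might be sketched as follows. This is only an illustrative sketch, not the system's actual implementation: the function names `find_rss_url` and `list_post_urls` are hypothetical, the feed is assumed to be advertised through a standard `<link rel="alternate" type="application/rss+xml">` element, and the feed is assumed to be plain RSS 2.0. Downloading is left to the Fetcher, so both functions work on already-fetched text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class _FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/rss+xml">."""

    def __init__(self):
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link href>), so normalize to "".
        a = {name: (value or "") for name, value in attrs}
        rel_tokens = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rel_tokens and is_rss and a.get("href"):
            self.feed_hrefs.append(a["href"])


def find_rss_url(page_url, html):
    """Return the absolute URL of the first advertised RSS feed, or None.

    Hypothetical helper: assumes the blog host uses feed autodiscovery.
    """
    finder = _FeedLinkFinder()
    finder.feed(html)
    if not finder.feed_hrefs:
        return None
    # Resolve relative hrefs such as "/feed.xml" against the page URL.
    return urljoin(page_url, finder.feed_hrefs[0])


def list_post_urls(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_xml)
    return [
        link.text.strip()
        for link in root.iterfind("./channel/item/link")
        if link.text
    ]
```

In such a sketch, the URL returned by `find_rss_url` would correspond to the entry stored in the source database, while each URL returned by `list_post_urls` would be handed back to the Worker and Fetcher for download.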

Appears in 1 contract

Sources: Grant Agreement
