Nutch1 Quick Tutorial, Learning to Crawl

Blog moved to new address.

7 thoughts on “Nutch1 Quick Tutorial, Learning to Crawl”

  1. Hi, I need to integrate Solr and Nutch with Drupal 7.
    I have integrated Solr 4.10.4 with Drupal 7 through the Apache Solr Search module, and searching works fine. The goal now is to fetch the hyperlinks that appear on each page, and Apache Nutch looks like a good fit for that. However, I configured Solr for Drupal by replacing the following Solr files with the versions from the Drupal module: 1) schema.xml 2) solrconfig.xml 3) protwords.txt.

    How can I connect Solr 4.10.4, Nutch 1.12, and Drupal 7? Kindly help with this.

    • Hi Karthik, unfortunately I am no longer working on this project and cannot help you. Have you thought about joining the mailing lists of the Solr and Nutch projects? You can probably find people there who can help.

  2. Everything works fine apart from indexing. I get the output below, and nothing ends up in Elasticsearch.

    Elastic Version: 1.7.2
    Nutch 1.13

    Indexer: starting at 2017-06-12 13:42:24
    Indexer: deleting gone documents: false
    Indexer: URL filtering: false
    Indexer: URL normalizing: false
    Active IndexWriters :
    ElasticIndexWriter
    elastic.cluster : elastic prefix cluster
    elastic.host : hostname
    elastic.port : port
    elastic.index : elastic index command
    elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
    elastic.max.bulk.size : elastic bulk index length in bytes. (default 2500500)
    elastic.exponential.backoff.millis : elastic bulk exponential backoff initial delay in milliseconds. (default 100)
    elastic.exponential.backoff.retries : elastic bulk exponential backoff max retries. (default 10)
    elastic.bulk.close.timeout : elastic timeout for the last bulk in seconds. (default 600)

    Indexer: number of documents indexed, deleted, or skipped:
    Indexer: finished at 2017-06-12 13:42:41, elapsed: 00:00:17
