[Solved] Internal domain


#1

Hello,

I'm struggling to get Tanaguru to work with an internal domain (.dev).
Basically, I want to test my DEV environment, whose domain is blablabla.dev.

I have defined the domain in my /etc/hosts file and it pings fine… but Tanaguru won't have any of it. Where does Tanaguru get its DNS from?
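For reference, the hosts entry looks something like this (the IP address here is illustrative):

    192.168.0.10    blablabla.dev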

Regards.


#2

Hello,

Could you give us more details about the error you are seeing (application output, logs)?
Have you configured a proxy?
There is no specific setting for the DNS servers: Tanaguru relies on the system configuration.
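As a quick sanity check, you can ask the system resolver (which consults /etc/hosts) what it returns for the domain, for example:

    getent hosts blablabla.dev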
Thanks in advance

Jerome


#3

Hello,

I do go through a proxy, but for these tests I deliberately disabled it (via /etc/tanaguru/tanaguru.conf + a tomcat6 restart).

I am on Ubuntu LTS and I added entries to my /etc/hosts file.
From Ubuntu I can ping and wget my domain name just fine, but Tanaguru keeps giving me this message:

Either the domain does not exist
It is in the hosts config.

Or access to the site is forbidden by robots.txt (Link to the robots.txt file)
No.

Or the site requires authentication
No.

The logs are not very informative (is there a setting to get more detail?)…

17-03-2015 12:44:04:008 125147 INFO org.opens.tgol.orchestrator.TanaguruOrchestratorImpl - Launching audit site on http://blablabla.dev
17-03-2015 12:44:05:011 126150 INFO org.opens.tanaguru.service.command.SiteAuditCommandImpl - Launching crawler for page http://blablabla.dev
17-03-2015 12:44:08:939 130078 INFO org.opens.tanaguru.crawler.framework.TanaguruCrawlJob - crawljob is running
17-03-2015 12:44:15:269 136408 WARN org.opens.tanaguru.service.AuditServiceImpl - Audit has no content
17-03-2015 12:44:15:351 136490 WARN org.opens.tanaguru.service.command.AuditCommandImpl - Audit status isERROR whileCONTENT_ADAPTING was required
17-03-2015 12:44:15:438 136577 WARN org.opens.tanaguru.service.command.AuditCommandImpl - Audit status isERROR whilePROCESSING was required
17-03-2015 12:44:15:544 136683 WARN org.opens.tanaguru.service.command.AuditCommandImpl - Audit status isERROR whileCONSOLIDATION was required
17-03-2015 12:44:15:547 136686 WARN org.opens.tanaguru.service.command.AuditCommandImpl - Audit status isERROR whileANALYSIS was required
17-03-2015 12:44:15:637 136776 INFO org.opens.tgol.orchestrator.TanaguruOrchestratorImpl - failure email sent to
[…]
17-03-2015 12:44:16:063 137202 INFO org.opens.tgol.orchestrator.TanaguruOrchestratorImpl - Audit site terminated on http://blablabla.dev


#4

Tanaguru can run both page audits and site audits.
Do you get the same error when you launch a page audit on the same URL?
Do site audits work on external URLs when the proxy is configured?

Also, to get more logs, edit the file WEB-INF/classes/log4j.properties and change the level of the log4j.logger.org.opens.tanaguru.crawler package from INFO to DEBUG, as shown below.
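That is, the line in question would read:

    log4j.logger.org.opens.tanaguru.crawler=DEBUG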

Regards
Jerome


#5

So, page audits work…

For the site audit, here are the debug logs :smile:

17-03-2015 15:35:09:118 6200780 INFO  org.opens.tgol.orchestrator.TanaguruOrchestratorImpl  - Launching audit site on http://blablabla.dev
17-03-2015 15:35:09:972 6201634 INFO  org.opens.tanaguru.service.command.SiteAuditCommandImpl  - Launching crawler for page http://blablabla.dev
17-03-2015 15:35:09:994 6201656 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - Directory: /var/tmp/tanaguru/crawl-1426602909994 created
17-03-2015 15:35:09:994 6201656 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - crawlConfigFilePath: /var/lib/tomcat6/webapps/tanaguru/WEB-INF/conf/crawler/ for copy
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - filepath : /var/lib/tomcat6/webapps/tanaguru/WEB-INF/conf/crawler//tanaguru-crawler-beans-site.xml
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - MAX_DOCUMENTS 10000
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - SCREEN_HEIGHT 1080
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - LEVEL Aw22;LEVEL_2
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - DECORATIVE_IMAGE_MARKER
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - SCREEN_WIDTH 1920
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - PRESENTATION_TABLE_MARKER
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - INFORMATIVE_IMAGE_MARKER
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CONSIDER_COOKIES true
17-03-2015 15:35:10:000 6201662 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - PROXY_PORT
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - ALTERNATIVE_CONTRAST_MECHANISM false
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - INCLUSION_REGEXP
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - EXCLUSION_REGEXP
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - DEPTH 20
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - PROXY_HOST
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - DATA_TABLE_MARKER
17-03-2015 15:35:10:001 6201663 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - MAX_DURATION 86400
17-03-2015 15:35:10:072 6201734 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value http://blablabla.dev/
17-03-2015 15:35:10:120 6201782 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - 10000 MAX_DOCUMENTS
17-03-2015 15:35:10:120 6201782 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value 10000
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.HeritrixAttributeValueModifier  - Update maxDocumentsDownload attribute of bean crawlLimiter with value 10000
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - 1080 SCREEN_HEIGHT
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Aw22;LEVEL_2 LEVEL
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  DECORATIVE_IMAGE_MARKER
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - 1920 SCREEN_WIDTH
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  PRESENTATION_TABLE_MARKER
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  INFORMATIVE_IMAGE_MARKER
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - true CONSIDER_COOKIES
17-03-2015 15:35:10:135 6201797 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value true
17-03-2015 15:35:10:149 6201811 DEBUG org.opens.tanaguru.crawler.util.HeritrixInverseBooleanAttributeValueModifier  - Update ignoreCookies attribute of bean fetchHttp with value false
17-03-2015 15:35:10:149 6201811 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  PROXY_PORT
17-03-2015 15:35:10:149 6201811 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value
17-03-2015 15:35:10:161 6201823 DEBUG org.opens.tanaguru.crawler.util.HeritrixAttributeValueModifierAndEraser  - Delete httpProxyPort attribute of bean fetchHttp because of null or empty value
17-03-2015 15:35:10:161 6201823 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - false ALTERNATIVE_CONTRAST_MECHANISM
17-03-2015 15:35:10:161 6201823 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  INCLUSION_REGEXP
17-03-2015 15:35:10:161 6201823 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value
17-03-2015 15:35:10:172 6201834 DEBUG org.opens.tanaguru.crawler.util.HeritrixParameterValueModifier  - [list: null] value
17-03-2015 15:35:10:172 6201834 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  EXCLUSION_REGEXP
17-03-2015 15:35:10:173 6201835 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value
17-03-2015 15:35:10:183 6201845 DEBUG org.opens.tanaguru.crawler.util.HeritrixParameterValueModifier  - [list: null] value
17-03-2015 15:35:10:184 6201846 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - 20 DEPTH
17-03-2015 15:35:10:184 6201846 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value 20
17-03-2015 15:35:10:194 6201856 DEBUG org.opens.tanaguru.crawler.util.HeritrixAttributeValueModifier  - Update maxHops attribute of bean tooManyHopsDecideRule with value 20
17-03-2015 15:35:10:194 6201856 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  PROXY_HOST
17-03-2015 15:35:10:194 6201856 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value
17-03-2015 15:35:10:204 6201866 DEBUG org.opens.tanaguru.crawler.util.HeritrixAttributeValueModifierAndEraser  - Delete httpProxyHost attribute of bean fetchHttp because of null or empty value
17-03-2015 15:35:10:204 6201866 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  -  DATA_TABLE_MARKER
17-03-2015 15:35:10:205 6201867 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - 86400 MAX_DURATION
17-03-2015 15:35:10:205 6201867 DEBUG org.opens.tanaguru.crawler.util.CrawlConfigurationUtils  - Modifier found for value 86400
17-03-2015 15:35:10:216 6201878 DEBUG org.opens.tanaguru.crawler.util.HeritrixAttributeValueModifier  - Update maxTimeSeconds attribute of bean crawlLimiter with value 86400
17-03-2015 15:35:10:444 6202106 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - crawljob is launchable
17-03-2015 15:35:11:955 6203617 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - Job validated
17-03-2015 15:35:11:974 6203636 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - Starting context
17-03-2015 15:35:12:750 6204412 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - Context started
17-03-2015 15:35:12:750 6204412 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - Request crawl start
17-03-2015 15:35:12:757 6204419 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CrawlJob changes state to PREPARING
17-03-2015 15:35:12:833 6204495 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - PREPARING
17-03-2015 15:35:12:834 6204496 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - crawl start requested
17-03-2015 15:35:13:739 6205401 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CrawlJob changes state to RUNNING
17-03-2015 15:35:13:742 6205404 INFO  org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - crawljob is running
17-03-2015 15:35:13:752 6205414 DEBUG org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? http://blablabla.dev/ with mime type unknown false
17-03-2015 15:35:13:847 6205509 DEBUG org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? dns:blablabla.dev with mime type text/dns false
17-03-2015 15:35:14:851 6206513 DEBUG org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? dns:blablabla.dev with mime type text/dns false
17-03-2015 15:35:15:854 6207516 DEBUG org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? dns:blablabla.dev with mime type text/dns false
17-03-2015 15:35:15:858 6207520 DEBUG org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? http://blablabla.dev/ with mime type unknown false
17-03-2015 15:35:16:746 6208408 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CrawlJob changes state to STOPPING
17-03-2015 15:35:16:747 6208409 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CrawlJob changes state to EMPTY
17-03-2015 15:35:17:931 6209593 DEBUG org.opens.tanaguru.crawler.framework.TanaguruCrawlJob  - CrawlJob changes state to FINISHED
17-03-2015 15:35:18:002 6209664 DEBUG org.opens.tanaguru.crawler.CrawlerImpl  - remove Orphan related contents  0 elements
17-03-2015 15:35:18:013 6209675 DEBUG org.opens.tanaguru.crawler.CrawlerImpl  - remove Orphan SSPs  0 elements
17-03-2015 15:35:18:160 6209822 WARN  org.opens.tanaguru.service.AuditServiceImpl  - Audit has no content
17-03-2015 15:35:18:304 6209966 WARN  org.opens.tanaguru.service.command.AuditCommandImpl  - Audit status isERROR whileCONTENT_ADAPTING was required
17-03-2015 15:35:18:421 6210083 WARN  org.opens.tanaguru.service.command.AuditCommandImpl  - Audit status isERROR whilePROCESSING was required
17-03-2015 15:35:18:468 6210130 WARN  org.opens.tanaguru.service.command.AuditCommandImpl  - Audit status isERROR whileCONSOLIDATION was required
17-03-2015 15:35:18:471 6210133 WARN  org.opens.tanaguru.service.command.AuditCommandImpl  - Audit status isERROR whileANALYSIS was required
17-03-2015 15:35:18:783 6210445 INFO  org.opens.tgol.orchestrator.TanaguruOrchestratorImpl  - failure email sent to [root] on audit n° 92
17-03-2015 15:35:19:153 6210815 ERROR org.opens.emailsender.EmailSender  - emailContent  <p>Hello,</p>
<p>Your audit of the project <strong>blablabla.dev</strong> failed. Possible causes:
<ul>
<li>Either the domain does not exist</li>
<li>Or access to the site is forbidden by robots.txt</li>
<li>Or access requires authentication</li>
</ul>
</p>
<p>Try your luck again:</p>
<p>Tanaguru</p>
17-03-2015 15:35:19:157 6210819 ERROR org.opens.emailsender.EmailSender  - addr root
17-03-2015 15:35:19:182 6210844 INFO  org.opens.tgol.orchestrator.TanaguruOrchestratorImpl  - Audit site terminated on http://blablabla.dev

#6

Hmm… DNS resolution for the internal domain seems to be failing: the crawler never gets past the dns: lookups, and the page itself is never fetched (its mime type stays unknown)…

org.opens.tanaguru.crawler.processor.TanaguruWriterProcessor  - should process? http://blablabla.dev/ with mime type unknown false

#7

Hi again,

That's it, found it!

In “tanaguru-crawler-beans-site.xml”, set acceptNonDnsResolves to true. Which gives:

<bean id="fetchDns" class="org.archive.modules.fetcher.FetchDNS">
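    <!-- accept addresses resolved outside regular DNS (e.g. via /etc/hosts) -->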
    <property name="acceptNonDnsResolves" value="true" />
    <property name="digestContent" value="false" />
</bean> 

As a result, it now accepts my non-standard URLs…
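
If I understand the Heritrix behaviour correctly, the crawler's FetchDNS module performs its own DNS lookups rather than going through the system resolver, so it never sees /etc/hosts; with acceptNonDnsResolves set to true, it falls back to a standard InetAddress lookup when the DNS query fails, and that fallback does consult /etc/hosts. Note that tomcat6 most likely needs a restart for the change to be picked up:

    sudo service tomcat6 restart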


#8

Well done,

I was about to get there, but when you ask the questions and provide the answers yourself, that's even better [wink]
In any case, thanks for this interesting issue.

Don't hesitate to reach out if you have any other questions.

Enjoy

Jerome