Magento Forum

Crawl and Security
 
mk384
Member
 
Total Posts:  60
Joined:  2010-05-21
 

Hello

I have been trying to get Google to crawl my site for a week, but they keep replying that there is an issue with the site.

I installed the extension below and generated a robots.txt file under my public_html directory:

http://www.magentocommerce.com/magento-connect/robots-txt-6783.html

However, I see that this extension lets you decide which parts of your site search engines may crawl and which are disallowed.

For obvious reasons I don't want the admin and other sensitive areas being crawled.
At the moment all crawlers are enabled:

User-agent: *
Crawl-delay: 5
Disallow:
Disallow: /404/
Disallow: /app/
Disallow: /manage/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /index.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php
Disallow: /.js$
Disallow: /?___from_store=
Disallow: *___from_store=
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /.css$
Disallow: /.php$
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /.php$
Disallow: /rss*

My question:

1. Is there any other folder or directory I need to disallow?
2. How do I get it to disallow my admin section, which has been renamed from

www.mysite.com/admin
to
www.mysite.com/xyz084

I would really appreciate it if anyone could clarify and help me with this, as it's the last thing before I have a fully operational site. It has taken ages to create and set up.

3. What permissions should the robots.txt file be given?

Kind Regards

 
 
mk384
Member
 
Total Posts:  60
Joined:  2010-05-21
 

Come on, people.

Thousands of people must be using a robots.txt file in Magento, but there is not even one reply, which is so disappointing.

Becoming very disillusioned with the Magento community and this forum.

 
 
thebod
Moderator
 
Total Posts:  81
Joined:  2010-08-11
 

Hi,

First of all: as soon as you put the path to your admin into robots.txt, everyone is able to read it.
If you change it to, let's say, mysecretadmin, no crawler can find it as long as it is referenced nowhere (as it should be). But if you put the path into robots.txt to disallow it, a possible attacker could use this to find your admin path. Simple, eh?
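To illustrate the point about robots.txt being public, here is a minimal Python sketch of what anyone can do with a downloaded copy of the file. The helper name `disallowed_paths` and the sample file are assumptions made up for illustration; `/xyz084/` stands in for the renamed admin path from the question.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value found in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file: /xyz084/ is the renamed admin path from the question.
SAMPLE = """\
User-agent: *
Disallow: /checkout/
Disallow: /xyz084/
"""

# Every listed path, including the "hidden" admin path, is readable by anyone.
print(disallowed_paths(SAMPLE))
```

The file is served to every visitor, so any path written into it should be treated as published.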

Second: this list looks good. The best way to track your indexed URLs is to use Google Webmaster Tools; if you see something you don't like, just add it to robots.txt.

Third: 644 or whatever you want. The webserver has to be able to read the file (read access), so no special access rights are needed.
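As a small sketch of what mode 644 means in practice (owner can read and write, group and others can only read), the following Python applies it to a throwaway file. It assumes a POSIX system; the helper name `make_world_readable` is made up for this example, and the temporary file merely stands in for a real robots.txt.

```python
import os
import stat
import tempfile


def make_world_readable(path: str) -> str:
    """Set mode 644 on path and return the resulting permission bits as text."""
    os.chmod(path, 0o644)
    return oct(stat.S_IMODE(os.stat(path).st_mode))


# Demonstrated on a temporary file rather than a real robots.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("User-agent: *\nDisallow:\n")
    demo_path = handle.name

mode = make_world_readable(demo_path)
print(mode)  # permission bits after chmod
os.remove(demo_path)
```

On a typical Linux host this is equivalent to running chmod 644 on the file.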

...and last: What do you expect after one day? People have jobs, people have families, people need leisure time. I hope this post answers all your questions, so please think about the people behind the community and why you may not get an answer within one day…

Thanks, thebod

 
 
mk384
Member
 
Total Posts:  60
Joined:  2010-05-21
 

Thank you, my friend.

I will go through your reply and work from it.

I received a reply from Google asking me to add these 2 lines to my robots.txt file:

User-agent: googlebot
Disallow:

I find this confusing because, as I understand it, this would prevent my site from being indexed. Is this correct? Please can you clarify.
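On the question of what an empty Disallow does: under the robots.txt convention, an empty `Disallow:` value blocks nothing, while `Disallow: /` blocks everything. The sketch below checks this with Python's standard-library `urllib.robotparser`, which only approximates how Googlebot itself behaves. The wrapper `can_fetch`, the sample files and the URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: googlebot\nDisallow:\n"    # the two lines Google suggested
BLOCK_ALL = "User-agent: googlebot\nDisallow: /\n"  # what would actually block the site

# A googlebot group added below a general group, as in the thread.
COMBINED = "User-agent: *\nDisallow: /checkout/\n\nUser-agent: googlebot\nDisallow:\n"

page = "http://www.mysite.com/some-product.html"
cart = "http://www.mysite.com/checkout/cart/"

print(can_fetch(ALLOW_ALL, "googlebot", page))  # empty Disallow: nothing is blocked
print(can_fetch(BLOCK_ALL, "googlebot", page))  # "/" blocks every path
print(can_fetch(COMBINED, "googlebot", cart))   # the googlebot group is used, not "*"
print(can_fetch(COMBINED, "otherbot", cart))    # other crawlers still follow "*"
```

One consequence worth noting: a crawler follows the most specific group that names it, so adding the googlebot group would exempt Googlebot from the Disallow rules listed under `User-agent: *`.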

As for the replies: I would understand if it were some unique issue, but I assumed lots of people would be using a robots.txt file, so I was hoping for a reply from some kind member such as yourself. My other posts have received few if any replies, so I was speaking of my general experience up until then. Having spent a long time trying to implement Magento, I only need to complete this last step to get myself online.

Anyway, I really appreciate your kind reply and would be grateful if you could clarify the above.

Kind Regards

 
 
mattb13
Jr. Member
 
Total Posts:  13
Joined:  2011-12-23
 

Thanks for the help!
And what about Googlebot?

 
 
Brynnae
Member
 
Total Posts:  36
Joined:  2012-04-17
California
 

A long list in robots.txt will definitely increase the security risk, because it names almost everything you consider unnecessary to crawl. For a second layer of protection, you can also mark those pages and links with noindex, nofollow meta robots tags.

 
 
craigan
Jr. Member
 
Total Posts:  11
Joined:  2010-09-22
 

I see you're using:

Disallow: /?limit=

which I am too. However, in my Google Webmaster Tools account, I've got all kinds of Duplicate Meta Descriptions warnings because of this:

/reviews/myproduct.html
/reviews/myproduct.html?limit=50

Why isn't it working? I could have sworn it was working previously, but now it isn't.

Is there a better way?
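One possible explanation, offered as a sketch rather than a confirmed diagnosis: robots.txt rules are matched from the start of the URL path, so `/?limit=` only covers URLs that begin with exactly `/?limit=`. Under the wildcard syntax Google documents (`*` matches any run of characters, a trailing `$` anchors the end), a pattern such as `/*?limit=` would be needed to reach `/reviews/myproduct.html?limit=50`. The function `rule_matches` below is a simplified, hypothetical matcher written for this example, not Google's implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt matching: prefix match, '*' wildcard, trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, as robots.txt rules do.
    return re.match(regex, path) is not None


url = "/reviews/myproduct.html?limit=50"

print(rule_matches("/?limit=", url))            # rule from the thread
print(rule_matches("/*?limit=", url))           # wildcard variant
print(rule_matches("/.js$", "/js/app.js"))      # rule from the thread
print(rule_matches("/*.js$", "/js/app.js"))     # wildcard variant
```

By the same logic, the `/.js$`, `/.css$` and `/.php$` rules in the list above would only match a path that is literally `/.js`, `/.css` or `/.php`.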

 