Do we just make a robots.txt file and put it in the root directory, or is there an existing robots.txt file that I can modify? (I still can’t find it.)
You need to create robots.txt.
Please keep in mind this is not an ideal solution - it is just what I worked through on one of our own sites. Things will change, I am sure, but for now robots.txt is an OK solution - not perfect, not foolproof, just OK.
Thank you Golles
I just copied and pasted SEOGUY’s code into a new file and saved it as “robots.txt” and put it into my Root Directory. I guess that’s all that needs to be done.
Maybe we can borrow the latest spiders.txt file from osCommerce ... ? I have attached it to this post.
Yes - robots.txt in the root is fine.
spiders.txt is something completely different to robots.txt. As far as osCommerce is concerned, spiders.txt is used to stop search engine spiders from crawling the site with a session id in the URL. That is completely different to robots.txt, which is used to manage the paths search engine spiders do / do not take through your site and what is indexed by them.
Is it also possible to list all the pages you want spidered by Google and ignore all other pages with a disallow?
I know it is some extra work, but in my case it is only 100 pages.
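For what it's worth, an allow-list like the one asked about here can be sketched as a set of `Allow` lines followed by a blanket `Disallow: /`. The snippet below is only a rough sketch, not a tested Magento setup: the page paths are made up for illustration, and it uses Python's standard `urllib.robotparser` to check the rules locally. It assumes the crawler honours `Allow` (the major search engines do, but not every crawler does). This parser matches by path prefix in file order, which is why the `Allow` lines come first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list: the only paths we want crawled.
ALLOWED_PATHS = ["/index.html", "/about.html", "/product-a.html"]


def build_allowlist_robots(paths):
    """Return robots.txt text that allows the given paths and blocks the rest."""
    lines = ["User-agent: *"]
    lines += [f"Allow: {path}" for path in paths]
    lines.append("Disallow: /")  # everything not allowed above is blocked
    return "\n".join(lines) + "\n"


robots_txt = build_allowlist_robots(ALLOWED_PATHS)

# Parse the generated rules locally instead of fetching them from a site.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """True if the parsed rules let the given user agent fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(robots_txt)
    for path in ["/index.html", "/checkout/cart/"]:
        print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```

The generated text is what you would save as robots.txt in the root directory. With 100 pages, generating the file from a list is less error-prone than typing it by hand.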
I am using v1.1.3 and still have the missing category SEO issues.
The robots.txt “solution” (more of a workaround with unfavorable results) is UNACCEPTABLE.
Where is 1.2? I don’t think another milestone release is available even through SVN in beta stage…
Where is the “Magento Team” on this one? Do they at least recognize the issue????
Can you please be more specific re: SEO category issues? There were several prior to v1.1 and most have gone away now - so which specific issues do you mean?
I have personally worked with the Magento team on a lot of the SEO issues, so I do know they recognized a lot of them.
robots.txt is a totally acceptable solution for any website to control search engine spider activity. I agree there were better ways of tackling some of the older issues with magento seo but most of these have been addressed. robots.txt has always been and always will be a personal preference type of thing.
Please be specific when you state
(more of a workaround with unfavorable results)
- we can only try and help each other if we understand exactly what you mean.
This thread is fairly old - you should look at some of the newer threads re: SEO:
We have several hundred products in multiple categories on one of our sites and there is just one url for each of them.
First, you should remove category names from the URLs - this is a setting in admin.
Then double-check that you have:
1) upgraded to the latest release
2) deleted the var/cache directory
3) deleted the URL rewrites from previous versions
4) refreshed the layered navigation cache
etc.
The specific issue I’m having is that
-every product link URL except the dropdown navigation points to domain.com/product.html when it should point to domain.com/category/subcategory/product.html
This includes
New Products
Featured Products
Recently Viewed Products
Related Products
Compare Products
You may also be interested in…
etc
-OK, so you say turn off category names in URLs ... well, in that case breadcrumbs DON’T work. Breadcrumbs are an important feature and standard navigation for first-time users. They all point to Home > Product instead of Home > Category > Sub-Category > Product
and using spiders.txt stops extra URLs from being indexed, however, with this method you also lose all those hits / links going to the non-indexed URLs…
Is it really that hard just to make ALL the URLs include the category names??
And yes, I’m on the updated version 1.1.3 and have done all of the standard steps to refresh the cache and layered navigation. I don’t have a problem enabling or disabling “URLs include the category names”… I just want those settings to work throughout the site, not ONLY in the dropdown navigation.
-OK, so you say turn off category names in URLs ... well, in that case breadcrumbs DON’T work. Breadcrumbs are an important feature and standard navigation for first-time users. They all point to Home > Product instead of Home > Category > Sub-Category > Product
Correct - I am afraid that is how it works for now. To avoid duplicate URLs pointing to the same product you need to remove category names from the URL.
and using spiders.txt stops extra URLs from being indexed, however, with this method you also lose all those hits / links going to the non-indexed URLs…
It is not spiders.txt, it is robots.txt.
If you block a URL via robots.txt a user can still get to the page - you just stop search engine spiders from indexing the page.
The idea of using robots.txt is to force search engine spiders to only spider and index those pages of your site you want indexed - e.g. to stop duplicate content issues, or to keep other files you do not want indexed out of the search engine results pages.
In my opinion you are better off blocking duplicate content on your site via robots.txt than having the search engines find it, index it and flag it as duplicate content.
Is it really that hard just to make ALL the URLs include the category names??
For now, if you want to avoid duplicate content (i.e. multiple URLs leading to one page) then you need to remove the category names from the URL.
I don’t have a problem enabling or disabling “URLs include the category names”… I just want those settings to work throughout the site, not ONLY in the dropdown navigation.
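To make the "multiple URLs leading to one page" point concrete, here is a rough sketch that groups a list of URLs by their final path segment and reports any product reachable under more than one URL. The URLs are the placeholder ones used earlier in this thread, and treating the last path segment as the product identifier is an assumption made for illustration, not something Magento guarantees.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def find_duplicate_product_urls(urls):
    """Group URLs by last path segment; keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        # Assumption: the last path segment (e.g. "product.html") identifies the product.
        key = posixpath.basename(urlparse(url).path)
        groups[key].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Placeholder URLs taken from the examples in this thread.
crawled = [
    "http://domain.com/product.html",
    "http://domain.com/category/subcategory/product.html",
    "http://domain.com/other-product.html",
]

duplicates = find_duplicate_product_urls(crawled)

if __name__ == "__main__":
    for product, found in duplicates.items():
        print(product, "is reachable via", len(found), "URLs:")
        for url in found:
            print("  ", url)
```

Run against a crawl or a sitemap export, something like this gives a quick list of the URLs you would either consolidate or block.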
To my knowledge those settings do work throughout the site.
Just to reiterate - this version is much better than pre-v1.1 versions for SEO factors. Are there still improvements to be made? Absolutely - but the SEO factors on v1.1.x are 100% better than before.
Correct - I am afraid that is how it works for now. To avoid duplicate URLs pointing to the same product you need to remove category names from the URL.
Has anyone looked into a temporary workaround to append categories to the URLs? Maybe turn off the category URL names then just append the category names to the link code?
It is not spiders.txt, it is robots.txt.
Sorry that is what I meant. I understand how it works but you have to understand that you could be losing well over 50% of your hits by excluding these URLs.
I don’t have a problem enabling or disabling “URLs include the category names”… I just want those settings to work throughout the site, not ONLY in the dropdown navigation.
To my knowledge those settings do work throughout the site.
As I stated before. Turn on category names in URLs and look at ANY of these
New Products
Featured Products
Recently Viewed Products
Related Products
Compare Products
You may also be interested in…
Or even the product sitemap…
They won’t include the category name in the URL.
This version may be better but it still seems to be in beta stage to me…
Has anyone looked into a temporary workaround to append categories to the URLs? Maybe turn off the category URL names then just append the category names to the link code?
You could do that manually via the product url key
Sorry that is what I meant. I understand how it works but you have to understand that you could be losing well over 50% of your hits by excluding these URLs.
Not necessarily, as many of your pages would probably go supplemental. Also, by having the duplicate pages available (and linked to from your site) you are diluting the page rank flowing to the core pages that you want to rank.
We are using robots.txt and nofollow on many of the internal links like layered navigation, recently viewed, compare products etc, just to be sure that pages linked to from those links do not get spidered / indexed.
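If you want to check which internal links actually carry nofollow, a small audit script can help. The following is only a sketch using Python's standard `html.parser`. The sample markup and link paths are invented for illustration and are not taken from a real Magento template.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, has_nofollow) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return a list of (href, has_nofollow) pairs found in the HTML."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment.
SAMPLE_HTML = """
<a href="/product.html">Product</a>
<a href="/compare/" rel="nofollow">Compare Products</a>
<a href="/recently-viewed/" rel="NoFollow noopener">Recently Viewed</a>
"""

if __name__ == "__main__":
    for href, has_nofollow in audit_links(SAMPLE_HTML):
        print(href, "-> nofollow" if has_nofollow else "-> followed")
```

Keep in mind that nofollow and robots.txt do different jobs: nofollow is a hint on an individual link, while robots.txt tells spiders which paths not to crawl at all.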
As I stated before. Turn on category names in URLs and look at ANY of these
New Products
Featured Products
Recently Viewed Products
Related Products
Compare Products
You may also be interested in…
Or even the product sitemap…
They won’t include the category name in the URL.
Sorry I had misunderstood you on this one. You are correct - I thought you were talking about with category urls off.