Sales: Call 877.832.5289 (North America) | 310.295.4144 (International)

Magento

eCommerce Software for Online Growth

Magento Forum

   
Page 1 of 3
Duplicate URLs - extent of the issue and partial, temporary resolution
 
golles
Sr. Member
 
Total Posts:  210
Joined:  2008-01-15
 

The extent of the duplicate URL problem had not quite hit home, although I knew it was a significant issue, until I got a spider to crawl our site.
The auto-generated Google sitemap is not working for us at the moment, so I decided to use a free online sitemap generation service to produce our XML sitemap temporarily, until the Magento bug is resolved.

The tool I use is pretty good; you can find it at: http://www.xml-sitemaps.com/

This tool crawls your site, finds URLs, and then produces a sitemap. It obeys the robots.txt file, so the crawl gives a reasonable indication of what a search engine spider would find.

I let the sitemap tool do its stuff and, to my amazement, it produced the following results:

• On this particular site we have about 40 products in 11 categories and a handful of products with size / colour variations, so a maximum of 70 SKUs. It is a small site.
• We had restricted some pages via robots.txt. Our robots.txt file looked like this:

Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/

• The sitemap tool only went 3 levels deep and added 500 URLs to the sitemap, at which point it stopped spidering, as the free tool has a 500 URL limit.
• The sitemap tool found a significant number of URLs for each product and category, created by the different views of categories, prices, colours, etc.
• So from 40 products and 11 categories we had over 500 URLs spidered. This is a huge issue.

Looking through the URLs in the generated sitemap, I noticed that all of the URLs generated from the different views started with the /catalog/ directory.
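If you want to do the same check without reading the sitemap by eye, here is a rough Python sketch that counts sitemap URLs by the first segment of their path. The sample XML is made up for illustration (it is not our real sitemap, and the /catalog/... paths in it are only assumed examples); the function name is my own.

```python
from collections import Counter
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_by_first_segment(sitemap_xml: str) -> Counter:
    """Count sitemap URLs by the first segment of their path."""
    root = ET.fromstring(sitemap_xml)
    counts = Counter()
    for loc in root.iter(SITEMAP_NS + "loc"):
        path = urlparse(loc.text.strip()).path
        segments = [s for s in path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1
    return counts

# Hypothetical sample data, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.domain.co.uk/catalog/product/view/id/1</loc></url>
  <url><loc>http://www.domain.co.uk/catalog/category/view/id/3</loc></url>
  <url><loc>http://www.domain.co.uk/product.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    for segment, n in count_by_first_segment(SAMPLE).most_common():
        print(segment, n)
```

A first segment with a far higher count than the rest is a good candidate for a robots.txt rule.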

So just by adding a couple of lines to the robots.txt I was able to solve the majority of the problems. Our robots.txt file now looks like this:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap: http://www.domain.co.uk/sitemap.xml

So I ran the free sitemap tool again, and now it only finds 74 URLs. Great: the majority of the issue is temporarily resolved and I am seeing the proper rewritten URLs for each product / category.
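To sanity-check which paths these rules block before re-running a crawler, something like the Python sketch below may help. It assumes the wildcard behaviour that Google documents for robots.txt ('*' matches any run of characters, a trailing '$' anchors the end of the URL), and it only handles Disallow lines: no Allow rules and no longest-match precedence. The function names are my own, so treat it as an approximation of what a spider does, not a reference implementation.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Disallow pattern into a regex anchored at the path start.

    Assumed semantics: '*' matches any characters, a trailing '$' means
    the URL must end there; otherwise the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

# The Disallow lines from the robots.txt above.
DISALLOW = [
    "/index.php/", "/*?", "/*.js$", "/*.css$", "/checkout/", "/tag/",
    "/catalogsearch/", "/review/", "/app/", "/downloader/", "/js/",
    "/lib/", "/media/", "/*.php$", "/pkginfo/", "/report/", "/skin/",
    "/var/", "/catalog/", "/customer/",
]
RULES = [rule_to_regex(r) for r in DISALLOW]

def is_blocked(path: str) -> bool:
    """True if any Disallow rule matches the path (including query string)."""
    return any(r.match(path) for r in RULES)

if __name__ == "__main__":
    # Example paths are illustrative, not taken from a real crawl.
    for p in ["/catalog/product/view/id/1", "/category/product.html"]:
        print(p, "blocked" if is_blocked(p) else "allowed")
```

Note that the rewritten URLs such as /category/product.html are not blocked by any of these rules, which is why the duplicates described below remain.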

BUT

I still have duplicate URLs for each product, as each product can still be accessed from several different URLs, such as:

www.domain.co.uk/product.html
www.domain.co.uk/category/product.html

and if the product is in multiple categories:

www.domain.co.uk/category1/product.html
www.domain.co.uk/category2/product.html

etc
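A quick way to see how many products are affected is to group the crawled URLs by their final path segment. This Python sketch assumes the final segment (e.g. product.html) is unique per product, which may not hold on every store; the function name and sample URLs are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_product(urls):
    """Group URLs by final path segment; keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        key = path.rstrip("/").rsplit("/", 1)[-1]
        groups[key].append(url)
    return {k: v for k, v in groups.items() if len(v) > 1}

if __name__ == "__main__":
    # Illustrative URLs following the pattern shown above.
    crawled = [
        "http://www.domain.co.uk/product.html",
        "http://www.domain.co.uk/category1/product.html",
        "http://www.domain.co.uk/category2/product.html",
        "http://www.domain.co.uk/other.html",
    ]
    for key, dupes in group_by_product(crawled).items():
        print(key, len(dupes))
```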

I see the robots.txt workaround as a fudge. It is not a good long-term strategy to rely on robots.txt to stop spiders, but for now it is OK.
Long term, this has to be fixed with better rewrite rules or different coding for URL generation, and by addressing the product-in-multiple-categories issue.

I urge you to run the free sitemap tool over your Magento site to see what you find.

For search engine visibility and the effectiveness of the SEO on your site, it is critical that this is dealt with before a search engine spider finds and indexes your site.

The really scary part is that this is a small store of ours. We have others with 8000 SKUs that we are planning to migrate to Magento! We will not be embarking on this until the URL issue is resolved.

I am not putting this forward as an ultimate resolution, but as one which we are testing and which appears to HELP the issue, NOT resolve it.

Anyway hope this helps some of you.

 
 
seoguy
Member
 
Total Posts:  46
Joined:  2008-01-25
 

Please see:

http://www.magentocommerce.com/boards/viewthread/6019/

BTW: Thanks for the robots.txt. I would recommend doing this:

User-agent: Googlebot
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap: http://www.domain.co.uk/sitemap.xml

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap: http://www.domain.co.uk/sitemap.xml

 
 
golles
Sr. Member
 
Total Posts:  210
Joined:  2008-01-15
 
seoguy - 10 April 2008 09:34 AM

Please see:

http://www.magentocommerce.com/boards/viewthread/6019/

I agree that is a good workaround for a small part of the problem, but it is not a long-term solution. It also does not attack the full extent of the issue: URLs are generated and spiderable for every category, category view, product, product view, etc. None of those URLs are rewritten and they are all spiderable. If you run a spider across your site you will see the issues.

For a 70 SKU site, having over 500 spiderable URLs, most of which lead to the same products, is a major issue. The only way to block this temporarily is with robots.txt, but this again is a partial, temporary solution.

 
 
peterw83
Member
 
Total Posts:  44
Joined:  2007-11-12
 

golles, it is agreed that we need to fix it in the code so that the URLs are handled properly.

@MagentoTeam I hope these URL issues will be addressed in the update release that is coming by the end of the month.

Thanks

 
 
Blue Acorn
Jr. Member
 
Total Posts:  11
Joined:  2008-04-07
Atlanta, GA
 

Great thread guys. This is a HUGE issue and I will also not be migrating any sites until the URL issue is resolved. I’ve used a few eCommerce solutions that handle this well, but my favorite is a specific solution (that I won’t mention) that allows store owners to define a static URL for each product/category/page. You can enter the URL into a text field in the admin and it automatically handles the redirects, or you can allow it to auto-generate URLs. It’s a brilliant solution. I prefer defining the exact URL myself.

 Signature 

Blue Acorn eCommerce Consulting

 
 
adam777
Jr. Member
 
Total Posts:  30
Joined:  2008-05-08
 

... where is the robots.txt file so that I can change it?

 
 
Dustin
Sr. Member
 
Total Posts:  118
Joined:  2008-03-13
Columbus, OH
 

Combining this robots.txt file with the product URL solution here: http://www.magentocommerce.com/boards/viewthread/8185/ seems to have minimized the biggest issues with duplicate content.

 
 
golles
Sr. Member
 
Total Posts:  210
Joined:  2008-01-15
 
Dustin - 21 May 2008 05:38 AM

Combining this robots.txt file with the product URL solution here: http://www.magentocommerce.com/boards/viewthread/8185/ seems to have minimized the biggest issues with duplicate content.

Possibly

The workaround in the thread you quote is very good but could prove a nightmare as future revisions are released.

It would be better to have a fix in the core.

To me this is the only thing stopping us migrating all of our ecommerce stores to Magento (we think it is that much of an issue). The rest of the system is spot on.

 
 
Moshe
Magento Team
 
Total Posts:  1771
Joined:  2007-08-07
Los Angeles
 

@golles: what do you think about this?

http://www.getelastic.com/are-color-product-pages-duplicate-content/

 Signature 

- I would love to change the world, but they won’t give me the source code -

 
 
golles
Sr. Member
 
Total Posts:  210
Joined:  2008-01-15
 
Moshe - 28 May 2008 09:55 AM

@golles: what do you think about this?

http://www.getelastic.com/are-color-product-pages-duplicate-content/

That is a great article and brings up some great points. But the core problem is that a standard Magento implementation creates multiple URLs for each product.

So any one product can have multiple URLs for the core product (not even mentioning the variations such as color or size).

To quote myself (from the first post in this thread):

A product can have:

www.domain.co.uk/product.html
www.domain.co.uk/category/product.html

and if the product is in multiple categories:

www.domain.co.uk/category1/product.html
www.domain.co.uk/category2/product.html

And then throw in the views by color, size and other things, and it adds up to a significant issue for any serious SEO work.

Admittedly, there are ways around this, such as the robots.txt hack I described above (which helps with SOME of the problem) and other code workarounds I have seen, but I worry about future Magento releases when using code workarounds. It would be a lot better to get the issue addressed in the core.

Simply put (top of the head thinking), each product needs to have one URL, so that if the product is accessible via another category it is always rewritten to the original URL. There could then be an option (in admin maybe) to have a URL for each variation, color or whatever. I guess you could even specify the product URL in the admin too (let the store owner decide the URL). The key point is that however that URL is accessed, it is always rewritten to the original URL and no duplicate URLs are generated.
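To make the idea concrete, here is a very rough Python sketch of the "one URL per product" rule. It is not how Magento works or would implement it; it assumes the canonical URL is simply the final path segment at the site root, and that the path is already known to be a product page (category pages would need separate handling). Both function names are hypothetical.

```python
from typing import Optional

def canonical_path(path: str) -> str:
    """Drop any category segments, keeping only the product's final segment."""
    return "/" + path.rstrip("/").rsplit("/", 1)[-1]

def redirect_target(path: str) -> Optional[str]:
    """Return the path to redirect to (e.g. with a 301), or None if canonical."""
    target = canonical_path(path)
    return None if target == path else target

if __name__ == "__main__":
    for p in ["/product.html", "/category1/product.html", "/category2/product.html"]:
        print(p, "->", redirect_target(p))
```

With a rule like this, every category path for a product collapses to the same URL, so a spider only ever indexes one address per product.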

Anyway I appreciate the response and pointer towards the blog post - it is interesting reading.

 
 
Moshe
Magento Team
 
Total Posts:  1771
Joined:  2007-08-07
Los Angeles
 

@golles: did you try Admin > System > Configuration > Catalog > Search Engine Optimizations > Use categories path for product URLs = No ?


 
 
golles
Sr. Member
 
Total Posts:  210
Joined:  2008-01-15
 
Moshe - 28 May 2008 11:49 AM

@golles: did you try Admin > System > Configuration > Catalog > Search Engine Optimizations > Use categories path for product URLs = No ?

Yes - partially but not fully

If you look through ALL (20,000+ results) of the pages for this search query at Google you will see what I mean:

http://www.google.com/search?q=site:demo.magentocommerce.com&btnG=Search&hl=en&sa=2

As I have pointed out before, there are ways around the majority of the issues, but not all. I know the above is a demo store and not a live site, but it gives a good example of some of the issues.

 
 
oldsteel68
Member
 
Total Posts:  57
Joined:  2008-04-22
 

Is this even anywhere near being fixed? I cannot use this software this way.

 
 
AlexUser
Jr. Member
 
Total Posts:  27
Joined:  2008-03-10
 

I totally agree with oldsteel68 on this! Please let’s get this fixed in the next release.

Alex

 
 
adam777
Jr. Member
 
Total Posts:  30
Joined:  2008-05-08
 

Do we just make a robots.txt file and put it in the root directory, or is there an existing robots.txt file that I can modify? (I still can’t find it.)

 
 
oldsteel68
Member
 
Total Posts:  57
Joined:  2008-04-22
 

Is anyone looking into this? Support is veeeeery quiet.....

 
© Copyright 2008 Varien. Magento, eCommerce software, is a trademark of Irubin Consulting Inc. DBA Varien