Hi, we have 250k products and we want to create more than 200 different sites (multi-site). All stores will have the same products but a different domain and a different design. Magento's multi-site function is good for us, but you need to link every product to every site, which means at least 50 million records in tables like "catalog_product_website" (and some other tables), and the flat catalog and indexing will also be a real problem.
If we use stores and store views instead, then the user base will be shared and the flat catalog will still be a problem.
So my question: is there any way to make all these stores work like a single store? Or is there any way to make it work with good performance?
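The 50 million figure is plain multiplication: a link table such as catalog_product_website holds one row per (product, website) pair when every product is assigned to every site. A quick sketch of that arithmetic in Python (the function name is only for illustration):

```python
def link_table_rows(products: int, websites: int) -> int:
    """Rows in a product-to-website link table when every product
    is assigned to every website (one row per pair)."""
    return products * websites


print(link_table_rows(250_000, 200))  # 50000000
```

Any other table keyed by both product and website or store grows by the same factor, which is why the store count matters as much as the product count.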
You can greatly increase performance with a full page cache solution. This will greatly reduce load on the database. Other than that, you will need a lot of hardware and RAM for the DB.
You can go with a full page cache; however, we find that multiple optimisations such as APC, memcache, PHP compiling, optimised index management, etc. are more efficient. We have worked with terabyte databases; as long as you install the DB correctly with optimal settings, millions of records are not a problem.
In relation to Magento, it is slow, especially with thousands of products and above. We normally use AWS with ElastiCache, RDS, Route 53 and load balancing, as you can upscale instantly and leave the infrastructure headache to Amazon. For high-performance sites it is best to allocate a batch server for index management, admin, etc. to take that load off the web servers.
I have a site that is running with 200+ websites/store views and 10,000+ products, so it can be done. The issue with Magento is not the product count but the website and store count. The more websites and store views you add, the slower it gets, and the slowdown is exponential. This is because of the way Magento processes and saves the “Configuration” settings and its indexes (the system configuration cache stores the settings of every store for every website and loads it all into memory).
Here is what we have done to work around this:
First, do not use the file system for cache or sessions. You still have to use the built-in cache, though, or your page load times will be about 10 minutes. Memcached will somewhat work, but it was not fast enough for our traffic (10-30 hits per second, with 200+ stores, remember).
Use Redis and the Redis extension by Colin Mollenhour:
https://github.com/colinmollenhour/Cm_Cache_Backend_Redis
Next, you will have to ditch the Apache server, because it will not be able to handle the site, and install Nginx and PHP-FPM instead. Make sure that the web server and PHP both have enough memory; 1 GB minimum per thread should be enough (e.g. if Nginx has 8 worker processes you need 8 GB of RAM). Also make sure that the PHP memory limit is set to at least 1 GB, because the whole system config has to be loaded and held in memory to load a page.
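To make the sizing rule concrete, here is a small back-of-the-envelope sketch. It assumes the worst case where every worker grows to the full per-worker figure at the same time; the function names and the 16 GB example box are made up for illustration.

```python
def pool_ram_gb(workers: int, gb_per_worker: float = 1.0) -> float:
    """RAM needed if every worker reaches gb_per_worker at once."""
    return workers * gb_per_worker


def max_workers(available_gb: float, gb_per_worker: float = 1.0) -> int:
    """Largest worker count that fits into available_gb."""
    return int(available_gb // gb_per_worker)


print(pool_ram_gb(8))        # 8.0  (the 8-worker example from the post)
print(max_workers(16, 1.5))  # 10   (hypothetical 16 GB box, 1.5 GB per worker)
```

Whatever number comes out, leave headroom for the operating system and anything else running on the same node.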
You will also want multiple web nodes with a SAN for shared storage (i.e. at least the media, var, and includes folders). An NFS share or an rsync script will not be fast enough to handle the kind of load that 200+ sites will have.
A big hurdle will be the database. We had to switch to the latest version of Percona’s MySQL, since we kept killing the regular MySQL. We have it on a dual 6-core server with 48 GB of RAM (make sure your InnoDB buffer settings are high enough), and we can still max out all the cores if 2 or 3 people are saving products in the admin at the same time, and it takes about 5 minutes to save a product. Also, you will have to set the indexes to update on save, because a full reindex takes hours and hours.
Installing a full page cache module will help keep the load on the server down, but if you make a lot of changes or product updates your cache will have to be flushed and you may not see much benefit. I still recommend full page caching, though; Varnish works nicely, or check out Colin Mollenhour’s cache extension:
https://github.com/colinmollenhour/Cm_Diehard
On a side note, the built-in product importer will take way too long to be usable (1 day to import 1,000 products with 200+ sites).
So you will want to check out the Magmi (Magento Mass Importer) project:
http://sourceforge.net/projects/magmi/
Good luck, and let me know if you have any questions.
Further to Ryan’s comment, we have built a streamlined custom solution (Pseudo Store) to handle this so that only one store is required under normal circumstances. The solution has one website/store/store view and dynamically changes the currency, country and language/locale based on parameters thereby eliminating the performance issues with multiple stores and large product sets. The features are as follows.
- Via a country selector changes the country, currency and language/locale for a single store
- Uses cookies to store user country preferences based on return visits
- Single store thereby increasing performance without requiring server or special software installations (nginx, san, full page cache, etc)
- Faster product load times due to reduced number of database records
- Shortened indexing time reducing the load on the database server
- Filters products so that users in a given country only see the products available to them, with redirection to an information page if a product is not available in their region
- Single RSS data feed
- Single sitemap
- Avoids duplicate content penalties in Google
Restrictions
- If any products require language-based URL keys or descriptions, then a separate store is required. This is beneficial for SEO, as the language-specific store will be indexed separately
Next release
- Custom url per Pseudo Store domain, subdomain or subdirectory
You can see it in action at Quintessential Fashion (http://www.quintfashion.com). In this instance it uses country codes as the basis, although this is not fixed.
The solution needs to be technically configured for each installation, and it costs $995. We understand it is a high price; however, it should pay for itself within a few months through reduced server requirements and by avoiding the problem of millions of records overloading the database.
We wanted to provide an update with some figures. We have moved a site with 4 websites/stores and 65,000 products to the new code with one website/store and the same number of products. We have a custom load script using product->save, which previously loaded approximately 30 records per minute. With the single-store method we are now seeing loading up to 3-4 times faster, and have noticed increased frontend performance using file and PHP caching (no memcache or read database).
Full page cache does not need to be flushed after each product update if the store owner can tolerate data that can be a few hours stale. For example, for Full Page Cache the data can be set to expire after 3 days and something like the Full page crawler can be used to refresh the most critical pages every few hours.
Inwebservices: I really like your inventive solution. I just wish our needs would let us implement something like that, but our websites and store views are not for different countries or currencies; they are for different co-op members and locations. I will say that the best thing to do is to never run more than 1 website and store view, because the page load times increase exponentially with each extra site and store added.
Extendware: Yes, you don’t need to flush the cache after every admin change. With 200+ sites we have lots of people making a lot of changes throughout the day, so our cache only gets flushed a few times a day. I am not trying to rip on full page cache, our site cannot run without it, but I am saying that not flushing it often is not, on its own, going to allow someone to run multiple sites. We don’t flush often and we always have cache built for our sites (which can be really good if something breaks or if an index gets reindexed or corrupted), but with 200+ stores on our single install we also always have pages that aren’t cached. Here are a few of the shortcomings we have run into with cache (full page or otherwise).
1st. The page has to be loaded in order for it to be saved to cache
With lots of stores and lots of products, page load times can be incredibly slow (hence all the massive hardware requirements). It’s also very difficult to hit every page on every site with a full page cache crawler (even with Google and Bing helping) before the next time the cache gets flushed, so we always have some uncached pages being delivered. This is why the hosting environment needs to be strong enough to handle uncached pages as well as the full page crawler and all the web crawlers hitting the 200+ sites.
2nd. The page has to be saved somewhere
This is still a problem for us. On the weekends, when no one is flushing the cache, we almost hit the limit of our Redis cache storage (10 GB), and that’s only storing the 1st level of page depth (i.e. no page parameters). Since Redis stores data in RAM, we need to add more RAM or set up another node for the Redis cluster.
3rd. The page has to be retrieved quickly
I tested a few different cache storage engines and Redis was the only one able to handle our server’s load (for us, it was faster when used with Enterprise’s full page cache than Varnish was). The cache server also has to be able to handle connections from multiple web nodes, and be able to be clustered and backed up… (Redis does all this).
4th. The page has to be refreshed when it’s stale
I just don’t like the idea of dumping all the cache just because 1 change has been made to 1 product on 1 site. I still haven’t solved this, and maybe your module can do it, I don’t know. On the same note, cache should not expire simply because it’s old; it should only expire if something on the page has changed, and I see that your module lets you change the cache lifetime setting.
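One common way to get the "only expire what changed" behaviour described in the 4th point is tag-based invalidation: each cached page is stored with tags for the things it depends on, and a change to one product clears only the pages carrying that product's tag. Below is a minimal in-memory sketch of the idea. It is a generic illustration, not the behaviour of any particular Magento module, and the class, keys and tag names are all made up.

```python
from collections import defaultdict


class TaggedPageCache:
    """Toy page cache where each entry is labelled with dependency tags."""

    def __init__(self):
        self._pages = {}                      # page key -> cached HTML
        self._keys_by_tag = defaultdict(set)  # tag -> page keys carrying it

    def save(self, key, html, tags):
        self._pages[key] = html
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def load(self, key):
        return self._pages.get(key)  # None means a cache miss

    def invalidate_tag(self, tag):
        """Drop only the pages that carry this tag; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if self._pages.pop(key, None) is not None:
                removed += 1
        return removed


cache = TaggedPageCache()
cache.save("site1/product/42", "<html>A</html>", ["product_42", "store_1"])
cache.save("site2/product/42", "<html>B</html>", ["product_42", "store_2"])
cache.save("site1/product/7", "<html>C</html>", ["product_7", "store_1"])

print(cache.invalidate_tag("product_42"))  # 2
print(cache.load("site1/product/7"))       # <html>C</html>
```

The cost is extra bookkeeping per page, and the tag index itself takes memory, so with the cache sizes mentioned above this would have to live in the same store as the pages.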
Magemoss: On a side note, you had asked a question about indexing and I don’t think I fully answered it. The more stores and sites you add, the longer it will take to reindex, and since this is done in the database by default, the database server requirements get bigger and bigger. So one thing we have been testing is moving the indexes out of the database and into Solr, Lucene or Sphinx. I was able to get Solr working, but the performance is not where I want it yet. I won’t go into detail since there are already some forum postings on the subject, but it’s just one more thing to think about when trying to stick 200+ sites in Magento.
The one and only question in my head: WHY???
OK, if you have 200 partners with Magento shops, and they really want to sell your 250k products,
here are your keywords:
MySQL proxy, MySQL replication, periodic sync and reload.
But if you want to create 200 shops yourself, this is just a waste of resources, time and money… Better to buy ads and set up 3-4 small load-balanced servers with a MySQL proxy in front of a replicated master-slave backend…
Or maybe I do not see the full picture.
Here are more keywords:
preorder, reserved, price valid at the time of purchase.
I will state this again, if you can avoid having multiple sites/stores, then yes by all means do so, but this is not normally possible.
Here are a few reasons to have multiple sites:
1. Your company has multiple locations that sell a mix of the same and different brands, each with its own inventory and/or stock amounts
2. You have a Co-op with multiple stores and locations each with the same situation as above
3. You have a central warehouse for partners to drop ship from but each partner can choose which products to carry
4. You have a multiple language store views setup and you need to have the products description and name in the other languages
5. You need to manage promotions, static blocks, and/or pages across multiple websites/stores
Those are only a few, but I hope it gets you to realize why so many people have to run multi-site setups.
On a side note, the whole master/slave write/read setup has its issues as well (like search not working sometimes). And if you think we can just set up 1 database and replicate only the products to the other Magento databases, then you must have a lot of hard drive space and RAM, because when dealing with a large number of products the databases get rather large (6+ GB), and replicating that over and over to 200+ databases would not be practical. In contrast, our multi-site Magento database is only 20+ GB… I guess you could shard out the databases and run 1 for products and then separate ones for each site, but you’re still going to have management issues and lose the ability for individual sites to make product changes.
So in conclusion, there is no 1 solution for making Magento multi-site work! (At least not that I have seen, and if someone has one, please share it.) The closest I have seen is using a virtual image in a cloud server environment, but even then there is a limit to how many stores you can add before it slows down too much, and the cost of using that many resources can be high. We had already passed that point before I started looking into it, so I don’t know where the limit is; I just know that it was a lot slower than our current setup.
But maybe if I were to build a private cloud or a super computer.....
In response to Ryan’s points, we were passed details by InWeb. Nginx does the core functionality of a web server very well with low overhead, whereas Apache does everything; if you only need a subset of the core, then Nginx will provide a 5x performance boost to your site. Couple this with APC, a single store, etc., and you are in league with the best websites out there. We are running 50,000-product single Magento stores on an AWS micro instance web farm with Nginx, and the results are impressive. Adding websites/store views degrades performance substantially.
We have dealt with very large corporate applications. The key to Magento with potentially high store numbers is to minimise the stores to website level or regions, and then use software such as dynamic filters, or other business process changes. Think of a pyramid separated into three to four vertical levels: if you have 50 stores then this is the bottom of the pyramid and the most granular; the top is one store.
When dealing with high store counts and SKUs, you want to position your overall website at the highest point of the pyramid that you can, taking into account cost, stability, performance, and return on investment: a cross between business needs and technical possibilities. These are business decisions and not technical ones. Fixing it only technically will always cause problems, as nothing stays stable for long (upgrades, new functionality required, maintenance, etc.). Fixing it only with business process changes will cause lost opportunities and profitability.
In the end, once you get into multi-store with thousands of products, there are three key options:
a) solve it technically with the best hardware, software, services and time
b) solve it with business process changes or reduced requirements, potentially reducing profitability, although with reduced IT costs
c) solve it with a hybrid of the two; this falls somewhere in the middle of the pyramid and is equal parts art and science.
You can see this visually in the included attachment: the further you are down the business or IT side, the less flexibility you have. The goal is to move as close to the hybrid as possible given your business circumstances, requirements, timescales, costs and the business/technical knowledge available. This process is the same from the smallest one-person website to the largest multinational companies; only the scale changes.
The case we are dealing with is much more complex than that given by Ryan.
We would like to provide marketplace functionality, where we anticipate about 10,000 merchants, each wanting to have their own store within a mega main store.
Each vendor will be given a Magento website with a single store. We will have a huge catalog of 100K products and each vendor will attach products from this list to their website. Each vendor can set a vendor-specific title, description and price.
We also have features where vendors can set discounts, etc. for products attached to their website, a feature that Magento provides at the website level.
We realized that custom filtering would mean modifying Magento left, right and center, so we decided to go with the out-of-the-box multi-site support.
We plan to use Solr to serve catalog listings and search. We will customize Magento to allow vendors to set their own quantity for each product they sell; quantity is global by default.
Apart from that, APC, Redis, Nginx will all be applied as suggested in this thread.
Does anyone see any issue?
Update on our multi-site Magento: we have switched over to MariaDB 5.5.25 for our database, which reduced the CPU load (only 50% during the day) and increased admin performance. This is another total must-do for multi-sites. Unfortunately, re-index time remains the same.
yashgt: Sorry, but 10,000 websites in Magento is not going to work. Let me explain why. I think that the theoretical upper limit of Magento 1.7 is maybe 1,000 websites, maybe. Currently, with 200 websites running on major hardware, the admin is almost unusable: 3 minutes to create a product, 2 hours to process indexes, 5 minutes to generate the system cache, and so on. The problem is that there are three major architectural flaws that prevent Magento from breaking the 1,000-site limit.
First, every time Magento loads a web page it loads the system configuration settings for all websites; this is why adding sites slows it down so much. As you add more sites, the PHP data object gets bigger and bigger, and PHP requires more and more memory per thread to hold it. We currently require 1.5 GB of RAM per thread, and that’s for 235 sites. Someone running 500 sites may need 3 GB of RAM per thread, and 1,000 sites could need up to 6 GB per thread; on a 16-core server running 16 PHP threads, that’s 96 GB of RAM on 1 box, just for PHP. So the solution is to make Magento load only the settings it needs for the current site, but you would also have to change how it creates, stores, and retrieves the system configuration caches. This would be a major code and architectural change. I had a co-worker who tried to fix this, but he could not get it to work correctly.
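The memory figures above can be reproduced with a short sketch. The only measured point is 1.5 GB per thread at 235 sites; scaling it linearly with the site count is an assumption made here for illustration, and the function names are made up.

```python
REF_SITES = 235  # measured point quoted in the post
REF_GB = 1.5     # GB per PHP thread at REF_SITES


def ram_per_thread_gb(sites, ref_sites=REF_SITES, ref_gb=REF_GB):
    """Per-thread memory, assuming it scales linearly with site count."""
    return ref_gb * sites / ref_sites


def total_php_ram_gb(threads, gb_per_thread):
    """Total RAM if every PHP thread holds the full configuration."""
    return threads * gb_per_thread


for sites in (235, 500, 1000):
    print(sites, "sites:", round(ram_per_thread_gb(sites), 1), "GB per thread")

print(total_php_ram_gb(16, 6))  # 96
```

A straight-line extrapolation lands at roughly 3.2 GB for 500 sites and 6.4 GB for 1,000, close to the 3 GB and 6 GB quoted, so the estimates in the post are at least self-consistent.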
Second, indexes, or more specifically the catalog URL rewrite index. Here is the thing: you can turn off the catalog and product flat indexes, and you can move search to Solr or Sphinx, but the catalog URL rewrite index stays, and it gets exponentially bigger and slower the more sites you add. With our 235 sites and 10,000 products, a 2-hour reindex is not that big of a deal, but with 500 sites it could take 4 hours, and with 1,000 sites it could take 8 hours. If you instead change the index to update on save, then product save times get progressively slower and slower. I don’t know what an acceptable time to save a new product is (3, 5, 10, 15 minutes?); it’s hard to say. In the end, storing all that data in 1 table just does not make sense, because it creates the potential for billions and billions of records, which makes it slow.
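For a feel of what this means at 10,000 sites, here is a rough extrapolation from the one measured point (2 hours at 235 sites). It assumes reindex time scales linearly with the number of sites, which is the optimistic case and only an assumption; the function name is made up.

```python
def reindex_hours(sites, ref_sites=235, ref_hours=2.0):
    """Full reindex time, assuming it scales linearly with site count."""
    return ref_hours * sites / ref_sites


for sites in (500, 1000, 10_000):
    print(sites, "sites:", round(reindex_hours(sites), 1), "hours")
```

Even on this optimistic model, 10,000 sites works out to about 85 hours per full reindex, and anything worse than linear growth pushes it out much further.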
Third, Magento only works with 1 master MySQL (or compatible) server. This is the real hard limit: having only 1 database server means that if it needs more resources you have to get a bigger server, and servers only get so big. Switching to a virtual machine in a cloud or running on a cluster may gain some performance, but even then there is a limit to how many resources a VM can have. The real solution would be to put it on Oracle or MSSQL and run a cluster. This would require creating the connection scripts for those servers (which is the easy part) and then rewriting all of the MySQL install and upgrade scripts for the new server (and don’t forget the modules’ MySQL scripts). I’ve gotten Magento to connect to an MSSQL server, but rewriting the scripts was a huge task that I don’t have the skill for.
If those three things could be resolved, then I think Magento could one day do 10,000 websites, unless there is another bottleneck that I don’t know about.
Regarding issue #2:
When a product is first created, it would be tagged only to the Main site/store. In this case I assume only the rewrite entries specific to the site/store would be created.
When a seller (vendor) chooses to attach the product to his site/store, the entries for just that seller would be created.
In our case not all products are tagged to all websites.
Regarding issue #3:
As described at http://sree.cc/magento-ecommerce-tips/database-concepts-how-to-write-a-query, Magento allows different connection strings for write and read operations, though the default setting uses a single connection from etc/config.xml.
We can have a master/read-replica setup and direct all reads to the replicas.
Please correct me if my understanding is incorrect regarding the above 2 cases.
yashgt: The time it takes to create a product is directly related to the number of websites, or more specifically to the size of the catalog URL rewrite index, so whether you create a product and add it to 1 site or to 1,000 sites, it takes the same amount of time. So the problem is that with 10,000 sites it will take way too many hours to save a product, even if it’s only for 1 site.
I know you’re optimistic about master/slave replication and that it will somehow resolve your database issue, but in practice it will not. The reason is that you still have, and can only have, 1 master, and since the main load is due to writes you still need a massive DB server, and hardware can only do so much. The other side of the issue is that Magento also has high read loads, with joins without indexes and full table scans, so the read DB server has to have the same high specs as well. Let me make sure you understand: scaling out the read load (in a shared-nothing architecture) does not speed things up; it only allows more read connections. You still have high write loads, and once the master is maxed out, that’s it. The real solution is either clustering or sharding the database so that the giant tables can be processed in a timely manner. So, as I was saying earlier, only being able to have 1 master (write) server is a massive bottleneck.
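The point about replicas can be shown with a toy capacity model. It assumes each server can do a fixed number of operations per second, that the single master takes every write, and that every replica has to replay those same writes before it can serve reads. The numbers and the function name are hypothetical; this is a simplification, not a benchmark.

```python
def cluster_capacity(replicas, server_ops_per_sec, write_ops_per_sec):
    """Toy model of one master plus N read replicas.

    Every replica replays all writes, so only the leftover
    capacity on each replica is available for reads.
    """
    if write_ops_per_sec > server_ops_per_sec:
        raise ValueError("write load exceeds what a single master can handle")
    read_headroom = server_ops_per_sec - write_ops_per_sec
    return {
        "max_write_ops": server_ops_per_sec,       # fixed: there is only 1 master
        "max_read_ops": replicas * read_headroom,  # grows with replica count
    }


print(cluster_capacity(1, 1000, 900))  # {'max_write_ops': 1000, 'max_read_ops': 100}
print(cluster_capacity(4, 1000, 900))  # {'max_write_ops': 1000, 'max_read_ops': 400}
```

Adding replicas raises the read ceiling but leaves the write ceiling untouched, and when writes already eat most of a server, each replica adds very little read capacity either.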
There are other smaller issues, especially in the admin. I can’t imagine how big a dropdown menu with 10,000 websites would be, or whether it’s even possible to save an attribute with 10,000 sites, since with only 200 sites I already get slow-script errors. Of course this can be fixed by editing the admin theme; it’s just one more thing to think about…
We have 10000 websites and one store per website. If a product is linked to a website and if the website has 1000 stores, I agree, the URL rewrite table will have entries for each of the 1000 stores. But in our case, when a product is tagged to a website, it gets an entry just once. Every time it is tagged to a website, an additional entry just for the single store of the website is made. We have verified this in an experiment.
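The counting rule described above can be written down as a small sketch. It assumes one URL rewrite row per product per store of each website the product is attached to, which is what the experiment suggests, and it ignores category and custom rewrites. The SKUs, website codes and function name are made up.

```python
def url_rewrite_rows(product_websites, stores_per_website):
    """Count product rewrite rows: one per (product, store) of each attached website.

    product_websites:   product id -> list of website codes it is attached to
    stores_per_website: website code -> number of stores under that website
    """
    return sum(
        stores_per_website[website]
        for websites in product_websites.values()
        for website in websites
    )


attachments = {"sku-1": ["main", "vendor-a", "vendor-b"], "sku-2": ["main"]}

one_store_each = {"main": 1, "vendor-a": 1, "vendor-b": 1}
print(url_rewrite_rows(attachments, one_store_each))  # 4

many_stores = {"main": 1, "vendor-a": 1000, "vendor-b": 1}
print(url_rewrite_rows(attachments, many_stores))     # 1003
```

So with one store per website the table grows with the number of attachments rather than with products times websites, although that says nothing yet about how long saving or reindexing takes once the table is large.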
Regarding clustering, I guess MySQL NDB cluster does not support foreign keys. This means if a product is deleted, the associated attribute values from the EAV model are not deleted. This may have issues if Magento does not delete the associated data on its own.
We are considering a DB server with 68 GB RAM and 8 CPUs of 3.25 x 1.1 GHz. This is the Quadruple Extra Large DB instance from the Amazon cloud. The read replica will have the same configuration.
I may be missing something in my understanding detailed above.