My site has a lot of product, lot of customers, lot of categories, and lots of data...Oh, and 2 store fronts. I have two beefy machines running this system, one a webserver the other a data server. MySQL logs show that this query is constantly taking more then 10 seconds to run:
SELECT COUNT(DISTINCT entity_id) AS `count`, FLOOR((value*1)/1)+1 AS `range` FROM `catalogindex_price` AS p WHERE (entity_id in ( SELECT `e`.`entity_id` FROM `catalog_product_entity` AS `e` INNER JOIN `catalog_category_product` AS `_table_position` ON (_table_position.product_id=e.entity_id) AND (_table_position.category_id IN ('96', '123', '124', '117', '118', '119', '121', '122', '125', '127', '128', '129', '131', '132', '133', '134', '135', '149', '136', '151', '137', '138', '139', '152', '141', '156', '142', '143', '144', '145', '147', '140', '148', '130', '116')) INNER JOIN `catalog_product_website` AS `_table_website_id` ON (_table_website_id.product_id=e.entity_id) AND (_table_website_id.website_id=1) INNER JOIN `catalog_product_entity_int` AS `_table_status_default` ON (_table_status_default.entity_id = e.entity_id) AND (_table_status_default.attribute_id='69') AND _table_status_default.store_id=0 LEFT JOIN `catalog_product_entity_int` AS `_table_status` ON (_table_status.entity_id = e.entity_id) AND (_table_status.attribute_id='69') AND (_table_status.store_id='3') INNER JOIN `catalog_product_entity_int` AS `_table_visibility_default` ON (_table_visibility_default.entity_id = e.entity_id) AND (_table_visibility_default.attribute_id='74') AND _table_visibility_default.store_id=0 LEFT JOIN `catalog_product_entity_int` AS `_table_visibility` ON (_table_visibility.entity_id = e.entity_id) AND (_table_visibility.attribute_id='74') AND (_table_visibility.store_id='3') INNER JOIN `catalog_category_product` AS `_table_category_129` ON (_table_category_129.product_id=e.entity_id) AND (_table_category_129.category_id IN ('129')) WHERE (e.entity_type_id = '4') AND (IFNULL(_table_status.value, _table_status_default.value) in (1)) AND (IFNULL(_table_visibility.value, _table_visibility_default.value) in (2, 4)) ) ) AND (store_id = '3') AND (attribute_id = '49') AND (customer_group_id = 0) GROUP BY `range`;
First off there is over half a million rows in this table, and this query as above does a full scan/sort on 250k of those rows. Very sloppy and shouldn’t be run often if it’s required…
Anyway, I added an index to CATLAOGINDEX_PRICE table that reduced the rows it full scans in half (still very bad but probably the best this is going to do as it doesn’t restrict the query on entity_id and so there are 120k rows matching the main WHERE clause [eg the last 3 in the above sql]). Added an index that contained these three rows in this order: `attribute_id`, `store_id`, `customer_group_id`
So what does this SQL do? How important is it? Can it be put into a summary table or cached or?
The code that generates this sql is actually in app/code/core/Mage/CatalogIndex/Model/Mysql4/Price.php and it’s used for the pricing filters while browsing categories. Which isn’t much problem if you don’t have a ton of products, but we do. For now I hacked it to stop doing the query (and those turning off the price filtering) by adding this line:
if ($attribute->getAttributeCode() == ‘price’) { return array(); }
right at the top of the function:
public function getCount($range, $attribute, $entityIdsFilter)
While in there add the same thing to the top of the function:
public function getMaxValue($attribute = null, $entityIdsFilter = array())