Monday, June 25, 2012

Sphinx Search Optimization

Sphinx search is an alternative of performing search in mysql .
From my experience sphinx will provide you a great deal of performance .
For Details http://sphinxsearch.com

This Page is about the optimization that i had done and will boost your performance

* Caching (Use Memcache)
sphinx directly does not support caching. An alternative is to use memcache. 
Based on your website requirements , say i fetch records of 1st 10 pages  and store records in memcache. So when user uses pagination then there is no need to peform a search operation , rather we can fetch records directly from memcache

* Use Of Multiquery / faceted searching
Multi-queries, or query batches, let you send multiple queries to Sphinx in one go.Two API methods that implement multi-query mechanism are AddQuery() and RunQueries(). 


Why To Use:
1. decrease in network roundtrips. 
2. automatic internal optimizations.
3. for new version performance will always increase.


Sequential Code
$cl->SetGroupBy('grouping1', SPH_GROUPBY_ATTR,"@count desc");
$cl->AddQuery ('textToSearch_OPTINAL','IndexName');
$res=$cl->RunQueries();

$cl->SetGroupBy('grouping2', SPH_GROUPBY_ATTR,"@count desc");
$cl->AddQuery ('textToSearch_OPTINAL','IndexName');
$res=$cl->RunQueries();

$cl->SetGroupBy('grouping3', SPH_GROUPBY_ATTR,"@count desc");
$cl->AddQuery ('textToSearch_OPTINAL','IndexName');
$res=$cl->RunQueries();

can be optimized by Multiquery
$cl->SetGroupBy('grouping1', SPH_GROUPBY_ATTR,"@count desc");
$cl->SetGroupBy('grouping2', SPH_GROUPBY_ATTR,"@count desc");
$cl->SetGroupBy('grouping3', SPH_GROUPBY_ATTR,"@count desc");
$cl->AddQuery ('textToSearch_OPTINAL','IndexName');
$res=$cl->RunQueries();

* Avoid use of setSelect('limitedcolumns)
setSelect(*) is much faster than setSelect('limitedcolumns)
Reason:
You have a bucket of colored marbels.If I ask you to pass me all (the *), then you just pass me the bucket.But if I ask you to pass me only the red and yellow ones, it takes you time, because you
have to pick out the right ones, and put them in a new bucket.
Smaller bucket (ie less network bandwidth), but takes longer (processing)

Sphinx Search

* Use Of SphinxQL (for large in/not-in queries)

client end computations are very high specially when a lot of values are being sent
in a IN or NOT IN clause because for every value it converts it into some binary format
using pack function and then sends to the sphinx server. This causes high load on our web
server because we have queries wherein we show online users within a search criteria and
sometimes online users are around 15k-20k.

example:  normal use of api is slow and cpu utilization consuming

for($i=100;$i<20000;$i++) {        $x[]=$i; }

$cl->SetFilter('PROFILEID',$x,TRUE);
use of sphinxQL will give cpu utilization improvement by 10 times

Distributed searching
To scale well, Sphinx has distributed searching capabilities. Distributed searching is useful to improve query latency (ie. search time) and throughput (ie. max queries/sec) in multi-server, multi-CPU or multi-core environments. This is essential for applications which need to search through huge amounts data (ie. billions of records and terabytes of text).

* Explore sphinx.conf
1. mem_limit                       = 32M (default allocation)
change it to around 2048MB based on your usage.

3 comments:

  1. Really love the memcache part

    ReplyDelete
  2. Nice explanation of setSelect('limitedcolumns')

    can u share more insights on sphinx.conf varaibles

    ReplyDelete
  3. Thank you for sharing valuable information. Nice post. I enjoyed reading this post. The whole blog is very nice found some good stuff and good information here Thanks..Also visit my page

    ReplyDelete