Speed Up MySQL and Web Servers with the Intel Compiler and tcmalloc

After reading some recent benchmarks regarding tcmalloc performance on MySQL, I decided to rebuild my whole web hosting stack with it.

ICC is Intel’s C++ compiler. Its faster performance is also mentioned on the MySQL website.

Most distros should already have Google Performance Tools (which include tcmalloc) prepackaged. Installing ICC is slightly more complicated: you can download it directly from Intel’s website, where it is free for non-commercial use. Arch Linux and Gentoo both package the installer. On Ubuntu/Debian systems you will probably also need build-essential and apt-build to rebuild packages. On Arch Linux you will need base-devel and abs.

For most packages, the following bash script can be used before the configure/make step. Don’t omit the dot on the first line, and change the path of iccvars.sh to match your installation directory.

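# Load the ICC environment into the current shell (note the leading dot)
. /opt/intel/Compiler/11.0/081/bin/iccvars.sh intel64
# Intel C compiler and flags: tune for the build host, optimize aggressively,
# and allow faster but slightly less precise floating-point division
CC=icc
CFLAGS="-xHOST -O3 -no-prec-div"
# Intel linker and archiver wrappers
LD=xild
AR=xiar
# Intel C++ compiler, same flags as for C
CXX=icpc
CXXFLAGS="-xHOST -O3 -no-prec-div"
# Link against the minimal tcmalloc library
LDFLAGS=-ltcmalloc_minimal
export CC CFLAGS LD AR CXX CXXFLAGS LDFLAGS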
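
Why the dot matters: it sources the script into the current shell, so whatever iccvars.sh sets up is still in effect when configure and make run later. A minimal sketch of the difference, assuming a throwaway file and a made-up variable DEMO_VAR as a stand-in for the real script:

```shell
# Stand-in for iccvars.sh: a hypothetical one-line script that sets a variable
tmp=$(mktemp)
echo 'DEMO_VAR=set_by_script' > "$tmp"

# Executed in a child process: the variable is gone once the child exits
sh "$tmp"
echo "after executing: ${DEMO_VAR:-unset}"

# Sourced with the dot: the variable stays in the current shell
. "$tmp"
echo "after sourcing: ${DEMO_VAR:-unset}"

rm -f "$tmp"
```

The same applies to the CC, CFLAGS and other variables above: they only reach configure if they are set and exported in the shell that runs it.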

The compiler settings above seem safe for all packages. Here is a summary of the package-specific flags and configure options.

Package    -static  -ipo  LDFLAGS=-ltcmalloc_minimal  configure option
MySQL      No       No    Yes                         --disable-shared --with-mysqld-libs=-ltcmalloc_minimal
Cherokee   No       No    Yes                         None
Nginx      No       Yes   Yes                         None
Varnish    No       No    Yes                         --disable-jemalloc
PHP        N/A      N/A   Yes                         Failed with ICC
Memcached  No       Yes   Yes                         None
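
After a rebuild it can be worth checking that tcmalloc really ended up in the binary, since a flag passed through LDFLAGS may not survive every package's build system. A rough sketch using ldd, assuming a dynamically linked binary; the function name uses_tcmalloc and the example path are placeholders, not part of any package:

```shell
# Report whether a dynamically linked binary pulls in tcmalloc.
# Usage (hypothetical path): uses_tcmalloc /usr/sbin/mysqld
uses_tcmalloc() {
    # ldd lists the shared libraries the dynamic loader would resolve
    if ldd "$1" 2>/dev/null | grep -q 'libtcmalloc'; then
        echo "$1: linked against tcmalloc"
    else
        echo "$1: tcmalloc not found"
        return 1
    fi
}
```

A statically linked binary will always report "not found" here, so this check says nothing about builds that use -static.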

This might disappoint you, but the rebuilt software stack showed no improvement whatsoever in my benchmark.

  • http://bigdbahead.com Matt

    Ahhhh, I am glad to see I am not the only one who has tried tcmalloc with MySQL and did not see any performance gain. I wonder if it's actually specific workloads where others are seeing the performance benefit… or potentially 32-bit vs 64-bit architectures… a million factors, I suppose.

  • Yejun

    @Matt
    It should be used in a highly concurrent, threaded environment and to reduce long-term memory fragmentation. A heavily loaded InnoDB setup should fit that picture, but I only did some simple single-client tests. Regarding 32-bit vs 64-bit, I believe 64-bit is always faster under any normal circumstances.
