Schneimi’s Dev Weblog


Fast random find with CakePHP

Posted in CakePHP by schneimi on August 28, 2009
Tags: , , ,

I want to share my experience with random finds in CakePHP because they can be very slow done the wrong way.

In this example we have the model Audioplaylist which hasAndBelongsToMany Audio and we want to get a random playlist of Audios. We assume the table for Audio is pretty large and we also need the data of other related models as result, let’s say the owner’s username and the title of the related album. All models are set to $recursive = -1 and the ContainableBehavior is used to contain the data.

  • The first approach would be to make one find and get all the data at once.
  • $audios = $this->Audioplaylist->Audio->find('all', array('contain' => array('User.name',
                                                                                'Album.title'),
                                                             'order' => 'RAND()',
                                                             'limit' => $count));
    

    Happy waiting!

    You have to know that DB queries ordered by RAND() get slower the more fields are selected, and in this case we even select all fields of the Audio model.

  • Knowing that, we can improve the find in using the fields parameter to also contain the fields of the Audio model.
  • $audios = $this->Audioplaylist->Audio->find('all', array('contain' => array('User.name',
                                                                                'Album.title'),
                                                             'fields' => 'id',
                                                             'order' => 'RAND()',
                                                             'limit' => $count));
    

    This is already a lot faster than before, but still not very applicable because it still slows down with the amount of related data.

  • So let’s try another approach that looks more circumstantial but avoids the problem seen before. The idea is to make two finds, the first one just finds some random Audio Ids and the second will then use the Ids to catch all the related data.
  • $randomAudioIds = $this->Audioplaylist->Audio->find('list', array('fields' => 'id',
                                                                      'order' => 'RAND()',
                                                                      'limit' => $count));
    
    $audios = $this->Audioplaylist->Audio->find('all', array('contain' => array('User.name',
                                                                                'Album.title'),
                                                             'conditions' => array('Audio.id' => $randomAudioIds),
                                                             'order' => 'RAND()'));
    

    Sometimes more is less, especially if you need alot of data from related models, you will appreciate the performance of these two queries.