Last Updated: November 24, 2016
· nikhiln

How to add full text search in Django?

Introduction:

We recently added full-text search to one of our products using django-haystack. We used Whoosh as the search engine to try things out, and I thought a write-up might be helpful for others. Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Elasticsearch, Whoosh, Xapian, etc.) without having to modify your code.

Here are the steps to get started with it:

  1. Install django haystack using pip, e.g. pip install django-haystack
  2. Install whoosh using pip, e.g. pip install whoosh
  3. Add haystack to your INSTALLED_APPS.
  4. Create search_indexes.py files for your models.

More information on configuration can be found at http://django-haystack.readthedocs.org/en/latest/tutorial.html#configuration
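For reference, a minimal settings sketch for steps 3 and the Whoosh backend might look like the following. The engine path is the one Haystack ships for Whoosh; the index directory name whoosh_index, the BASE_DIR stand-in, and the shortened INSTALLED_APPS list are assumptions made for illustration, so adapt them to your project.

```python
import os

# Stand-in for the project root; a real settings.py usually already
# defines BASE_DIR, in which case reuse that value instead.
BASE_DIR = os.getcwd()

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... your own apps ...
    "haystack",  # step 3: register Haystack
]

# Tell Haystack which backend to use and where Whoosh stores its index.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Assumed directory name; any writable path works.
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    },
}
```

Switching to another backend later should only require changing the ENGINE entry (and its connection details), which is the main appeal of Haystack's unified API.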

The search_indexes.py file contains the index class for the Message_forum model (the model we need to search).

from haystack import indexes

# Adjust this import to wherever Message_forum lives in your project;
# "myapp" is only a placeholder.
from myapp.models import Message_forum


class Message_forumIndex(indexes.SearchIndex, indexes.Indexable):
    # Primary document field, built from the data template.
    text = indexes.EdgeNgramField(document=True, use_template=True)
    message = indexes.CharField(model_attr='message', null=True)
    forum_id = indexes.IntegerField(null=True)
    status = indexes.IntegerField(model_attr='status', null=True)
    # prepare_tags() returns a list, so a multi-value field is used here.
    tags = indexes.MultiValueField(null=True)
    message_date = indexes.DateTimeField(null=True)
    message_thread_id = indexes.IntegerField(null=True)

    def get_model(self):
        return Message_forum

    def index_queryset(self, using=None):
        # Index every Message_forum row.
        return self.get_model().objects.all()

    def prepare_forum_id(self, obj):
        return obj.forum.id

    def prepare_status(self, obj):
        return obj.status

    def prepare_message_date(self, obj):
        return obj.message.date

    def prepare_tags(self, obj):
        return [tag.tag for tag in obj.tags.all()]

    def prepare_message_thread_id(self, obj):
        # Use -1 as a sentinel for messages that do not belong to a thread.
        if obj.message.thread is not None:
            return obj.message.thread.id
        return -1

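Haystack supports full-text search. For that, one field in the index class must be defined with document=True; this is the primary field the search engine searches. Adding use_template=True lets us use a data template to build the document that the search engine indexes. I created a template named 'message_forum_text.txt' (by default Haystack looks for it at search/indexes/<app_label>/<model_name>_<field_name>.txt inside a templates directory) with the content below.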

{{ object.forum.id }}
{% for tag in object.tags.all %}{{ tag.tag }} {% endfor %}
{{ object.message.date }}
{{ object.message.thread.id }}

As a final step, we need to build the index data. For a fresh start, run ./manage.py rebuild_index; it builds a fresh index, which Haystack later uses for searching. Ideally, schedule a ./manage.py update_index cron job at a regular interval (using --age=<num_hours> reduces the number of things to update). Alternatively, for a low-traffic application, the RealtimeSignalProcessor can be used, which automatically handles updates and deletes for you. I set up a cron job that runs update_index for our application.
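If you prefer the real-time option over cron, it is enabled through a single setting. The fragment below is a sketch of what would go into settings.py; we did not use it in our application, so treat it as a starting point rather than a tested setup.

```python
# settings.py fragment: update the index on every model save/delete
# instead of relying on a scheduled update_index run.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

Because every save and delete then triggers an index write, this is usually only a good fit when write traffic is low.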

How to search?
Haystack can run searches through its own Django views and search templates. In our case, however, the frontend is built on a different technology stack, so I needed a separate function that runs the search for the given query criteria and returns the data. A small snippet is given below:

from django.core.paginator import EmptyPage, InvalidPage, Paginator
from haystack.query import SQ, SearchQuerySet


# The function name and parameters are illustrative.
# Example call: search_messages([1, 2], ["completed", "draft"], page=1)
def search_messages(forum_ids, selected_status, page=1, per_page=10):
    # Narrow the search down to a specific set of forums.
    results = SearchQuerySet().filter(SQ(forum_id__in=forum_ids))

    # Collect additional filters; each one is ANDed onto the results.
    filts = []

    # Filter messages by the selected statuses. The values must match
    # what the index stores in its status field.
    if selected_status:
        filts.append(SQ(status__in=selected_status))

    # You can add more filters here based on your needs.

    # Apply the filters.
    for filt in filts:
        results = results.filter(filt)

    # Optionally add pagination support on the results object.
    paginator = Paginator(results, per_page)

    # Make sure the page request is an int. If not, deliver the first page.
    try:
        page = int(page)
    except (TypeError, ValueError):
        page = 1

    # If the page is out of range, deliver the last page.
    try:
        messages = paginator.page(page)
    except (EmptyPage, InvalidPage):
        messages = paginator.page(paginator.num_pages)

    return messages
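One detail worth watching in the status filter: if the statuses arrive as a single comma-separated string such as "completed, draft", splitting on the comma alone leaves a leading space on " draft", and that value will not match anything in the index. A small helper can normalize the input first. The name parse_statuses is hypothetical and not part of Haystack; this is only a sketch.

```python
def parse_statuses(raw):
    """Split a comma-separated status string into clean values.

    Surrounding whitespace is stripped and empty entries are dropped,
    so "completed, draft" yields two usable filter values.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_statuses("completed, draft"))  # ['completed', 'draft']
```

The resulting list can be passed directly to the status__in filter.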

Hope this helps.