wukong documentation

wukong offers an ORM-style query engine for Solr and SolrCloud.

Get Started

Define your document class

from wukong.models import SolrDoc

class YourDocClass(SolrDoc):
    solr_hosts = "localhost:8080"
    collection_name = "my_solr_collection"

Fetch documents

docs = YourDocClass.documents.filter(firstname__eq="james").all()

Update documents

docs[0].firstname = "Jim"
docs[0].index() # single update

docs[1].firstname = "Smith"
docs.index() # batch update

Delete documents

docs[0].delete() # single delete
docs.delete() # batch delete

Create documents

YourDocClass.documents.create(id=1, firstname="james", lastname="bond")

Documentation

Solr Document Class

To connect to a Solr collection, extend the base class SolrDoc. Your class can specify four attributes.

solr_hosts: the host name(s) of your Solr servers.

zookeeper_hosts: the host name(s) of the ZooKeeper servers that monitor Solr. (optional)

collection_name: the name of your collection in Solr.

request_timeout: the number of seconds after which a request to Solr is dropped. (optional, defaults to 15)

For example, if you have a collection named users, you can define the following class.

from wukong.models import SolrDoc

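class User(SolrDoc):
    solr_hosts = "localhost:8080"
    zookeeper_hosts = "localhost:2181"
    collection_name = "users"

    # Optional override: return True if the document matches the Solr schema
    def validate_schema_fields(self, fields):
        pass

    # Optional override: return the JSON data to send to Solr for indexing
    def get_data_for_solr(self):
        pass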

You can override existing methods, such as validate_schema_fields and get_data_for_solr, to fit your business logic.

validate_schema_fields: returns a boolean indicating whether the current document is consistent with the Solr schema.

get_data_for_solr: returns the JSON data to send to Solr for indexing.
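
As a rough illustration of overriding validate_schema_fields, the sketch below uses a stand-in base class so it can run without Solr or wukong installed. SolrDocStub, required_fields, and the assumption that fields is an iterable of field names are all hypothetical; check the real SolrDoc for the exact contract.

```python
class SolrDocStub:
    """Stand-in for wukong.models.SolrDoc so the sketch runs without Solr."""

    def validate_schema_fields(self, fields):
        # Default behaviour assumed here: accept everything.
        return True


class User(SolrDocStub):
    # Hypothetical rule: every user document must carry these fields.
    required_fields = {"id", "name"}

    def validate_schema_fields(self, fields):
        # `fields` is assumed to be an iterable of field names.
        return self.required_fields.issubset(fields)


user = User()
print(user.validate_schema_fields(["id", "name", "city"]))
print(user.validate_schema_fields(["id"]))
```

With the real library, the same override would live on a class that extends SolrDoc instead of the stub.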

If you have multiple collections, you can define solr_hosts and zookeeper_hosts once in a base class and have each subclass specify only its collection_name.

from wukong.models import SolrDoc

class BaseDoc(SolrDoc):
    solr_hosts = "localhost:8080"
    zookeeper_hosts = "localhost:2181"

class User(BaseDoc):
    collection_name = "users"

class Car(BaseDoc):
    collection_name = "cars"

Document Retrieval

Once you define your document class, you can use it to fetch documents from Solr.

Filtering

# fetch all documents whose name is james
User.documents.filter(name__eq="james").all()

# fetch all documents whose name is not james
User.documents.filter(name__ne="james").all()

# fetch all documents whose name contains james as a substring
User.documents.filter(name__wc="james").all()

# fetch all documents whose name doesn't contain james as a substring
User.documents.filter(name__nwc="james").all()

# fetch all documents whose age is greater than 30
User.documents.filter(age__g=30).all()

# fetch all documents whose age is less than 30
User.documents.filter(age__l=30).all()

# fetch all documents whose age is greater than or equal to 30
User.documents.filter(age__ge=30).all()

# fetch all documents whose age is less than or equal to 30
User.documents.filter(age__le=30).all()

# fetch all documents whose city is either Ottawa or New York
User.documents.filter(city__in=['Ottawa', 'New York']).all()

# fetch all documents whose city is neither Ottawa nor New York
User.documents.filter(city__nin=['Ottawa', 'New York']).all()

# fetch all documents that have a zip field
User.documents.filter(zip__ex=True).all()

# fetch all documents that don't have a zip field
User.documents.filter(zip__nex=True).all()

# fetch all documents whose age is less than 30 and whose city is Ottawa
User.documents.filter(age__l=30, city__eq="Ottawa").all()

# fetch all documents whose age is less than 30 or whose city is Ottawa
User.documents.filter(OR(age__l=30, city__eq="Ottawa")).all()

# fetch all documents whose age is less than 30 or whose city is Ottawa, and that also have a zip field
User.documents.filter(AND(OR(age__l=30, city__eq="Ottawa"), zip__ex=True)).all()

# chained filters: fetch all documents whose age is less than 30 and whose city is Ottawa
User.documents.filter(age__l=30).filter(city__eq="Ottawa").all()
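
To make the operator suffixes more concrete, the sketch below shows one way a field__op keyword could be translated into standard Solr query syntax. The to_solr_clause helper and its mapping are hypothetical and written only for illustration; they are not taken from wukong, whose actual translation may differ.

```python
# Hypothetical helper: NOT part of wukong. It only illustrates how a
# `field__op=value` keyword could map onto standard Solr query syntax.

def _quote(value):
    """Quote strings so multi-word values stay a single phrase."""
    if isinstance(value, str):
        return '"%s"' % value.replace('"', '\\"')
    return str(value)


def to_solr_clause(key, value):
    """Translate one `field__op` keyword into a Solr clause (assumed mapping)."""
    field, _, op = key.partition("__")
    op = op or "eq"
    if op == "eq":
        return "%s:%s" % (field, _quote(value))
    if op == "ne":
        return "-%s:%s" % (field, _quote(value))
    if op == "wc":
        return "%s:*%s*" % (field, value)
    if op == "nwc":
        return "-%s:*%s*" % (field, value)
    if op == "g":  # exclusive lower bound
        return "%s:{%s TO *}" % (field, value)
    if op == "ge":  # inclusive lower bound
        return "%s:[%s TO *]" % (field, value)
    if op == "l":  # exclusive upper bound
        return "%s:{* TO %s}" % (field, value)
    if op == "le":  # inclusive upper bound
        return "%s:[* TO %s]" % (field, value)
    if op == "in":
        return "%s:(%s)" % (field, " OR ".join(_quote(v) for v in value))
    if op == "nin":
        return "-%s:(%s)" % (field, " OR ".join(_quote(v) for v in value))
    if op == "ex":  # field exists
        return "%s:[* TO *]" % field
    if op == "nex":  # field is missing
        return "-%s:[* TO *]" % field
    raise ValueError("unknown operator: %s" % op)


print(to_solr_clause("age__g", 30))
print(to_solr_clause("city__in", ["Ottawa", "New York"]))
```

Square brackets mark inclusive range bounds in Solr and curly braces mark exclusive ones, which is why g/l and ge/le differ only in the bracket style.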

Sorting

# fetch all documents sorted by age in ascending order
User.documents.sort_by('age').all()

# fetch all documents whose name is james, sorted by age in descending order
User.documents.filter(name__eq="james").sort_by('-age').all()
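
The leading minus sign presumably ends up as a descending clause in Solr's sort parameter. A minimal sketch of that translation follows; to_solr_sort is a hypothetical helper, not a wukong function.

```python
# Hypothetical helper: NOT part of wukong.

def to_solr_sort(*fields):
    """Turn '-age' style names into a value for Solr's `sort` parameter."""
    parts = []
    for name in fields:
        if name.startswith("-"):
            parts.append("%s desc" % name[1:])
        else:
            parts.append("%s asc" % name)
    return ",".join(parts)


print(to_solr_sort("-age"))
print(to_solr_sort("age", "-name"))
```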

Search

# fetch all documents matching `james bond` in the default field (usually `text`)
User.documents.search('james bond').all()

# fetch all documents matching `james bond` in name (weight 10) and city (weight 1)
User.documents.search('james bond', name=10, city=1).all()

# fetch all documents matching `james bond` in the default field with at least 2 tokens matched
User.documents.search('james bond', minimin_matches=2).all()
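
Field weights and a minimum match count correspond naturally to the qf and mm parameters of Solr's Extended DisMax query parser. The sketch below assumes that mapping; edismax_params and its mm argument are hypothetical names, and wukong may build its request differently.

```python
# Hypothetical helper: NOT part of wukong. It assumes weighted search is
# expressed through Solr's edismax parser (`qf` for weights, `mm` for the
# minimum number of matching clauses).

def edismax_params(text, mm=None, **weights):
    params = {"q": text, "defType": "edismax"}
    if weights:
        # e.g. name=10, city=1 becomes "name^10 city^1"
        params["qf"] = " ".join("%s^%s" % (f, w) for f, w in weights.items())
    if mm is not None:
        params["mm"] = str(mm)
    return params


print(edismax_params("james bond", name=10, city=1))
print(edismax_params("james bond", mm=2))
```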

Grouping

# group all documents by `gender` and fetch the groups
User.documents.group_by('gender').groups()

# group all documents by `gender` and `city` and fetch the groups
User.documents.group_by(['gender', 'city']).groups()

# group all documents by `gender` and `city` and get 3 documents in each group
User.documents.group_by(['gender', 'city'], group_limit=3).groups()
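
The return shape of groups() is not described here. For orientation, the sketch below walks the raw JSON layout that Solr itself uses for grouped results. The sample response is hand-written for illustration (not real data), and docs_by_group is a hypothetical helper.

```python
# Hand-written sample in the shape Solr uses for grouped results
# (illustrative values only, not real data).
sample_response = {
    "grouped": {
        "gender": {
            "matches": 3,
            "groups": [
                {"groupValue": "female",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"id": 2, "name": "Kate"}]}},
                {"groupValue": "male",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"id": 1, "name": "James"}]}},
            ],
        }
    }
}


def docs_by_group(response, field):
    """Map each group value to the documents Solr returned for it."""
    groups = response["grouped"][field]["groups"]
    return {g["groupValue"]: g["doclist"]["docs"] for g in groups}


print(docs_by_group(sample_response, "gender"))
```

In the sample, the male group reports numFound of 2 but carries a single document, which is what a per-group limit of 1 would produce.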

Faceting

# facet all documents by `gender` and fetch the facets
User.documents.facet_by('gender').facets()

# facet all documents by `gender` and `city` and fetch the facets
User.documents.facet_by(['gender', 'city']).facets()

# facet all documents by `gender` and `city` and fetch the facets that have at least 10 docs
User.documents.facet_by(['gender', 'city'], mincount=10).facets()
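
In Solr's default JSON output, field facets arrive as a flat list that alternates value and count. The sketch below converts that raw layout into a dictionary and applies a minimum count on the client side. The sample response is hand-written for illustration, facet_counts is a hypothetical helper, and what facets() returns in wukong may already be processed differently.

```python
# Hand-written sample in Solr's default facet layout (illustrative only).
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "gender": ["female", 15, "male", 12, "unknown", 3],
        }
    }
}


def facet_counts(response, field, mincount=0):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return {value: count for value, count in pairs if count >= mincount}


print(facet_counts(sample_response, "gender"))
print(facet_counts(sample_response, "gender", mincount=10))
```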

Pagination

# paginate documents and get 100 documents starting from 200
User.documents.offset(200).limit(100).all()
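
When walking through a large result set, the offset and limit values for each page can be computed up front. The sketch below is plain Python with no wukong dependency; page_windows is a hypothetical helper and the total of 250 is an arbitrary example figure.

```python
def page_windows(total, page_size):
    """Yield (offset, limit) pairs that cover `total` documents."""
    for offset in range(0, total, page_size):
        yield offset, min(page_size, total - offset)


# Each pair could be passed to .offset(...) and .limit(...) in turn.
for offset, limit in page_windows(250, 100):
    print(offset, limit)
```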

Return Fields

# only fetch the fields (id and name) for each document
User.documents.only('id', 'name').all()

Raw Documents

# fetch all documents matching `james bond` as a list of raw JSON rather than a SolrDoc list
User.documents.search('james bond').raw()

Chained Query

# fetch the documents matching `james bond` and with age greater than 30, and get 100 documents starting from 200
User.documents.search('james bond').filter(age__g=30).offset(200).limit(100).all()

Document Creation

# Create a document in Solr
User.documents.create(id=12345, name="James Bond", city="London")

# Batch create within one request to Solr
docs = [
    User(id=12345, name="James Bond", city="London"),
    User(id=12346, name="Kate", city="New York"),
    ...
]
docs = SolrDocs(docs)
docs.index()

Document Update

doc = User.documents.create(id=12345, name="James Bond", city="London")

# Update a document in Solr
doc.name = "Jim Bond"
doc.city = "Ottawa"
doc.index()

Document Delete

doc = User.documents.create(id=12345, name="James Bond", city="London")

# Delete a document from Solr
doc.delete()

# Batch delete within one request to Solr
docs = [
    User(id=12345, name="James Bond", city="London"),
    User(id=12346, name="Kate", city="New York"),
    ...
]
docs = SolrDocs(docs)
docs.delete()

Complex Query

# You can always use `User.solr.select` to build your custom query
User.solr.select({
    "q": "it is complex",
    ...,
    ...,
    ...
})
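
A custom query is ultimately a dictionary of standard Solr request parameters. The sketch below shows how such a dictionary encodes into a select URL using only the standard library. select_url is a hypothetical helper, the parameter values are examples, and the /solr/<collection>/select path is Solr's conventional handler location rather than something confirmed for wukong.

```python
from urllib.parse import urlencode


def select_url(host, collection, params):
    """Encode a Solr parameter dict into a select URL (illustrative only)."""
    # doseq=True repeats a key for every item of a list value, e.g. fq.
    query = urlencode(params, doseq=True)
    return "http://%s/solr/%s/select?%s" % (host, collection, query)


params = {
    "q": "name:james",
    "fq": ["age:[30 TO *]", "city:Ottawa"],  # each filter query is sent separately
    "rows": 100,
    "wt": "json",
}
url = select_url("localhost:8080", "users", params)
print(url)
```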