PyLucene 4.0 (in 60 seconds) tutorial

pylucene’s extensive documentation

Having recently had the joy of struggling with PyLucene again (after many years), I re-entered the void of documentation right after finally managing to compile and install the thing. While googling, I kept ending up at the five-year-old blog post “PyLucene 3.0 in 60 seconds — tutorial sample code for the 3.0 API” by Joseph Turian, which conveniently lets one infer the syntax and functionality of PyLucene. However, its example code no longer works, because some things changed in PyLucene 4.0, in particular:

Starting with version 4.0, pylucene changed from a flat to nested namespace, mirroring the java hierarchy. ~ source
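
To make the namespace change concrete, here is a small sketch of how a flat 3.x-style import line maps onto the nested 4.x imports. It only manipulates strings, so it runs without PyLucene installed. The names NESTED_PACKAGES and nested_imports are hypothetical helpers made up for this illustration; the package paths are copied from the import lines of the indexer and retriever in this post, and the table covers only those classes.

```python
# Hypothetical lookup table: class name -> package under the nested 4.x layout.
# The packages are the ones used by the indexer and retriever in this post.
NESTED_PACKAGES = {
    "File": "java.io",
    "StandardAnalyzer": "org.apache.lucene.analysis.standard",
    "Document": "org.apache.lucene.document",
    "Field": "org.apache.lucene.document",
    "IndexWriter": "org.apache.lucene.index",
    "IndexWriterConfig": "org.apache.lucene.index",
    "IndexReader": "org.apache.lucene.index",
    "IndexSearcher": "org.apache.lucene.search",
    "QueryParser": "org.apache.lucene.queryparser.classic",
    "SimpleFSDirectory": "org.apache.lucene.store",
    "Version": "org.apache.lucene.util",
}


def nested_imports(flat_import):
    """Rewrite a flat 'from lucene import A, B' line as nested import lines."""
    prefix = "from lucene import "
    if not flat_import.startswith(prefix):
        raise ValueError("expected a line starting with %r" % prefix)
    names = [name.strip() for name in flat_import[len(prefix):].split(",")]

    # Group the class names by the package they live in.
    by_package = {}
    for name in names:
        if name not in NESTED_PACKAGES:
            raise ValueError("no package known for %r" % name)
        by_package.setdefault(NESTED_PACKAGES[name], []).append(name)

    # 'import lucene' is still needed for lucene.initVM().
    lines = ["import lucene"]
    for package in sorted(by_package):
        lines.append("from %s import %s" % (package, ", ".join(by_package[package])))
    return lines


if __name__ == "__main__":
    print("\n".join(nested_imports("from lucene import Document, Field, Version")))
```

For classes that are not in the table, the Javadocs of the matching Lucene version show which package a class belongs to.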

I am running PyLucene 4.10.1, so I find whatever I need in the 4.10.1 Javadocs. Below is the example from the “PyLucene 3.0 in 60 seconds” blog post, updated for PyLucene 4.0 (and beyond…?), which I figured may be of use to those who are starting to dabble in PyLucene. Many thanks to Joseph for the original post!

Indexer

import sys
import lucene

from java.io import File
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.document import Document, Field
from org.apache.lucene.index import IndexWriter, IndexWriterConfig
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import Version

if __name__ == "__main__":
    # Start the JVM; this has to happen before any Lucene class is used.
    lucene.initVM()
    indexDir = SimpleFSDirectory(File("index/"))
    writerConfig = IndexWriterConfig(Version.LUCENE_4_10_1, StandardAnalyzer())
    writer = IndexWriter(indexDir, writerConfig)

    print "%d docs in index" % writer.numDocs()
    print "Reading lines from sys.stdin..."
    n = 0  # stays 0 if stdin is empty
    for n, l in enumerate(sys.stdin, 1):
        # One document per input line, stored and analyzed in the "text" field.
        doc = Document()
        doc.add(Field("text", l, Field.Store.YES, Field.Index.ANALYZED))
        writer.addDocument(doc)
    print "Indexed %d lines from stdin (%d docs in index)" % (n, writer.numDocs())
    print "Closing index of %d docs..." % writer.numDocs()
    writer.close()

Retriever

import sys
import lucene

from java.io import File
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.document import Document, Field
from org.apache.lucene.search import IndexSearcher
from org.apache.lucene.index import IndexReader
from org.apache.lucene.queryparser.classic import QueryParser
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import Version

if __name__ == "__main__":
    # Start the JVM; this has to happen before any Lucene class is used.
    lucene.initVM()
    analyzer = StandardAnalyzer(Version.LUCENE_4_10_1)
    # Open the index that the indexer wrote to "index/".
    reader = IndexReader.open(SimpleFSDirectory(File("index/")))
    searcher = IndexSearcher(reader)

    # Parse the query string against the "text" field.
    query = QueryParser(Version.LUCENE_4_10_1, "text", analyzer).parse("Find this sentence please")
    MAX = 1000  # maximum number of hits to return
    hits = searcher.search(query, MAX)

    print "Found %d document(s) that matched query '%s':" % (hits.totalHits, query)
    for hit in hits.scoreDocs:
        print hit.score, hit.doc, hit.toString()
        # Fetch the stored document to print its original text.
        doc = searcher.doc(hit.doc)
        print doc.get("text").encode("utf-8")

7 thoughts on “PyLucene 4.0 (in 60 seconds) tutorial”

Ray says:

June 11, 2015 at 15:22

Thank you for the blog.

However, the latest version of PyLucene I can find is 4.9.0. May I know where you downloaded 4.10?

1. dvdgrs says:
  
  June 11, 2015 at 15:26
  
  Yep!
  e.g.: http://ftp.nluug.nl/internet/apache/lucene/pylucene/pylucene-4.10.1-1-src.tar.gz (happened to be my closest mirror — I see not all have 4.10, indeed…)
  
Ray says:

June 11, 2015 at 16:29

Thanks for your prompt reply. I found it too!

Thank you again!

Deepan Prabhu Babu says:

November 13, 2015 at 04:14

Thank you,
Your code has saved me lots of time, as I really can't find much documentation about PyLucene.

1. soheil says:
  
  January 17, 2016 at 09:52
  
  In the indexer part I ran into this error:
  
  Traceback (most recent call last):
    File "/home/ahoora/myProjects/python/index.py", line 23, in <module>
      doc.add(Field("text", l, Field.Store.YES, Field.Index.ANALYZED));
  TypeError: descriptor 'add' requires a 'Document' object but received a 'Field'
  
  Can you help me?
  I used Python 2.7 and Ubuntu.
  
  Thanks
  
Asma BHs says:

December 18, 2016 at 20:50

Hello,
I want to install PyLucene version 6.2.0. Can you tell me how? I'm so confused.
I'm using Python 3.5 on Windows.
Please help me.

Asma BHs says:

December 18, 2016 at 20:54

Hello there,
I'm trying to create a simple application for information retrieval
using Python (3.5) and PyLucene (6.2.0).
Can you tell me how to install PyLucene (6.2.0) on Windows?
Thank you.
