HsHyperEstraier-0.4: HyperEstraier binding for Haskell

Text.HyperEstraier.Database

Description

An interface to functions that manipulate databases.

Types

data Database

Database is an opaque object representing a HyperEstraier database.

data EstError

EstError represents an error that occurred during an operation.

Constructors

InvalidArgument

An argument passed to the function was invalid.

AccessForbidden

The operation is forbidden.

LockFailure

Failed to lock the database.

DatabaseProblem

The database has a problem.

IOProblem

An I/O operation failed.

NoSuchItem

An object you specified does not exist.

MiscError

Errors for other reasons.
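
The constructors above can be turned into user-facing messages with a plain case expression. The following is a minimal sketch, not part of the binding: EstError is redeclared locally so the snippet compiles without the library, and describeError and reportResult are hypothetical helper names.

```haskell
-- Stand-in copy of the documented constructors, declared locally so the
-- snippet compiles on its own; with the library installed, import
-- Text.HyperEstraier.Database instead of redeclaring the type.
data EstError
  = InvalidArgument
  | AccessForbidden
  | LockFailure
  | DatabaseProblem
  | IOProblem
  | NoSuchItem
  | MiscError
  deriving (Eq, Show)

-- Hypothetical helper: a short message per error, worded after the
-- constructor descriptions in this section.
describeError :: EstError -> String
describeError err = case err of
  InvalidArgument -> "invalid argument"
  AccessForbidden -> "operation forbidden"
  LockFailure     -> "failed to lock the database"
  DatabaseProblem -> "the database has a problem"
  IOProblem       -> "I/O operation failed"
  NoSuchItem      -> "no such item"
  MiscError       -> "error for another reason"

-- Hypothetical helper: summarise any result of the shape Either EstError a.
reportResult :: Either EstError a -> String
reportResult = either (\e -> "failed: " ++ describeError e) (const "ok")
```

Because the case expression lists every constructor, GHC's incomplete-pattern warning would flag the helper if a constructor were ever added.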

data AttrIndexType

AttrIndexType represents an index type for an attribute.

Constructors

SeqIndex

Map from a document ID to an attribute value. This type of index increases the efficiency of, say, getDocAttr.

StrIndex

Map from an attribute value to a document ID. This increases the search speed when you search for documents by an attribute value.

NumIndex

This is similar to StrIndex but for attributes whose value is a number.

data OptimizeOption

OptimizeOption is an option for the optimizeDatabase action.

Constructors

NoPurge

Omit the process which purges the garbage left by removed documents.

NoDBOptimize

Omit the process which optimizes the database file.

data RemoveOption

RemoveOption is an option for the mergeDatabase action and the removeDocument action.

Constructors

CleaningRemove

Clean up the region in the database where the removed documents were placed.

data PutOption

PutOption is an option for the putDocument action.

Constructors

CleaningPut

If the new document overwrites an old one, clean up the region in the database where the old document was placed.

WeightStatically

Statically apply the "@weight" attribute of the document.

data GetOption

GetOption is an option for the getDocument action.

Constructors

NoAttributes

Don't retrieve the attributes of the document.

NoText

Don't retrieve the body of the document.

NoKeywords

Don't retrieve the keywords of the document.

data OpenMode

OpenMode represents how to open a database.

Constructors

Reader [ReaderOption]

Open the database in read-only mode. You can specify ReaderOption to modify the behavior of the database.

Writer [WriterOption]

Open the database in writable mode. You can specify WriterOption to modify the behavior of the database.

data ReaderOption

ReaderOption is an option for the Reader constructor.

Constructors

ReadLock LockingMode

Specify how to lock the database.

data WriterOption

WriterOption is an option for the Writer constructor.

Constructors

Create [CreateOption]

Create a database if an old one doesn't exist. You can specify CreateOption to modify the behavior of the database.

Truncate [CreateOption]

Always create a new database even if an old one already exists. You can specify CreateOption to modify the behavior of the database.

WriteLock LockingMode

Specify how to lock the database.

data LockingMode

LockingMode represents how to lock the database.

Constructors

NoLock

Do no exclusive access control at all. This option is very unsafe.

NonblockingLock

Perform a non-blocking lock. (The author of this module doesn't know what happens if this option is in effect. See the manual and the source code of HyperEstraier and QDBM.)

data CreateOption

CreateOption is an option for the Create constructor.

Constructors

Analysis AnalysisOption

Specify the word analysis method.

Index IndexTuning

Specify the prospective size of the database.

Score [ScoreOption]

Specify how to handle scores of the documents.

data AnalysisOption

AnalysisOption is an option for the Analysis constructor.

Constructors

PerfectNGram

Use the perfect N-gram analyzer.

CharCategory

Use the character category analyzer.

data IndexTuning

IndexTuning is an option for the Index constructor.

Constructors

Small

Predict that the database will have fewer than 50,000 documents.

Large

Predict that the database will have fewer than 300,000 documents.

Huge

Predict that the database will have fewer than 1,000,000 documents.

Huge2

Predict that the database will have fewer than 5,000,000 documents.

Huge3

Predict that the database will have more than 10,000,000 documents.
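
The thresholds above suggest a simple way to pick a tuning from an expected document count. This is a sketch under stated assumptions: IndexTuning is redeclared locally as a stand-in, chooseTuning is a hypothetical helper, and counts between 5,000,000 and 10,000,000 (which the descriptions leave unspecified) are assigned to Huge3.

```haskell
-- Stand-in copy of the documented constructors so the snippet compiles
-- without the library.
data IndexTuning = Small | Large | Huge | Huge2 | Huge3
  deriving (Eq, Show)

-- Hypothetical helper: map an expected document count to a tuning,
-- following the upper bounds listed for each constructor.
chooseTuning :: Int -> IndexTuning
chooseTuning expectedDocs
  | expectedDocs < 50000   = Small
  | expectedDocs < 300000  = Large
  | expectedDocs < 1000000 = Huge
  | expectedDocs < 5000000 = Huge2
  | otherwise              = Huge3  -- assumption: also covers 5M to 10M
```

The result would then be wrapped as Index (chooseTuning n) inside a CreateOption list.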

data ScoreOption

ScoreOption is an option for the Score constructor.

Constructors

Nullified

Nullify anything about the score of documents.

StoredAsInt

Store the scores for documents into the database as 32-bit integers.

OnlyToBeStored

Store the scores for documents into the database but don't use them during the search operation.
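
The option types in this section nest inside one another: an OpenMode carries ReaderOption or WriterOption values, a WriterOption may carry CreateOption values, and so on. The sketch below shows how such values might be composed. All types are local stand-ins that mirror the documented constructors (the derived Eq and Show instances are an assumption), and readShared, createSmall and mayCreate are hypothetical names.

```haskell
-- Local stand-ins mirroring the documented constructors, so that the
-- snippet compiles without the library installed.
data LockingMode    = NoLock | NonblockingLock deriving (Eq, Show)
data AnalysisOption = PerfectNGram | CharCategory deriving (Eq, Show)
data IndexTuning    = Small | Large | Huge | Huge2 | Huge3 deriving (Eq, Show)
data ScoreOption    = Nullified | StoredAsInt | OnlyToBeStored deriving (Eq, Show)

data CreateOption
  = Analysis AnalysisOption
  | Index IndexTuning
  | Score [ScoreOption]
  deriving (Eq, Show)

data ReaderOption = ReadLock LockingMode deriving (Eq, Show)

data WriterOption
  = Create [CreateOption]
  | Truncate [CreateOption]
  | WriteLock LockingMode
  deriving (Eq, Show)

data OpenMode = Reader [ReaderOption] | Writer [WriterOption]
  deriving (Eq, Show)

-- Read-only mode with a non-blocking lock.
readShared :: OpenMode
readShared = Reader [ReadLock NonblockingLock]

-- Writable mode that creates the database if it does not exist yet.
createSmall :: OpenMode
createSmall =
  Writer
    [ Create [Analysis PerfectNGram, Index Small, Score [StoredAsInt]]
    , WriteLock NonblockingLock
    ]

-- Hypothetical helper: does this mode allow a new database to be created?
mayCreate :: OpenMode -> Bool
mayCreate (Reader _)    = False
mayCreate (Writer opts) = any creates opts
  where
    creates (Create _)    = True
    creates (Truncate _)  = True
    creates (WriteLock _) = False
```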

Opening and closing databases

withDatabase :: FilePath -> OpenMode -> (Database -> IO a) -> IO a

withDatabase fpath mode f opens a database at fpath and runs f on it. When the action f finishes or throws an exception, the database will be closed automatically. If withDatabase fails to open the database, it throws an EstError. See openDatabase.

openDatabase :: FilePath -> OpenMode -> IO (Either EstError Database)

openDatabase fpath mode opens a database at fpath. If it succeeds it returns Right Database, otherwise it returns Left EstError.

The Database can be shared by multiple threads, but there is one important limitation in the current implementation of HyperEstraier itself: a single process can NOT open the same database twice simultaneously. Such an attempt results in AccessForbidden.

closeDatabase :: Database -> IO ()

closeDatabase db closes the database db. If db has already been closed, this operation does nothing.
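
The behaviour described for withDatabase (close on normal exit and on exception, throw on open failure) matches the standard bracket pattern. The following sketch illustrates that pattern with mock functions; it is not the binding's actual implementation. Database, openDatabase and closeDatabase here are local mocks that only track an open flag, the OpenMode argument is omitted for brevity, and withDatabase' and tryWithDatabase are hypothetical names.

```haskell
import Control.Exception (Exception, bracket, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Stand-in error type with two of the documented constructors.
data EstError = AccessForbidden | NoSuchItem
  deriving (Eq, Show)

instance Exception EstError

-- Mock handle: records only whether the database is still open.
newtype Database = Database (IORef Bool)

-- Mock open: an empty path fails (a made-up rule for the demo).
openDatabase :: FilePath -> IO (Either EstError Database)
openDatabase "" = pure (Left NoSuchItem)
openDatabase _  = Right . Database <$> newIORef True

-- Mock close: closing twice is harmless, as documented for closeDatabase.
closeDatabase :: Database -> IO ()
closeDatabase (Database ref) = writeIORef ref False

isOpen :: Database -> IO Bool
isOpen (Database ref) = readIORef ref

-- Open, run the action, and always close, even if the action throws.
withDatabase' :: FilePath -> (Database -> IO a) -> IO a
withDatabase' fpath = bracket acquire closeDatabase
  where
    acquire = openDatabase fpath >>= either throwIO pure

-- Convert a thrown EstError back into an Either. Note that this also
-- catches an EstError thrown by the action itself, not only open failures.
tryWithDatabase :: FilePath -> (Database -> IO a) -> IO (Either EstError a)
tryWithDatabase fpath f = try (withDatabase' fpath f)
```

Preferring the bracket-style function over a manual open and close pair avoids leaking the handle when the action fails, which matters here because a process cannot open the same database twice.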

Manipulating database

addAttrIndex :: Database -> Text -> AttrIndexType -> IO ()

addAttrIndex db attr idxType creates an index of type idxType for the attribute attr in the database db.

flushDatabase :: Database -> Int -> IO ()

flushDatabase db numWords flushes at most numWords index words in the cache of the database db. If numWords <= 0 all the index words will be flushed.

syncDatabase :: Database -> IO ()

Synchronize a database to the disk.

optimizeDatabase :: Database -> [OptimizeOption] -> IO ()

Optimize a database.

mergeDatabase :: Database -> FilePath -> [RemoveOption] -> IO ()

mergeDatabase db fpath opts merges another database at fpath (the source) into db (the destination). The flags of the two databases must be the same. If any documents in the source database have the same URI as documents in the destination, those documents in the destination will be overwritten.
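
The overwrite rule can be pictured with a pure model in which each database is a map from URI to document. This is only an illustration of the documented semantics, assuming String stand-ins for URIs and document bodies; mergeModel is a hypothetical name and does not touch a real database.

```haskell
import qualified Data.Map.Strict as Map

-- Stand-ins for illustration only.
type URI  = String
type Body = String

-- Pure model of the merge: on a URI collision the source document wins.
-- Map.union is left-biased, so the source map is passed first.
mergeModel :: Map.Map URI Body -> Map.Map URI Body -> Map.Map URI Body
mergeModel source destination = Map.union source destination
```

For example, merging a source holding "a" and "c" into a destination holding "a" and "b" yields all three URIs, with the source's version of "a".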

setCacheSize

Arguments

:: Database    The database.
-> Int         Maximum size of the index cache. (default: 64 MiB)
-> Int         Maximum number of cached attribute records. (default: 8192 records)
-> Int         Maximum number of cached document texts. (default: 1024 documents)
-> Int         Maximum number of cached search results. (default: 256 records)
-> IO ()

Change the size of various caches of a database. Passing negative values leaves the old values unchanged.
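
The rule that negative values keep the old setting can be modelled as a pure update on a record of the four limits. This is a sketch, not the binding's code: CacheConfig, defaultCache and applyCacheSize are hypothetical names, the defaults are taken from the argument descriptions above, and expressing the index cache size in bytes is an assumption.

```haskell
-- Hypothetical record holding the four cache limits.
data CacheConfig = CacheConfig
  { indexCacheBytes :: Int  -- assumption: size measured in bytes
  , attrRecords     :: Int
  , textDocs        :: Int
  , resultRecords   :: Int
  } deriving (Eq, Show)

-- Defaults as listed in the argument descriptions.
defaultCache :: CacheConfig
defaultCache = CacheConfig (64 * 1024 * 1024) 8192 1024 256

-- Model of the update rule: a negative argument keeps the old value,
-- anything else replaces it.
applyCacheSize :: CacheConfig -> Int -> Int -> Int -> Int -> CacheConfig
applyCacheSize old idx attrs texts results =
  CacheConfig
    { indexCacheBytes = pick (indexCacheBytes old) idx
    , attrRecords     = pick (attrRecords old) attrs
    , textDocs        = pick (textDocs old) texts
    , resultRecords   = pick (resultRecords old) results
    }
  where
    pick current requested
      | requested < 0 = current
      | otherwise     = requested
```

In this model, applyCacheSize defaultCache (-1) 100 (-1) (-1) changes only the attribute record limit.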

Getting documents in and out

putDocument :: Database -> Document -> [PutOption] -> IO ()

Put a document into a database. The document must have an "@uri" attribute. If the database already has a document whose URI is the same as that of the new document, the old one will be overwritten. See setURI and updateDocAttrs.

removeDocument :: Database -> DocumentID -> [RemoveOption] -> IO ()

Remove a document from a database.

updateDocAttrs :: Database -> Document -> IO ()

Update the attributes of a document in a database. The document to be updated is determined by the document ID. It is an error to change the URI of the document to that of another existing document. Note that the document body will not be updated. See putDocument.

getDocument :: Database -> DocumentID -> [GetOption] -> IO Document

Find a document in a database by an ID.

getDocAttr :: Database -> DocumentID -> Text -> IO (Maybe Text)

Get an attribute of a document in a database.

getDocURI :: Database -> DocumentID -> IO URI

Get the URI of a document in a database.

getDocIdByURI :: Database -> URI -> IO (Maybe DocumentID)

Find a document in a database by a URI and return its ID.

Statistics of databases

getDatabaseName :: Database -> IO Text

Get the name of a database.

getNumOfDocs :: Database -> IO Int

Get the number of documents in a database.

getNumOfWords :: Database -> IO Int

Get the number of words in a database.

getDatabaseSize :: Database -> IO Integer

Get the size of a database.

hasFatalError :: Database -> IO Bool

Return True iff the database has a fatal error.

Searching for documents

searchDatabase :: Database -> Condition -> IO [DocumentID]

Search for documents in a database by a condition.

searchDatabase' :: Database -> Condition -> IO ([DocumentID], [(Text, Int)])

Search for documents in a database by a condition. The second item of the resulting tuple maps each search word to the number of documents that match that word.
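
The second component is an association list, so it can be loaded into a Data.Map for lookups. The sketch below works on any list of (word, count) pairs and does not call the binding. hitsFor and rarestWord are hypothetical helpers, written generically over the word type so they apply to Text keys as well as to the String keys used when trying them out.

```haskell
import Data.List (minimumBy)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

-- Number of matching documents for a word, or 0 if the word is absent.
hitsFor :: Ord w => [(w, Int)] -> w -> Int
hitsFor pairs word = Map.findWithDefault 0 word (Map.fromList pairs)

-- The search word that matched the fewest documents, if there is any.
rarestWord :: [(w, Int)] -> Maybe w
rarestWord []    = Nothing
rarestWord pairs = Just (fst (minimumBy (comparing snd) pairs))
```

With a real result, the pair list would come from the second component of the tuple returned by searchDatabase'.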

metaSearch :: [Database] -> Condition -> IO [(Database, DocumentID)]

Search for documents in many databases at once.

metaSearch' :: [Database] -> Condition -> IO ([(Database, DocumentID)], [(Text, Int)])

Search for documents in many databases at once. The second item of the resulting tuple maps each search word to the number of documents that match that word.

scanDocument :: Database -> Document -> Condition -> IO Bool

Check whether a document matches every phrase in a condition.

To be honest, the author of this binding doesn't really know what est_db_scan_doc() does. Its documentation is far too ambiguous across the board. Moreover, the symbols of HyperEstraier are very badly named. Can you imagine what, say, est_db_out_doc() does? How about the constant named ESTCONDSURE? The author got tired of examining the commentless source code over and over again to write this binding. Its functionality is awesome though...