Friday, 08 June 2018


In computer science, an inverted index (also referred to as a postings file or inverted file) is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, or in a document or set of documents. It is named in contrast to a forward index, which maps from documents to content. The purpose of an inverted index is to allow fast full-text search, at the cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems, and it is used on a large scale in, for example, search engines. In addition, several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.

There are two main variants of the inverted index. A record-level inverted index (or inverted file index, or just inverted file) contains, for each word, a list of references to the documents that contain it. A word-level inverted index (or full inverted index, or inverted list) additionally contains the positions of each word within each document. The latter form offers more functionality (such as phrase search), but requires more processing power and space to create.
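The difference between the two variants is easier to see in code. The following is a minimal sketch, not a reference implementation: the function names, the three sample documents, and the whitespace tokenizer are all assumptions made for illustration. The first builder stores only document references per word; the second also stores positions, which is what makes the phrase search possible.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_record_level(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def build_word_level(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return IDs of documents in which the words of `phrase` occur consecutively."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Word i of the phrase must sit at position start + i.
            if all(start + i in index[w].get(doc_id, ()) for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents, keyed by document ID.
docs = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

record_index = build_record_level(docs)
word_index = build_word_level(docs)

print(sorted(record_index["what"]))                      # [0, 1]
print(word_index["it"][0])                               # [0, 3]
print(sorted(phrase_search(word_index, "what is it")))   # [1]
```

The record-level index can say that documents 0 and 1 both contain "what", "is", and "it", but only the stored positions reveal that the three words appear in that order in document 1 alone.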


Applications

The inverted index data structure is a central component of a typical search engine indexing algorithm. A goal of a search engine implementation is to optimize the speed of the query: find the documents in which word X occurs. Once a forward index has been built, which stores a list of words per document, it is inverted to produce an inverted index. Querying the forward index would require sequential iteration through each document and each word to verify a matching document. The time, memory, and processing resources needed to perform such a query are not always technically realistic. Instead of listing the words per document, as the forward index does, the inverted index lists the documents per word.
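The inversion step can be sketched in a few lines. This is an illustration under simple assumptions (whitespace tokenization, everything held in memory); the function names and sample documents are hypothetical. `scan_forward` shows the sequential scan that a query against the forward index needs, and `invert` builds the structure that avoids it.

```python
def build_forward_index(docs):
    """Forward index: document ID -> list of words in that document."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def invert(forward):
    """Inverted index: word -> set of document IDs containing that word."""
    inverted = {}
    for doc_id, words in forward.items():
        for word in words:
            inverted.setdefault(word, set()).add(doc_id)
    return inverted


def scan_forward(forward, word):
    """Answer a query from the forward index by checking every document."""
    return {doc_id for doc_id, words in forward.items() if word in words}


# Hypothetical sample documents, keyed by document ID.
docs = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

forward = build_forward_index(docs)
inverted = invert(forward)

# Both approaches give the same answer; only the amount of work differs.
print(sorted(scan_forward(forward, "what")))   # [0, 1]
print(sorted(inverted["what"]))                # [0, 1]
```

The scan touches every word of every document on each query, while the inverted index pays that cost once at build time and then answers with a single dictionary lookup.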

With the inverted index created, the query can now be resolved by jumping directly to the word ID (via random access) in the inverted index.
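One way to picture the jump to a word ID is a vocabulary that maps each word to an integer, plus an array of posting lists indexed by that integer. The layout below is an assumption chosen for illustration, not a description of any particular search engine; `build_index` and `lookup` are hypothetical names.

```python
def build_index(docs):
    """Return (vocabulary, postings).

    vocabulary: word -> word ID
    postings:   list indexed by word ID, each entry a sorted list of doc IDs
    """
    vocabulary = {}
    postings = []
    for doc_id in sorted(docs):
        for word in docs[doc_id].lower().split():
            word_id = vocabulary.setdefault(word, len(postings))
            if word_id == len(postings):
                postings.append([])          # first time this word is seen
            plist = postings[word_id]
            if not plist or plist[-1] != doc_id:
                plist.append(doc_id)         # skip repeats within one document
    return vocabulary, postings


def lookup(vocabulary, postings, word):
    """Resolve a one-word query by indexing straight into the postings array."""
    word_id = vocabulary.get(word.lower())
    return [] if word_id is None else postings[word_id]


# Hypothetical sample documents, keyed by document ID.
docs = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

vocabulary, postings = build_index(docs)
print(vocabulary["what"])                       # 2
print(lookup(vocabulary, postings, "what"))     # [0, 1]
print(lookup(vocabulary, postings, "cat"))      # []
```

No document is read at query time: the word is translated to its ID and the matching posting list is fetched in one indexed access.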

In pre-computer times, concordances to important books were assembled manually. These were effectively inverted indexes, accompanied by a small amount of commentary, and they required a tremendous amount of effort to produce.

Source of the article: Wikipedia
