BitFunnel

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search
BitFunnel
DeveloperMicrosoft
Initial release2016; 10 years ago (2016)
Repositorygithub.com/BitFunnel
Written inC++
Engine
    Lua error in Module:EditAtWikidata at line 29: attempt to index field 'wikibase' (a nil value).
    PlatformWindows, macOS, Ubuntu
    TypeSearch engine indexing algorithm
    LicenseMIT License
    Websitebitfunnel.org

    BitFunnel is the search engine indexing algorithm and a set of components used in the Bing search engine,[1] which were made open source in 2016.[2] BitFunnel uses bit-sliced signatures instead of an inverted index in an attempt to reduce operations cost.[3]

    History

    [edit | edit source]

    Progress on the implementation of BitFunnel was made public in early 2016, with the expectation that there would be a usable implementation later that year.[4] In September 2016, the source code was made available via GitHub.[5] A paper discussing the BitFunnel algorithm and implementation was released as through the Special Interest Group on Information Retrieval of the Association for Computing Machinery in 2017 and won the Best Paper Award.[3][6]

    Components

    [edit | edit source]

    BitFunnel consists of three major components:[1]

    • BitFunnel – the text search/retrieval system itself
    • WorkBench – a tool for preparing text for use in BitFunnel
    • NativeJIT – a software component that takes expressions that use C data structures and transforms them into highly optimized assembly code

    Algorithm

    [edit | edit source]

    Initial problem and solution overview

    [edit | edit source]

    The BitFunnel paper describes the "matching problem", which occurs when an algorithm must identify documents through the usage of keywords. The goal of the problem is to identify a set of matches given a corpus to search and a query of keyword terms to match against. This problem is commonly solved through inverted indexes, where each searchable item is maintained with a map of keywords.[3]

    In contrast, BitFunnel represents each searchable item through a signature. A signature is a sequence of bits which describe a Bloom filter of the searchable terms in a given searchable item. The bloom filter is constructed through hashing through several bit positions.[3]

    Theoretical implementation of bit-string signatures

    [edit | edit source]

    The signature of a document (D) can be described as the logical-or of its term signatures:

    SD=tDSt

    Similarly, a query for a document (Q) can be defined as a union:

    SQ=tQSt

    Additionally, a document D is a member of the set M' when the following condition is satisfied:

    SQSD=SQ

    This knowledge is then combined to produce a formula where M' is identified by documents which match the query signature:

    M={DCSQSD=SQ}

    These steps and their proofs are discussed in the 2017 paper.[3]

    Pseudocode for bit-string signatures

    [edit | edit source]

    This algorithm is described in the 2017 paper.[3] M=𝚏𝚘𝚛𝚎𝚊𝚌𝚑 DC 𝚍𝚘𝚒𝚏 SDSQ=SQ 𝚝𝚑𝚎𝚗M=M{D}𝚎𝚗𝚍𝚒𝚏𝚎𝚗𝚍𝚏𝚘𝚛

    References

    [edit | edit source]
    1. ^ a b Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    2. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    3. ^ a b c d e f Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    4. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    5. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    6. ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
    [edit | edit source]