Golang Rabin-karp Algorithm

在某個機會下稍稍研究了一下這個 Multiple Pattern 字串匹配演算法，而下面則是一些演算法的核心理解與實作紀錄。由於相關的 code 在網路上其實都可以搜尋到資料，所以這邊僅撰寫
理解演算法核心後自行練習的 Single Pattern 。

核心與步驟

計算出 sensitive word 的 hash value

Hash value 的 f(x):
以字符總數為進位單位，假設稱為 M，並計算 sensitive word
中各字碼 * M ^ (len -1) 的總和。

Ascii = 128
Golang Unicode = 16777619
e.g. str = “abc”，M = 128，Len(str) = 3
Hash value = 97 * M^2 + 98 * M +97
= 1601891

取出句子(text)中前面 Len 長度的字節，進行 hash 並與
sensitive word 的 hash 進行比較。
通常會以最短 sensitive word 的長度為依據。

匹配

代表此句中含有 sensitive word

不匹配
1. 減掉字首的 hash 值
2. hash * M + 下一個字節碼，再與之比對，值至比對完
成。
e.g. text = “cedabcfeap”，len = 3
所以將 “ced”進行 hash，最後會得出不匹配，
這時應將比對 “eda”，所以減去 c的 hash 值後乘上 M，
再加上 a 的字節碼，成為新的 hash 值。再將此 hash 值，
與 sensitive word 的 hash 值進行比對。

實作

package main

import "fmt"

const M = 128

func Hash(str string) rune {
    h := rune(0)
    for i := 0; i < len(str); i++ {
        h = (h*M + rune(str[i]))
    }
    return h
}

func Match(txt string, sensitive string) int {
    txt_len := len(txt)
    pattern_len := len(sensitive)

    hash := Hash(txt[:pattern_len])
    sensitive_hash := Hash(sensitive)

    var pow rune = 1 // pow = M^(pattern_len-1)
    for i := 0; i < pattern_len-1; i++ {
        pow = (pow * M)
    }
    count:=0
    finall_index:=txt_len-pattern_len + 1
    for i := 0; i < finall_index ; i++ {
        if i > 0{
            hash = hash - pow*rune(txt[i-1])
            hash = hash * M + rune(txt[i+pattern_len-1])
        }

        if hash == sensitive_hash && pattern_len > 0 {
            count++
        }
    }
    return count
}

func main() {

    txt := "hello abc world,abc"
    sensitive := "abc"
    result := Match(txt, sensitive)
    fmt.Println(result)
}

0 Comments

Daemon Studio

Golang Rabin-karp Algorithm

核心與步驟

實作

0 意見

About me

Follow Us

Labels

recent posts

Blog Archive

Categories

Daemon Studio

Golang Rabin-karp Algorithm

核心與步驟

實作

You Might Also Like

0 意見

About me

Follow Us

Labels

recent posts

Blog Archive

Categories