Knuth Morris Pratt string matching algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a string matching algorithm that efficiently finds occurrences of a pattern within a text. It utilizes the concept of a "failure function" to avoid unnecessary character comparisons during string matching. Here's the pseudocode for the KMP algorithm:

KMPSearch(pattern, text):
    n = length(text)
    m = length(pattern)
    lps = ComputeLPS(pattern)
    occurrences = []

    i = 0  // index for text[]
    j = 0  // index for pattern[]

    while i < n do
        if pattern[j] = text[i] then
            i = i + 1
            j = j + 1

            if j = m then
                occurrences.append(i - j)
                j = lps[j-1]

        else if j > 0 then
            j = lps[j-1]
        else
            i = i + 1

    return occurrences

ComputeLPS(pattern):
    m = length(pattern)
    lps = array of size m
    lps[0] = 0
    len = 0  // length of the previous longest prefix suffix

    i = 1
    while i < m do
        if pattern[i] = pattern[len] then
            len = len + 1
            lps[i] = len
            i = i + 1
        else
            if len > 0 then
                len = lps[len-1]
            else
                lps[i] = 0
                i = i + 1

    return lps

Let's go through the pseudocode step by step:

1. `n` represents the length of the text, and `m` represents the length of the pattern.

2. The `ComputeLPS` function constructs the "failure function" or "longest proper prefix which is also a suffix" array, denoted as `lps`, for the given pattern. It iterates over the pattern and updates `lps` based on the previous values.

3. The `KMPSearch` function performs the actual string matching using the KMP algorithm. It initializes indices `i` and `j` to traverse the text and pattern, respectively.

4. While `i` is less than the length of the text, it checks if the current characters `pattern[j]` and `text[i]` match. If they do, it increments both indices.

5. If `j` becomes equal to the length of the pattern, a match is found. The starting index of the match (`i - j`) is added to the `occurrences` list. Additionally, it updates `j` to the value of `lps[j-1]` to continue the search for further matches.

6. If the characters do not match and `j` is greater than zero, it updates `j` to the value of `lps[j-1]`. This step effectively skips unnecessary comparisons by utilizing the information from the `lps` array.

7. If the characters do not match and `j` is already zero, it increments `i` to continue the search.

8. Finally, the `occurrences` list is returned, containing the starting indices of pattern matches in the text.



About the Author



Silan Software is one of the India's leading provider of offline & online training for Java, Python, AI (Machine Learning, Deep Learning), Data Science, Software Development & many more emerging Technologies.

We provide Academic Training || Industrial Training || Corporate Training || Internship || Java || Python || AI using Python || Data Science etc





 PreviousNext