The Knuth-Morris-Pratt (KMP) algorithm is a string matching algorithm that efficiently finds occurrences of a pattern within a text. It utilizes the concept of a "failure function" to avoid unnecessary character comparisons during string matching. Here's the pseudocode for the KMP algorithm:
KMPSearch(pattern, text):
n = length(text)
m = length(pattern)
lps = ComputeLPS(pattern)
occurrences = []
i = 0 // index for text[]
j = 0 // index for pattern[]
while i < n do
if pattern[j] = text[i] then
i = i + 1
j = j + 1
if j = m then
occurrences.append(i - j)
j = lps[j-1]
else if j > 0 then
j = lps[j-1]
else
i = i + 1
return occurrences
ComputeLPS(pattern):
m = length(pattern)
lps = array of size m
lps[0] = 0
len = 0 // length of the previous longest prefix suffix
i = 1
while i < m do
if pattern[i] = pattern[len] then
len = len + 1
lps[i] = len
i = i + 1
else
if len > 0 then
len = lps[len-1]
else
lps[i] = 0
i = i + 1
return lps
Let's go through the pseudocode step by step:
1. `n` represents the length of the text, and `m` represents the length of the pattern.
2. The `ComputeLPS` function constructs the "failure function" or "longest proper prefix which is also a suffix" array, denoted as `lps`, for the given pattern. It iterates over the pattern and updates `lps` based on the previous values.
3. The `KMPSearch` function performs the actual string matching using the KMP algorithm. It initializes indices `i` and `j` to traverse the text and pattern, respectively.
4. While `i` is less than the length of the text, it checks if the current characters `pattern[j]` and `text[i]` match. If they do, it increments both indices.
5. If `j` becomes equal to the length of the pattern, a match is found. The starting index of the match (`i - j`) is added to the `occurrences` list. Additionally, it updates `j` to the value of `lps[j-1]` to continue the search for further matches.
6. If the characters do not match and `j` is greater than zero, it updates `j` to the value of `lps[j-1]`. This step effectively skips unnecessary comparisons by utilizing the information from the `lps` array.
7. If the characters do not match and `j` is already zero, it increments `i` to continue the search.
8. Finally, the `occurrences` list is returned, containing the starting indices of pattern matches in the text.
Silan Software is one of the India's leading provider of offline & online training for Java, Python, AI (Machine Learning, Deep Learning), Data Science, Software Development & many more emerging Technologies.
We provide Academic Training || Industrial Training || Corporate Training || Internship || Java || Python || AI using Python || Data Science etc