For polish readers:
http://www.patrykwieczorek.pl/kontakt.php
Please help a bit, thanks.
Sunday, 16 February 2020
Saturday, 15 February 2020
How To Speed-up Python Edit Distance Algorithm 26 Times Using Cython
I had an idea for a Python project, but I need some speed and I've found out, that Cython is the best option; all the code could be compiled in a normal way, just some modules will be rewritten in Cython. How we go, let's take some real function (what I need, not a dummy example), edit distance, pure Python code:
Just for a start, making pyx extension file, recompiling it through Cython, and of course a test file(credits to mister sentdex, youtube):
Just recompiling pure Python code gives about three times faster code, not bad for a start, let's get some more C in Python, use more types hints, the essence of performance:
I gave to Cython a knowledge about a type of i and j, they are used a lot in the algorithm, the outcome of tests:
Not that bad, nearly six times faster, let's give more, parameters have types, also returning value:
Getting better, nine times faster:
Get rid of Python array, list comprehension, the downside is, I've assumed, that the word is shorter than fifty characters, this to be figured out later, though:), finally, for now:
The tests:
That's not bad, still, we can get rid of Python string, use C++ instead, or maybe vector, (Cython gives more than that, even including direct C++ modules to code) but, I think, as for a real easy refactoring, 26 times speedup is really nice.
I'm not putting whole code to GitHub, as usually do, because the final snippet is what is really needed, the rest you can figure out from sentdex tutorial, or any other, it's really easy, believe me. Thanks for reading.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
| def edit_dist_dp(str1, str2, m, n):
store = [[0 for x in range(n + 1)] for x in range(m + 1)]
for i in range(m + 1):
for j in range(n + 1):
if i == 0:
store[i][j] = j
elif j == 0:
store[i][j] = i
elif str1[i - 1] == str2[j - 1]:
store[i][j] = store[i - 1][j - 1]
else:
store[i][j] = 1 + min(store[i][j - 1], store[i - 1][j],
store[i - 1][j - 1])
return store[m][n]
|
Just for a start, making pyx extension file, recompiling it through Cython, and of course a test file(credits to mister sentdex, youtube):
1
2
3
4
5
6
7
8
9
10
| import timeit
py = timeit.timeit('edit_distance.edit_dist_dp("asdsdhter", "dsfladrte", 9, 9)',
setup='import edit_distance', number=10000)
cy = timeit.timeit('edit_distance_cy.edit_dist_dp("asdsdhter", "dsfladrte", 9, 9)',
setup='import edit_distance_cy', number=10000)
print(f"Python and Cython times: {py}, Cython: {cy}")
print(f"Cython is {py/cy} x faster than Python")
|
Just recompiling pure Python code gives about three times faster code, not bad for a start, let's get some more C in Python, use more types hints, the essence of performance:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| def edit_dist_dp(str1, str2, m, n):
store = [[0 for x in range(n + 1)] for x in range(m + 1)]
cdef int i, j
for i in range(m + 1):
for j in range(n + 1):
if i == 0:
store[i][j] = j
elif j == 0:
store[i][j] = i
elif str1[i - 1] == str2[j - 1]:
store[i][j] = store[i - 1][j - 1]
else:
store[i][j] = 1 + min(store[i][j - 1], store[i - 1][j],
store[i - 1][j - 1])
return store[m][n]
|
I gave to Cython a knowledge about a type of i and j, they are used a lot in the algorithm, the outcome of tests:
1
2
| Python: 0.7150410660033231, Cython: 0.1197679779943428
Cython is 5.970219068381515 x faster than Python
|
Not that bad, nearly six times faster, let's give more, parameters have types, also returning value:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| cpdef int edit_dist_dp(str str1, str str2, int m, int n):
store = [[0 for x in range(n + 1)] for x in range(m + 1)]
cdef int i, j
for i in range(m + 1):
for j in range(n + 1):
if i == 0:
store[i][j] = j
elif j == 0:
store[i][j] = i
elif str1[i - 1] == str2[j - 1]:
store[i][j] = store[i - 1][j - 1]
else:
store[i][j] = 1 + min(store[i][j - 1], store[i - 1][j],
store[i - 1][j - 1])
return store[m][n]
|
Getting better, nine times faster:
1
2
3
| (project_venv) $ python tests.py
Python: 0.7169225399993593, Cython: 0.07916731700242963
Cython is 9.055789271946113 x faster than Python
|
Get rid of Python array, list comprehension, the downside is, I've assumed, that the word is shorter than fifty characters, this to be figured out later, though:), finally, for now:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | cpdef int edit_dist_dp(str str1, str str2, int m, int n): cdef int i, j, x, y cdef int store[50][50] for x in range(n + 1): for y in range(m + 1): store[x][y] = 0 for i in range(m + 1): for j in range(n + 1): if i == 0: store[i][j] = j elif j == 0: store[i][j] = i elif str1[i - 1] == str2[j - 1]: store[i][j] = store[i - 1][j - 1] else: store[i][j] = 1 + min(store[i][j - 1], store[i - 1][j], store[i - 1][j - 1]) return store[m][n] |
The tests:
1 2 3 | (project_venv) $ python tests.py
Python: 0.7155497479980113, Cython: 0.027351742995961104
Cython is 26.16102922960606 x faster than Python
|
That's not bad, still, we can get rid of Python string, use C++ instead, or maybe vector, (Cython gives more than that, even including direct C++ modules to code) but, I think, as for a real easy refactoring, 26 times speedup is really nice.
I'm not putting whole code to GitHub, as usually do, because the final snippet is what is really needed, the rest you can figure out from sentdex tutorial, or any other, it's really easy, believe me. Thanks for reading.
Subscribe to:
Posts (Atom)