Python "int * list" operation: the illusion of safety
Today I solved a problem using dynamic programming: that means that I had to construct, initialize and fill a matrix, according to a recurrence formula.
The problem was, roughly speaking, "compute the minimal number of character substitutions needed to make a 'valid' string out of the input string".
My algorithm and code were sound, but when I executed the code, the answers I got were obviously incorrect. For all inputs, I got 0 as an answer. I knew, from trivial input cases, that the answer wasn't 0 for most of my test cases.
Happily enough, I had the reflex of printing the initial and final matrix. And what I saw was surprising: the initial matrix was wrong!!!
Here's the initialization code that I was using for the dp
matrix:
dp = (N+1)*[ 5*[N] ]
dp[0][0] = 0
where N
is a strictly positive integer, the length of the input string. From this initialization code, I expected a matrix where all values where equal to the value of N
, except for the value at entry [0][0]
. Please stop here and make your best guess about what I really got.
Got a guess? Well, the following Python session exhibits the problem:
>>> N = 2
>>> dp = (N+1)*[ 5*[N] ]
>>> dp
[[2, 2, 2, 2, 2], [2, 2, 2, 2, 2], [2, 2, 2, 2, 2]]
>>> dp[0][0] = 0
>>> dp
[[0, 2, 2, 2, 2], [0, 2, 2, 2, 2], [0, 2, 2, 2, 2]]
As you see, there are 0
s in the first entry of every row of the matrix!! What happened?
Well, at this point, you got it: the binary operator *
, which applies to an integer, say k
, on the left, and a list, say L
, on the right, creates a new list consisting of the k
-fold concatenation of L
, and this new list contains references to the same objects pointed-to by the entries of the original list.
As you see, the line
>>> dp[0][0] = 0
above is actually modifying the object pointed-to by dp[0]
, dp[1]
and dp[2]
, and that explains the strange behavior of my program.
Well, as the protip comes to its end, I'll state some conclusions. I'm really happy to have printed the data early in the debugging process. That shed much light on what was really happening.
Now, in what concerns the operator itself, my conclusion is the following: the 'int * list' operator must only be used when all the elements of the list are immutable, or at least read-only.
By the way, I finally used the line
dp = [ 5*[N] for _ in xrange(N+1) ]
dp[0][0] = 0
for initializing my matrix.
Written by Antonio Vera
Related protips
1 Response
When reading your tip I've remembered that recently used [ [0] * N ] * M
to create two-dimensional array in my project. It didn't blown up my code just thanks to serialization.
Thanks, you saved me hours of headache in near future!