I still need to work on the VM insns: there's a lot of allocation taking place that shouldn't be, and there's range-checking on each and every vector access, even including things like the register file. Given those caveats, it's only about 4X slower than a similar Python loop. I'm pretty sure I can make it faster than Python; we'll see.
--- cps ---
0 = new_env [] 1
- push_env [0] None
1 = close [] 'loop_0'
0 = lit [] 3
1 = varref [] ((0, 0), False, {n_1})
0 = primop [0, 1] ('%=',)
0 = test [0] None
0 = lit [] 1
return [0] None
0 = varref [] ((0, 0), False, {n_1})
1 = lit [] 2
0 = primop [0, 1] ('%-',)
tr_call [0] (1, )
return [0] None
- store_tuple [1, 0] (0, 1, 1)
0 = new_env [] 1
1 = lit [] 0
0 = store_tuple [1, 0] (0, 1, 1)
1 = varref [] ((0, 0), True, {loop_0})
invoke_tail [1, 0]
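To make the register-file and allocation points above concrete, here is a minimal register-VM sketch in C. Everything in it is hypothetical: the opcodes, the `insn` struct, and the names `run`, `verify`, and `loop_code` are not the VM's actual instruction set. The program has the same compare / test / subtract / tail-call shape as the CPS listing, but simply counts n down to 0 rather than reproducing the listing's literals. Two choices in it relate to the caveats mentioned earlier: register indices and jump targets are checked once, up front, instead of on every access, and the tail call overwrites the argument slot in place instead of allocating a new environment.

```c
#include <assert.h>

/* Hypothetical opcodes, loosely modeled on the insns in the CPS listing. */
enum opcode { OP_LIT, OP_VARREF, OP_EQ, OP_SUB, OP_TEST, OP_RETURN, OP_TAILCALL };

typedef struct {
    enum opcode op;
    int dst;    /* destination register */
    int a, b;   /* source registers (for OP_VARREF, a is an env slot) */
    long imm;   /* literal (OP_LIT) or jump target (OP_TEST, OP_TAILCALL) */
} insn;

#define NREGS 4

/* loop(n) = (n == 0) ? 1 : loop(n - 1); n must be non-negative. */
const insn loop_code[] = {
    /* 0 */ {OP_LIT,      0, 0, 0, 0},  /* r0 = 0                */
    /* 1 */ {OP_VARREF,   1, 0, 0, 0},  /* r1 = env[0]  (n)      */
    /* 2 */ {OP_EQ,       0, 0, 1, 0},  /* r0 = (r0 == r1)       */
    /* 3 */ {OP_TEST,     0, 0, 0, 6},  /* if (!r0) goto 6       */
    /* 4 */ {OP_LIT,      0, 0, 0, 1},  /* r0 = 1                */
    /* 5 */ {OP_RETURN,   0, 0, 0, 0},  /* return r0             */
    /* 6 */ {OP_VARREF,   0, 0, 0, 0},  /* r0 = env[0]           */
    /* 7 */ {OP_LIT,      1, 0, 0, 1},  /* r1 = 1                */
    /* 8 */ {OP_SUB,      0, 0, 1, 0},  /* r0 = r0 - r1          */
    /* 9 */ {OP_TAILCALL, 0, 0, 0, 0},  /* env[0] = r0; goto 0   */
};
#define LOOP_LEN ((int)(sizeof loop_code / sizeof loop_code[0]))

/* Check register indices, env slots and jump targets once, before running,
   so the dispatch loop can index arrays without a test per access.
   assert() is compiled out entirely under -DNDEBUG. */
static void verify(const insn *code, int len) {
    for (int k = 0; k < len; k++) {
        assert(code[k].dst >= 0 && code[k].dst < NREGS);
        assert(code[k].a >= 0 && code[k].a < NREGS);
        assert(code[k].b >= 0 && code[k].b < NREGS);
        if (code[k].op == OP_VARREF)
            assert(code[k].a == 0);          /* this sketch has one env slot */
        if (code[k].op == OP_TEST || code[k].op == OP_TAILCALL)
            assert(code[k].imm >= 0 && code[k].imm < len);
    }
}

long run(const insn *code, int len, long arg) {
    long regs[NREGS] = {0};   /* register file: no bounds test inside the loop */
    long env[1] = {arg};      /* the loop's single argument slot */
    int pc = 0;
    verify(code, len);
    for (;;) {
        const insn *i = &code[pc++];
        switch (i->op) {
        case OP_LIT:    regs[i->dst] = i->imm; break;
        case OP_VARREF: regs[i->dst] = env[i->a]; break;
        case OP_EQ:     regs[i->dst] = regs[i->a] == regs[i->b]; break;
        case OP_SUB:    regs[i->dst] = regs[i->a] - regs[i->b]; break;
        case OP_TEST:   if (!regs[i->a]) pc = (int)i->imm; break;
        case OP_RETURN: return regs[i->a];
        /* Rebind the argument in place rather than allocating a new
           environment; safe here only because nothing else holds env. */
        case OP_TAILCALL: env[0] = regs[i->a]; pc = (int)i->imm; break;
        }
    }
}
```

For example, `run(loop_code, LOOP_LEN, 1000000)` returns 1 after a million iterations without calling `malloc` or growing the C stack.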

Ok, after trimming some of the allocations and turning off the range checks, I'm down to 1.8X slower than Python. Stay tuned...
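Range checks like these are often routed through a single accessor macro so that one build flag can turn them off without touching any call site. A minimal C sketch, assuming a hypothetical vector layout (a length header followed by word-sized slots) and a hypothetical `VM_NO_RANGE_CHECK` flag; the VM's real object representation and names will differ:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef long object;   /* hypothetical: every VM value fits in a machine word */

typedef struct {
    long length;
    object slots[];    /* C99 flexible array member */
} vector;

#ifdef VM_NO_RANGE_CHECK
/* Fast build: plain indexing, no bounds test at all. */
#define VEC_REF(v, i) ((v)->slots[(i)])
#else
/* Checked build: every access, including register-file reads and writes,
   pays for a compare and a branch. */
static object *vec_ref_checked(vector *v, long i, const char *file, int line) {
    if (i < 0 || i >= v->length) {
        fprintf(stderr, "%s:%d: index %ld out of range [0, %ld)\n",
                file, line, i, v->length);
        abort();
    }
    return &v->slots[i];
}
#define VEC_REF(v, i) (*vec_ref_checked((v), (i), __FILE__, __LINE__))
#endif

/* Allocate a vector of n slots, each initialized to fill. */
vector *make_vector(long n, object fill) {
    assert(n >= 0);
    vector *v = malloc(sizeof(vector) + (size_t)n * sizeof(object));
    if (v == NULL)
        abort();
    v->length = n;
    for (long i = 0; i < n; i++)
        v->slots[i] = fill;
    return v;
}
```

`VEC_REF` expands to an lvalue in both builds, so `VEC_REF(regs, 1) = x;` works either way, and compiling with `-DVM_NO_RANGE_CHECK` drops the check everywhere at once.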
Ok, after upgrading to gcc-4.5.0, trimming the allocations (somewhat; more to come), and removing the range checks, I'm at 0.72X compared to a similar Python loop. I can stop for the night.