I still need to work on the VM instructions: there's a lot of allocation taking place that shouldn't be, and there's range checking on each and every vector access, even for things like the register file. Given those caveats, it's only about 4X slower than a similar Python loop. I'm pretty sure I should be able to make it faster than Python; we'll see.
--- cps ---
0 = new_env [] 1
- push_env [0] None
1 = close [] 'loop_0'
0 = lit [] 3
1 = varref [] ((0, 0), False, {n_1})
0 = primop [0, 1] ('%=',)
0 = test [0] None
0 = lit [] 1
return [0] None
0 = varref [] ((0, 0), False, {n_1})
1 = lit [] 2
0 = primop [0, 1] ('%-',)
tr_call [0] (1,)
return [0] None
- store_tuple [1, 0] (0, 1, 1)
0 = new_env [] 1
1 = lit [] 0
0 = store_tuple [1, 0] (0, 1, 1)
1 = varref [] ((0, 0), True, {loop_0})
invoke_tail [1, 0]
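For anyone who wants to reproduce this kind of comparison, here is a rough sketch of what a "similar Python loop" baseline might look like. The post doesn't show the actual benchmark, so `countdown`, `relative_time`, and the iteration count are all hypothetical names and values picked for illustration.

```python
import timeit


def countdown(n):
    """Hypothetical baseline: count n down to zero, one step per iteration."""
    while n != 0:
        n = n - 1
    return n


def relative_time(vm_seconds, python_seconds):
    """Ratio of VM time to Python time; above 1.0 means the VM is slower."""
    return vm_seconds / python_seconds


if __name__ == "__main__":
    # The iteration count is an assumption, not the one used in the post.
    python_seconds = timeit.timeit(lambda: countdown(1000000), number=5)
    print("python baseline: %.3fs for 5 runs" % python_seconds)
```

Timing the VM on the equivalent loop and passing both numbers to `relative_time` would give a figure comparable to the 4X quoted above, assuming both loops run the same number of iterations.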
Ok, after trimming some of the allocations and turning off the range checks I'm at 1.8X. Stay tuned...
Ok, after upgrading to gcc-4.5.0, and with the allocations trimmed (somewhat, more to come) and the range checks removed, I get a speed of 0.72 compared to a similar Python loop. I can stop for the night.