What to say? Well, I saw Scott's new house today. Very impressive. Very large. Lots of neat touches, like a fireplace between the master bedroom and the tub of the master bath. :) Kinda small closets though.
I spent a lot of time today working with the low-level Bloomberg C API, trying to pinpoint the problems I'm having. I'm sure they've got bugs, but I have to come up with a very simple C program that demonstrates the problem, and I'm doing all my work with a complex Java program. And to top it off, it looks like the problem is more my fault than I expected.
I price everything in two to three passes, trying to get the same price two out of three times. At first, I was getting about 17 failures per run - pretty bad. (A failure is when the price comes back different all three times.) Then I noticed some funny log messages - I saw some failures before the second pass finished (and the third pass doesn't start until the second pass is complete). It turns out I had a bug in the new algorithm which could cause the sequence of queries for a single instrument to bifurcate. The underlying object didn't realize this, and it effectively caused the second and third passes to run at the same time, overwriting each other's state and generally causing havoc. Once I got that fixed, it was down to about four failures per run - a big improvement, but still useless.
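For the curious, the two-out-of-three rule can be sketched roughly like this. It's in Python for brevity rather than the Java I actually use, and it is only an illustration: `agreed_price`, `price_instrument`, and `fetch_price` are hypothetical names, and this version prices one instrument at a time instead of running a whole pass over every instrument before starting the next.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def agreed_price(prices: Sequence[float]) -> Optional[float]:
    """Return the price seen at least twice among the passes, or None."""
    if not prices:
        return None
    # Exact comparison is intentional: two correct passes should return
    # the very same value, not merely a close one.
    price, count = Counter(prices).most_common(1)[0]
    return price if count >= 2 else None


def price_instrument(fetch_price: Callable[[], float],
                     max_passes: int = 3) -> Optional[float]:
    """Run passes until two agree; the third pass is skipped if the first two match.

    `fetch_price` is a hypothetical zero-argument callable standing in for
    one complete pricing pass of a single instrument.
    """
    seen = []
    for _ in range(max_passes):
        seen.append(fetch_price())
        result = agreed_price(seen)
        if result is not None:
            return result
    return None  # every pass disagreed: this counts as a failure
```

Note that the sketch keeps the results of each pass in its own list entry, which is exactly what my buggy version failed to do when passes two and three ended up sharing state.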
I know the problem stems from the GetTimeSeries function. If I don't call that function, everything works fine. I know from previous work that GetTimeSeries can return packets that are sometimes corrupt at the end. I was working around that by ignoring any errors from the last field of the packet. I could have sworn I saw strong evidence that suggested these corrupt GetTimeSeries packets were producing framing errors in other packets as well. For like five stocks in a row, the prices would all be wrong - instead, they'd have the price of the stock below them in the list. I'd never reproduced this in a small test case, so I figure it is a race condition and only happens under heavy load. That's going to be a royal pain to reproduce.
I started my investigation today with the easily reproducible packet corruption problem. A GetTimeSeries response packet contains some number of ticks, and the header says how many ticks are in the packet. The corruption problem is that under certain circumstances (I'm not sure exactly what those are, but one is requesting one time bar, which is 7 ticks) the last tick is missing. On my initial runs, there were just zeros in the packet. (The bizarre thing is that the API always requests a buffer that is bigger than the actual packet anyway!) I also knew that the last tick was not always zeros, but sometimes contained random data from an earlier packet. I don't think I ever determined whether the garbage was written there by the API (and was thus leftovers from an internal buffer) or was left over from the previous use of my buffer (since I recycle the buffer). So, the first thing I did was clear my buffer with 0xCCs before passing it to Bloomberg to fill. It turned out that they were never writing to that part of the buffer (rather than writing junk to it). Fine, it's still a bug.
The old algorithm could safely ignore the last tick, as it was a field I wasn't interested in. It didn't occur to me that the new algorithm depends on every tick, which means garbage in the buffer that looks like a valid tick could really screw things up (and was doing so). If I clear the buffer first, it is very easy to identify the missing ticks, and the possibility of interpreting garbage as valid data does not exist.
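Here is a minimal sketch of the clear-then-check idea, again in Python rather than the real Java/C. The packet layout is an assumption made up for the example: `TICK_SIZE`, the `header_size` parameter, and both function names are hypothetical, and the actual GetTimeSeries format is surely different.

```python
SENTINEL = 0xCC
TICK_SIZE = 16  # hypothetical tick record size; the real layout is not known here


def make_buffer(size: int) -> bytearray:
    """Return a receive buffer pre-filled with the sentinel byte."""
    return bytearray([SENTINEL]) * size


def missing_ticks(buf: bytes, header_size: int, tick_count: int) -> list:
    """Return the indexes of ticks that are still all sentinel, i.e. never written.

    `tick_count` is the number of ticks the packet header claims to contain.
    """
    untouched = bytes([SENTINEL]) * TICK_SIZE
    missing = []
    for i in range(tick_count):
        start = header_size + i * TICK_SIZE
        if buf[start:start + TICK_SIZE] == untouched:
            missing.append(i)
    return missing
```

The buffer has to be refilled with the sentinel before every reuse, otherwise a recycled buffer brings back the old problem. A genuine tick made up entirely of 0xCC bytes would be misread as missing, but that seems unlikely enough to live with.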
I ran the whole thing with just that change. What do you know - it worked perfectly! No instruments required a third pass! Crap! You see, this is good news and bad news. The good news is, if it really is working, it's a great workaround and I don't have to wait for a bug fix from Bloomberg before I can put the new algorithm into production. (And if I had confidence that it really fixed all the problems, I could do away with running multiple passes, since I wouldn't need to run a second pass to confirm the results of the first.) The bad news is, that change should not have any effect on the packet framing bug. Which means either 1) clearing the buffer really does fix the packet framing bug, which means Bloomberg is doing something really stupid like reading data out of the client buffer to maintain their packet framing, which means it's really going to be hard to reproduce, 2) I didn't run into the packet framing bug on that particular test run, which means it's even more sporadic and hard to reproduce, or 3) they already fixed the packet framing bug but I never noticed because I had a workaround in place. Regardless, it's going to be hard to pin down which of the three choices is correct, which I have to do before I can confidently say the bug is resolved.
And to top it all off, I finally got a hold of someone at Bloomberg who will look at the bug, but I'm going on vacation. Oh well, I'm sure they won't complain too much if it takes me a while before I give them more grief about their bugs.
Louis K. Thomas <louisth@hotmail.com> | 2004-08-05