Friday, September 7, 2007


I was under the impression that riddles were out of style for interviews. I was wrong. Don't get me wrong, I love a good brain teaser- but as a criterion for hiring?
My favorites are the "personal probing riddles".
I actually got this one today:

q:"if you were stuck on a desert island what 3 items would you bring with you?"
me:"ok. a boat, a sat phone, and a tank of gas"
me:"what would you take?"
q:"well, generally people say their favorite book or something"
me:"that won't get you off the island."
q:"no, it wouldn't."
me:"next question"

Tuesday, September 4, 2007

the times, they are a changing

So, the dust has settled... mostly.
The world has calmed... mostly.
And the market has stabilized- er, well at least it's not tanking- er, at least not as badly as a couple of weeks ago.

And now is the time for change.

At my firm, it began with mortgages (pronounced "morgue-ages"). Then on to CDOs. I get that. Then on to repo. OK. And now, it has begun to spread everywhere. My firm was not a trading firm- purely brokerage. Riskless. That was until my division came around. As a whole, I think we did ok, but the scare has changed things. Books are flattening, and you can taste change in the air.

And so, after 2 years, and my best month of trading ever (seriously- it was ridiculous. 100% win rate one day. 122mm traded. 100%! Come on) I find myself looking for another job, which is a real shame because I loved my job. I loved my boss. And I loved that the solutions to the problems we were solving were always just out of our reach- which kept us reaching. I loved my team, small and fast, the way all things should be. Most of all I loved my freedom- it was a lab first, a trading desk second. And although at the time I sweated the actual trading- looking back it was good for me personally.

I'm not afraid of change; I was always ready to leave if required. I've had the number for a truck driving school on my desktop for almost 2 months. Change is good. It means a new challenge, new problems, and new people/friends. And so on that note, if you're looking for either

a. A professional surfer for your southern California team. Or...
b. A new statistical arbitrage group. Or...
c. A kdb+ programmer (I've decided not to work in another language except if absolutely necessary)

Then drop me a line.

Friday, July 20, 2007

Fridays (again)

Yup. Worst day ever. Just further reinforcement that coming in on Fridays is bad for me, bad for business. All over by 10AM.

Thursday, July 19, 2007


I've always said (and I picked it up from somewhere) that for every 9 you want to add to your uptime, multiply your initial cost by 9.
So if you have a 10mm USD data center, and you want to go from 99% to 99.9% uptime, you need a 90mm data center infrastructure. Why? Well, let's take a look.

You need backup generators and backup network lines, which means changing the data center architecture to support all that. Redundant gas lines for the generators. Failover for power sources etc. (would have been cheaper to build it like that in the first place)- 8mm
You need double the hardware, and multiple components in all the hardware you have (dual NIC etc.)- 7mm
You need a DR data center in another location, with the same setup- 15mm
You need a new high speed line for the center links, and a new line for the DR site (different CLEC)- 2mm
You need to upgrade the SAN with real time LUN level replication, and buy one for the DR site- 5mm
You need clustering on everything (Oracle, Sybase, Windows, Unix, Linux ...), and custom coded apps need to be re-written for active failover- 25mm (includes services)
You need to hire staff to deal with active/active failover and 24x7 operation- 4mm
You need load balancing and/or failover network operations for inbound and outbound connections (data feeds etc.)- 10mm (at least!)
Since you have 2 different physical locations now (hopefully not in the same state) you need new services contracts- 1mm

So that's only 77mm more, or about 8x the original cost. But I'm sure I could spend the extra 13mm on something I've forgotten.
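As a quick sanity check on the arithmetic, here is a small sketch in q. The cost vector just copies the line items listed above (in mm USD), and as listed they add up to 77. The downtime figures assume a 365-day year, which is my assumption rather than anything stated in the post.

```q
/ line-item costs from the list above, in mm USD
costs:8 7 15 2 5 25 4 10 1
sum costs                 / 77
/ allowed downtime in hours per year at 99% and 99.9% uptime (365-day year assumed)
(1-0.99 0.999)*365*24     / 87.6 8.76
```

So that extra nine buys you the difference between roughly 3.5 days and under 9 hours of downtime a year.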

The best DRS I've ever seen was at the Department of the Navy. Everything was a virtual machine. Live snapshots of the VMs were taken every hour or so. The snapshots were saved to an EMC SAN- which had real time replication to 5 other locations. All locals replicated to all remotes. Every remote site had a small cluster of "failover machines". The network was designed by Cisco and everything could be automagically routed wherever. So, the entire data center in VA gets blown up (or whatever). The alarm fires, the VMs are started at the primary failover site (NC), they come online, routers do their thing (the DoD has the benefit of their own network) and voilà. Magic data center moved. Worst case loss, 1 hour. Failover time for the entire data center- 5-10 minutes (AND no application restart- they are hot snapshot loads). Beautiful.

So, having built a couple of data centers in my day, and coded many an application for active failover, and having deployed clustering on every version of Windows since NT4 and on Red Hat- when a data center is down for say 7 hours, I think people should not only be fired, but any contractors should be sued. I'm not saying whose data center went down, but let's just say it was bad.
Oh, kdb+ failover is trivial. Everything I do is in a pub/sub model. So aside from an extra machine in my data center- I push everything to my desktop. So when the lights went out, I still knew my positions.

Wednesday, July 18, 2007

BSC- sorry about that

Sorry, we lost all the money. Well- 90% of it, but the other 10% is ours.
In other news, we have some free flights to Florida for you- and all you need to do is attend a 2 hour presentation on the wonderful opportunity for real estate in the greater Miami area.

Wednesday, July 11, 2007


I've tossed this around to a bunch of people, so I thought I'd toss it out here and maybe someone would know an answer:

Given 3 series, A B and C
Solve for the next value of each series subject to the constraints:

correlation of A&B>=.9
correlation of B&C>=.9
correlation of A&C>=.85

deviation of A is <=.051
deviation of B is <=.051
deviation of C is <=.051

There are multiple correct answers, I'm looking for as many as possible as quickly as possible.

Here's some sample data

2.47 3.453 4.263
2.476 3.405 4.211
2.484 3.429 4.228
2.46 3.377 4.178
2.395 3.309 4.119
2.387 3.298 4.115
2.46 3.394 4.215
2.582 3.494 4.296
2.591 3.508 4.293
2.55 3.456 4.24
2.469 3.363 4.161
2.485 3.411 4.197
2.485 3.404 4.188
2.469 3.36 4.137
2.436 3.343 4.126
2.478 3.377 4.166
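For anyone who wants to poke at this, here is a brute-force sketch in q. It rests on several assumptions of mine: the three columns above are A, B and C in that order with the oldest row first, "deviation" means the population standard deviation (q's dev), and the search grid (last value plus or minus 0.05 in steps of 0.01) is purely hypothetical. It is a way to enumerate candidates, not a proper solver.

```q
/ sample data from above, one vector per series
A:2.47 2.476 2.484 2.46 2.395 2.387 2.46 2.582 2.591 2.55 2.469 2.485 2.485 2.469 2.436 2.478
B:3.453 3.405 3.429 3.377 3.309 3.298 3.394 3.494 3.508 3.456 3.363 3.411 3.404 3.36 3.343 3.377
C:4.263 4.211 4.228 4.178 4.119 4.115 4.215 4.296 4.293 4.24 4.161 4.197 4.188 4.137 4.126 4.166
/ 1b if appending candidate v=(a;b;c) keeps every constraint satisfied
ok:{[v]a:A,v 0;b:B,v 1;c:C,v 2;all(.9<=a cor b;.9<=b cor c;.85<=a cor c;.051>=dev a;.051>=dev b;.051>=dev c)}
/ hypothetical search grid around a starting value
grid:{x+.01*(til 11)-5}
cands:(grid last A) cross (grid last B) cross grid last C
/ keep only the candidate triples that pass
sols:cands where ok each cands
```

With 11 points per series that is 1331 candidates; a finer grid grows the count cubically, which is why I'm after something smarter.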

Amending to a Matrix

Jamie did an excellent job of explaining this and I thought I'd share it (cleaning out old emails).

A way of amending multiple points in a square matrix in one go.
Take the following matrix:

q)4 4#0
(0 0 0 0;0 0 0 0;0 0 0 0;0 0 0 0)

Let's say we want to add 1 to the diagonal (make it the identity matrix). We could get ourselves a list of coordinates (0 0;1 1;2 2;3 3) and do it one at a time, using over to pass the previous result forward each time:

q).[;;+;1]/[4 4#0;(0 0;1 1;2 2;3 3)]
(1 0 0 0;0 1 0 0;0 0 1 0;0 0 0 1)

This would get pretty slow if the list of coordinates is large, since these are essentially scalar operations. So we can do as Arthur has done: flatten the matrix and use a bit of code to map each 2 dimensional coordinate to its one dimensional equivalent:

q)(4*4)#0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q)4 sv flip (0 0;1 1;2 2;3 3)
0 5 10 15
q)@[(4*4)#0;4 sv flip (0 0;1 1;2 2;3 3);+;1]
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

The cut command gives us back the matrix from the flat vector:

q)4 cut @[(4*4)#0;4 sv flip (0 0;1 1;2 2;3 3);+;1]
(1 0 0 0;0 1 0 0;0 0 1 0;0 0 0 1)
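Wrapped up as a function, the same trick might look like the sketch below. The name amend2d and its argument order are made up for illustration; the body is just the flatten, amend, cut sequence from above.

```q
/ hypothetical helper: add v at each (row;col) pair of an n x n zero matrix
amend2d:{[n;ij;v] n cut @[(n*n)#0;n sv flip ij;+;v]}
/ off-diagonal coordinates work the same way as the diagonal
amend2d[4;(0 1;2 3;3 0);1]
/ (0 1 0 0;0 0 0 0;0 0 0 1;1 0 0 0)
```

Because the amend uses +, a coordinate that appears twice in the list gets incremented twice.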

Tuesday, July 10, 2007

Data Mining in kdb+

I'm revisiting the "billion query code" because many people have sent questions in.
Mining in Practice
In general, if you can brute force all the solutions to a given search space (or phase space, as my physics friends and wife say)- that's what you want to do. If you can't, time for a heuristic.
What do I mean by space? Well, let's consider an example:

K- Maximum Sum Subarray Problem

We have a table (arrayable) t

t:ungroup flip (`$/:.Q.a)!enlist each (26 0N#(26*n)?5)*(26 0N#(26*n)?-1 1)

Which looks like
q)show flip t
a| 0 0 4 -2 -3 3 0 -4 1 -1
b| 0 3 1 -4 -3 2 4 3 0 -3
c| 4 -3 0 -1 -2 0 -3 -4 1 0
z| -1 2 0 3 -2 2 1 -3 0 2

Now I ask you to find the maximum sum of the column z, using any combination of the other variables. E.g. let's sort by column a

We can see that if we use the range a>=0&a<=5 then we keep the last six columns below, and z sums to 4:

q)show flip `a xasc t
a| -4 -3 -2 -1 0 0 0 1 3 4
b| 3 -3 -4 -3 0 3 4 0 2 1
c| -4 -2 -1 0 4 -3 -3 1 0 0
z| -3 -2 3 2 -1 2 1 0 2 0

What if we add the condition b>=3? Then the intersection keeps only two of those columns (z is 2 and 1, for a sum of 3):

q)show flip `a xasc t
a| -4 -3 -2 -1 0 0 0 1 3 4
b| 3 -3 -4 -3 0 3 4 0 2 1
c| -4 -2 -1 0 4 -3 -3 1 0 0
z| -3 -2 3 2 -1 2 1 0 2 0 //the -1 falls out

Ok now what's the best we can do? How hard is this problem? If we consider all 3 dimensional solutions (e.g. using 1 a, 1 b and 1 d) the problem has */26 26 26 5 5 5 solutions (about 2 million). But in real life we have lots more values (more than 5) and generally more variables.
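To make the search concrete, here is a small sketch in q on just the four columns shown above, typed in by hand (the real table has 26). It reproduces the two range sums and then brute forces every (lo;hi) range on a alone. The one-dimensional search is my simplification for illustration, not the billion query code.

```q
/ the 4 columns shown above, typed in by hand
t:([]a:0 0 4 -2 -3 3 0 -4 1 -1;b:0 3 1 -4 -3 2 4 3 0 -3;c:4 -3 0 -1 -2 0 -3 -4 1 0;z:-1 2 0 3 -2 2 1 -3 0 2)
exec sum z from t where a within 0 5          / 4
exec sum z from t where a within 0 5,b>=3     / 3
/ brute force: every (lo;hi) range over the distinct values of a
r:asc distinct t`a
p:r cross r
p:p where p[;0]<=p[;1]
s:{[rng] exec sum z from t where a within rng} each p
(max s;p s?max s)                             / (9;-2 3)
```

So on a alone the best range is -2 to 3, with a sum of 9. Adding a second and third variable multiplies the number of ranges to try, which is where the counts above come from.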

Breaking down the problem:
A good place to start is to reduce the dimensionality by bucketing values. Consider placing values in m uniform buckets. The code below does this- and quickly.
Then you can do the search, this is ludicrously fast in q (arthur code of course).


/running totals in 2 dims
/ 3d aggrs

\t u:m .q.xrank't f
\t r3:{u f3[x]\:/:u}'u
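As a minimal illustration of the bucketing step only, here is what xrank does to a single column. The vector v and the choice of m are made up for the example. Note that xrank produces buckets with equal counts (quantiles), which is my reading of what "uniform" means here.

```q
m:4
/ hypothetical column of raw values
v:10 50 30 80 20 70 40 60
/ bucket number (0 to m-1) for each value, by rank
m xrank v          / 0 2 1 3 0 3 1 2
```

After this every column holds only m distinct values, so the number of ranges to search per column drops from however many raw values there are to a handful.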

Friday, July 6, 2007

Stay home Fridays

Yesterday and the day before the holiday were great days for me. I trade a pattern recognition system (which is just jargon for "matched patterns built from data mining to streaming data"). Most of the time the system trades as normal- never that big of a position- and life is fine. I win, I lose, but overall I'm up with a very small drawdown. But the market has been wicked recently- like a skittish, rabid, cocaine addicted rabbit.

So today I walk in, knowing it's going to be a bad day. Why? Well first of all it seems every time I trade on Friday (my day off) I lose- as if to reinforce the fact that I should have stayed home. But also after 2 big winning days- I was in for a loss.
So the day goes like this

5 AM- wake up/news
5:30 Out the door
5:45 At the office
6:00 Turn on trading system (start q process), check last night's data, get coffee
8:00 (or about) start trading
8:30 lose $7500 on one trade- ok
8:45 lose another 3k- still ok
9:00 down another 10k mark to market- that sucks
9:30 risk management and other stuff begins triggering (q is awesome)
10:00 system fights back from down 20k to down only 11k
11:00 flat (down only 1k)
12:00 market grinds down, no trades. At this point I contemplate going home.
12:00:00.001 get long 10 mm
12:10 get long another 10 mm (why not)
12:30 %*^*&*@!!!
13:30 more %*^*&*@!!!
14:00 give up, risk management rolls out of positions, end the day down 30k.

Wednesday, June 13, 2007

The devil is in the details

getting real-time data from a financial exchange

ensuring that data is correct

Harder Still:
interacting with the exchange and sending markets/orders cancels etc.

Harder Than That:
maintaining a book with offsetting positions

Really Really Really Hard:
dealing with partial fills for new positions and/or exits

Approaching Impossible:
crossing those positions internally to save a trip and the spread, including partial filling, waiting for confirmation that outstanding markets

Worthy of Arthur:
doing all that, without loops, and with a subsystem to minimize paying the spread via fancy bid->aggress conversion logic.

E.g. imagine you are long 5 @100, looking for the market to go to $110 in the next hour. Then you get short 10 @101, expecting the market to go to 99 in the next 30 minutes. In 30 minutes the market is at 100- what do you do?
Let's say you cross internally. You take the 5 longs off the book, so you cancel the offers for those 5 longs- but wait- while that's happening the market goes to 107/109.

Now instead of 2 positions imagine 100, and instead of 1 price per position- imagine N distinct prices- and a market that jumps all over the place.

Monday, June 11, 2007

Some useful financial functions

Thought these might be handy to a few people. I can't take credit for most:

drawdown:{[x]v:u?max u:(maxs x)-x;(u v;x?x[v]+u v;v)};

This returns a vector: the drawdown value, then the indices of the start (the peak) and the end (the trough) of that period.
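A quick usage sketch, with a made-up equity curve. The definition is repeated so the snippet runs on its own.

```q
/ same definition as above, repeated so this runs standalone
drawdown:{[x]v:u?max u:(maxs x)-x;(u v;x?x[v]+u v;v)};
/ hypothetical equity curve
eq:100 110 105 120 90 95 130
drawdown eq        / 30 3 4
```

That reads as: the worst drawdown is 30, running from the peak at index 3 (120) to the trough at index 4 (90).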

Exponential Moving Average
ema:{[n;x]b:1-a:2%n+1;c:(sum n#x)%n;((n-1)#0n),c,c{[a;b;x;y](a*y)+b*x}[a;b]\n _x};

Smooth Moving Average
smavg:{[n;x]((n-1)#0n),i,{(z+(x-1)*y)%x}[n]\[i:avg n#x;n _ x:0^x]};

Max Consecutive Losses (pl is the vector of per-trade pnl)
MaxConsecutiveLosers:{[pl]max count each "0"vs raze string pl<0};

Convert Tics to Float and vice versa

f2tic:{[x]a:x - floor x;b:a%(1%32);c:floor(a -(floor b)%32)%(1%256);if[c=4;c:"+"];raze raze string(floor x),"-",string floor b, string floor c};

tic2f:{("I"$n#x)+((8*"I"$x n+1 2)+"0123+567"?x 3+n:x?"-")%256.}

Friday, May 25, 2007

Back testing in Q

I get this all the time- how do you backtest a strategy?
Well, here's a simple way. Every trade has 4 components: its entry time, its profit objective (ge: good exit), its stop limit (se: stop exit) and a time exit (te).
The code below will, given a table with a boolean column "entry" and a price, backtest the strategy (this is the version for going long). I like to backtest against quotes, but this is for prices (trades).

i_eb:where t`entry; //where are the entry indices
i_te:(count t)^((t`time) bin/: ((t@i_eb)`time)+tep*1000); //find the indices for the time exits
rng:{x+key floor (y-x)}'[i_eb;i_te]; //define the ranges
f_u:{[xe;limit;p;x]limit&x+xe>p x};f_l:{[xe;limit;p;x]limit&x+xe<p x}; //functions for upper and lower limits
e_pr:(t`entryprice)i_eb; //define the entry prices
i_ge:f_u[(e_pr+gep);i_te;t`bid1]/[i_eb]; //indices for the good exits
i_se:f_l[(e_pr-sep);i_te;t`bid1]/[i_eb]; //indices for the stop exits
i_xe:min each v:(count t)^flip (i_te;i_ge;i_se); //define the exit action
x_ty:(`te`ge`se)@/:i_x:first each iasc each v; //define the exit types
x_pr:(t`bid1) i_xe; //the exit price is the bid at the exit index
x_pl:x_pr-(e_pr); //the exit pnl is the exit price- entry price

An example

t:`time xasc flip `time`price`entry`entryprice!(n?`time$.z.Z;n?10;n?01b;n?10)
btL[t;2;1;30] //go for 2$,risk 1$, hold for 30 seconds
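If the vectorized version is hard to follow, here is a deliberately naive sketch of the same three-exit idea on a toy price path. Everything in it is hypothetical: the prices, the entry points, the names exit1 and maxhold, and the fact that the time exit counts bars instead of seconds. It handles one entry at a time, so it is nowhere near as fast as the code above.

```q
/ toy price path and entry points (all values hypothetical)
p:100 101 99 102 103 98 97 100 104 105f
entries:0 4
gep:2f; sep:1f; maxhold:3
/ for one entry index i, return (exit index;exit type)
exit1:{[i]
  w:i+1+til maxhold&(count[p]-1)-i;   / bars we are allowed to exit on
  ge:first w where p[w]>=p[i]+gep;    / first bar at or above the target
  se:first w where p[w]<=p[i]-sep;    / first bar at or below the stop
  te:last w;                          / time exit: last allowed bar
  k:(te;ge;se);
  j:first iasc 0W^k;                  / earliest exit; nulls mean never hit
  (k j;`te`ge`se j)}
ex:exit1 each entries                 / ((2;`se);(5;`se))
pnl:p[ex[;0]]-p entries               / -1 -5f
```

Both toy trades get stopped out: the first at bar 2 for -1, the second at bar 5 for -5, since the price gapped well through the stop.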

1 billion queries

Prior to a life in finance, I had a brief stint in root cause analysis for diseases for the Department of the Navy (NMIC).
The problem was always the same- a ton of variables and some measure of sickness. For example you might have weight, age, blood pressure etc. and on the right hand side you have a weighted value of sickness (visits, stage of disease etc.). If I had q and Arthur back then, I think we could have cured cancer. The code below does 1 billion queries in a couple of seconds, finding all 3 variable combinations. For example if the variables are `a`b`c, and the weighted death value is D then an example output is

if a>2,b<3,d>5 then the sum of D is 2345

il:`b`c`d`e //variable list il (independent list)

k)u:m .q.xrank't il
k)r3:{u f3[x]\:/:u}':u


Complex Event Processing- is overly complex.
I don't go to trade shows anymore- or talk to vendors. This reminds me of the first large installations of networks in the 90's (WAN). The ideas were simple- and so was the base technology- but in order to make a buck vendors made the whole process so complicated and full of jargon that no one knew what the hell was going on.


Event processing is this

kdb+tic-> subscriber
alert on subscriber-> push out
chain another subscriber.

How hard is that?
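To show how little code that is, here is a sketch of the subscriber side. The alert rule (size above 1000), the alerts table and the trade schema are all made up for illustration, and the tickerplant is simulated by calling upd by hand so the snippet runs without a live feed. In a real kdb+tick setup the tickerplant would call upd for you once you have subscribed.

```q
/ hypothetical alert table: trades bigger than a threshold
alerts:([]time:`timespan$();sym:`symbol$();size:`long$())
/ upd is called with a table name and a batch of new rows
upd:{[t;x] if[t=`trade; `alerts insert select time,sym,size from x where size>1000]}
/ simulate the tickerplant pushing a batch of two trades
upd[`trade;([]time:2#.z.n;sym:`a`b;size:500 5000)]
count alerts       / 1
```

Chaining is the same thing again: another process subscribes to alerts and runs its own upd.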

kdb+ as the infrastructure

We use kdb+ as our sole technology infrastructure. Some people find that odd. What? No Java? No C#? No Tibco? And then the inevitable... "How do you..." So I thought I'd answer some common questions.

How do you?

Q... Build GUIs
A: In general we don't. We do use Flash (Adobe Flex) to connect to kdb+ via q's built in XML parser- but that's only for show. We use the kdb+ Excel link for real-time information. But in general we have few GUIs.

Q... Handle Risk Management
A: kdb+tic is all you need for an event driven system. Risk management is just an event based decision process. Requested positions come in, they are checked against a bank of risk management rules, and either they are rejected or passed to the execution engine.

Q... Execute trades
A: The feedhandler works both ways. It can both capture data from the exchange and send orders to it. This ensures that you are not making decisions you can't trade on.

Q...Why kdb+
A: Speed and simplicity. Our entire trading system- which is data capture, back testing, real time screening, risk management, execution and position management- is about 8 pages of code and very very fast.

Send in some questions and I'll post them here.