
Python: Parsing a FIX Log in Eight Lines (List & Dictionary Comprehensions)

…A What?

A FIX (Financial Information eXchange) protocol message consists of a sequence of tag=value pairs, each terminated by the ASCII 1 Start of Heading (SOH) character.
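
The sketch below joins tag/value pairs into a wire-format string with SOH (\x01) after every field, which shows that layout concretely. `build_fix_msg` is a hypothetical helper. It also skips the BodyLength (9) and CheckSum (10) fields, which a real FIX Engine would calculate.

```python
SOH = '\x01'  # ASCII 1, Start of Heading

def build_fix_msg(fields):
    """Join (tag, value) pairs as tag=value, terminating every field with SOH.

    Hypothetical helper for illustration only: it does not compute
    BodyLength (9) or CheckSum (10), which a real FIX Engine fills in.
    """
    return ''.join(f'{tag}={value}{SOH}' for tag, value in fields)

msg = build_fix_msg([(8, 'FIX.4.4'), (35, '0'), (49, 'SENDER'), (56, 'TARGET')])
print(repr(msg))  # '8=FIX.4.4\x0135=0\x0149=SENDER\x0156=TARGET\x01'
```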

Typically, a piece of software called a FIX Engine sits in front of an Order Management System (OMS), or similar, and handles FIX messages arriving inbound from the network and being sent outbound from the OMS. It takes care of things like parsing, sequencing, session state, retrying and syntax checking.

A FIX Engine will also write logs. These usually consist of every inbound and outbound FIX message written raw, one per line. Some FIX Engines (but not all) replace the SOH separator with another character in the log. This is usually the pipe character, |, but it can be anything. Doing so makes the messages more readable and stops text editors (and grep) complaining that the log file contains ‘binary’ data.
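
That substitution is a single string replacement. Here is a minimal sketch, assuming the engine swaps every SOH for one printable character. `soh_to_pipe` is a hypothetical name.

```python
SOH = '\x01'

def soh_to_pipe(raw_msg, replacement='|'):
    """Swap every SOH separator for a printable character, as some FIX Engines do in their logs."""
    return raw_msg.replace(SOH, replacement)

raw = '8=FIX.4.4\x0135=0\x0149=SENDER\x0156=TARGET\x01'
print(soh_to_pipe(raw))  # 8=FIX.4.4|35=0|49=SENDER|56=TARGET|
```

One caveat: if a field value can itself contain the replacement character (free text, for instance), the logged line can no longer be split unambiguously.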

FIX logs are useful because they provide the raw form of a FIX message. These are the exact messages as they arrived inbound or the exact messages just before they’re about to be sent outbound. They’re a good indication of exactly what participants did (or didn’t!) receive or send.

Parsing the FIX Log

A fake FIX log extract is below. In this case, the SOH character has been replaced with |.

8=FIX.5.0SP2|9=142|35=V|34=1|49=SENDER|56=TARGET|52=20090323-15:40:29|264=0|265=0|262=1|263=1|268=5|269=0|269=1|269=b|269=c|269=B|146=5|55=1|55=2|55=3|55=4|55=5|10=062|
8=FIX.5.0SP2|9=129|35=D|34=1|49=SENDER|52=20221123-23:04:59.132|56=TARGET|11=11223344|21=1|38=123|60=20221123-23:04:59.132|40=1|54=1|55=123.HK|59=0|10=061|
8=FIX.5.0SP2|9=145|35=D|34=4|49=ABC_DEFG01|52=20090323-15:40:29|56=CCG|115=XYZ|11=NF 0542/03232009|54=1|38=100|55=CVS|40=1|59=0|47=A|60=20090323-15:40:29|21=1|207=N|10=139|

It’s possible to parse this FIX log in just eight lines of Python¹ using list and dictionary comprehensions.

def get_fix_msg_as_dict(fix_msg):
    tags_and_vals = fix_msg.rstrip('|').split('|')  # use '\x01' in both places if the separator is SOH
    # split on the first '=' only, in case a value itself contains '='
    return {tag: val for tag, val in (tag_and_val.split('=', 1) for tag_and_val in tags_and_vals)}

def parse_fix_log(file_path):
    with open(file_path, 'r') as file:
        # skip blank lines, e.g. a trailing newline at the end of the file
        fix_msg_list = [get_fix_msg_as_dict(line.rstrip()) for line in file if line.strip()]
    return fix_msg_list

fix_log = parse_fix_log('/path/to/fix/log')

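The fix_log variable is now a list of dictionaries, one per FIX message in the log, keyed on tag number (as a string). One caveat: a dictionary holds only one value per key, so when a tag repeats within a message (as 269 and 55 do in the first message, which uses repeating groups), only the last value is kept.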
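
If you need every value of a repeated tag, one option is to map each tag to a list of values. The sketch below does that. `get_fix_msg_as_multidict` is a hypothetical variant, not part of the original eight lines. It keeps values in message order but flattens the groups, so it doesn't record which 55 belongs with which entry. Proper repeating-group parsing needs the group definitions from the FIX specification.

```python
from collections import defaultdict

def get_fix_msg_as_multidict(fix_msg, sep='|'):
    """Like get_fix_msg_as_dict(), but keep every value of a repeated tag.

    Hypothetical variant: each tag maps to a list of its values, in message order.
    """
    values_by_tag = defaultdict(list)
    for field in fix_msg.rstrip(sep).split(sep):
        tag, value = field.split('=', 1)
        values_by_tag[tag].append(value)
    return dict(values_by_tag)

line = ('8=FIX.5.0SP2|9=142|35=V|34=1|49=SENDER|56=TARGET|52=20090323-15:40:29|'
        '264=0|265=0|262=1|263=1|268=5|269=0|269=1|269=b|269=c|269=B|'
        '146=5|55=1|55=2|55=3|55=4|55=5|10=062|')
msg = get_fix_msg_as_multidict(line)
print(msg['269'])  # ['0', '1', 'b', 'c', 'B']
print(msg['55'])   # ['1', '2', '3', '4', '5']
```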

Indexing on Tag Number

Now it’s easy to pull specific tag values out of a message, for example the first one in the log:

>>> msg = fix_log[0]

# Get the SendingTime value (tag 52)
>>> msg.get("52")
'20090323-15:40:29'

# Get the MsgType (tag 35)
>>> msg.get("35")
'V'

Note the use of dict.get() rather than the [] operator. Many FIX tags are only valid for a subset of messages. If [] is used to look up a key (in this case, a tag) that doesn't exist in a message dictionary, Python raises a KeyError, whereas get() just returns None.
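
Here is a quick illustration of the difference, using a hand-made and trimmed message dictionary. get() also accepts an optional default, and the `in` operator tests for a tag's presence directly:

```python
# A parsed message, trimmed to a few tags for brevity.
msg = {'8': 'FIX.5.0SP2', '35': 'D', '55': 'CVS'}

print(msg.get('58'))         # None: tag 58 (Text) isn't present
print(msg.get('58', 'N/A'))  # N/A: a default of your choosing instead of None
print('58' in msg)           # False: test for presence explicitly

try:
    msg['58']
except KeyError as err:
    print(f'KeyError: {err}')  # KeyError: '58'
```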

Let’s generate a CSV report from a subset of tags.

fix_log = parse_fix_log('/path/to/fix/log')
print('Send Time,Sequence Number,Message Type,Sender,Target')
for msg in fix_log:
    print(f'{msg.get("52")},{msg.get("34")},{msg.get("35")},{msg.get("49")},{msg.get("56")}')

Run against the example log above, this prints:

Send Time,Sequence Number,Message Type,Sender,Target
20090323-15:40:29,1,V,SENDER,TARGET
20221123-23:04:59.132,1,D,SENDER,TARGET
20090323-15:40:29,4,D,ABC_DEFG01,CCG
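
The same report can be built with the standard library's csv module, which is more robust. The sketch below assumes messages have already been parsed into dictionaries as above. `write_csv_report` and `REPORT_COLUMNS` are hypothetical names, and the header uses the standard FIX field names for the chosen tags.

```python
import csv
import sys

# A hand-picked subset of standard FIX tag names, used for the header row.
REPORT_COLUMNS = {
    '52': 'SendingTime',
    '34': 'MsgSeqNum',
    '35': 'MsgType',
    '49': 'SenderCompID',
    '56': 'TargetCompID',
}

def write_csv_report(fix_log, out=None, columns=REPORT_COLUMNS):
    """Write a header row, then one CSV row per message; missing tags become empty cells."""
    if out is None:
        out = sys.stdout
    writer = csv.writer(out)
    writer.writerow(columns.values())
    for msg in fix_log:
        writer.writerow(msg.get(tag, '') for tag in columns)

# Two messages from the example log, already parsed into dictionaries.
fix_log = [
    {'52': '20090323-15:40:29', '34': '1', '35': 'V', '49': 'SENDER', '56': 'TARGET'},
    {'52': '20221123-23:04:59.132', '34': '1', '35': 'D', '49': 'SENDER', '56': 'TARGET'},
]
write_csv_report(fix_log)
```

Unlike the hand-rolled f-string, csv.writer quotes any value that contains a comma (free-text fields such as Text, tag 58, can). Using msg.get(tag, '') also writes an empty cell rather than the string None.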

Filtering the Log

We can also filter the fix_log list for particular messages using list comprehensions. For instance, to find all the New Order Single messages (MsgType, tag 35, equal to D):

new_order_singles = [msg for msg in fix_log if msg.get('35')=='D']
for msg in new_order_singles:
    print(msg)
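
If you only need the first match, next() with a generator expression stops at that match instead of building the whole list. `first_msg` is a hypothetical helper, and the sample dictionaries are trimmed versions of the example log.

```python
def first_msg(fix_log, msg_type):
    """Return the first message with the given MsgType (tag 35), or None if there isn't one."""
    return next((msg for msg in fix_log if msg.get('35') == msg_type), None)

fix_log = [
    {'35': 'V', '34': '1'},
    {'35': 'D', '34': '1', '55': '123.HK'},
    {'35': 'D', '34': '4', '55': 'CVS'},
]
print(first_msg(fix_log, 'D'))  # {'35': 'D', '34': '1', '55': '123.HK'}
print(first_msg(fix_log, '8'))  # None: no Execution Reports (35=8) in this log
```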

What about all the New Order Singles for the CVS ticker (Symbol, tag 55)?

cvs_orders = [msg for msg in fix_log if (msg.get('35')=='D' and msg.get('55') == 'CVS')]
for msg in cvs_orders:
    print(msg)
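
Once the conditions pile up, it can be tidier to pass them in as a dictionary of tag to expected value. `filter_msgs` is a hypothetical helper that keeps a message only if every criterion matches. An empty criteria dictionary keeps everything.

```python
def filter_msgs(fix_log, criteria):
    """Return the messages whose tags match every tag -> value pair in criteria.

    Hypothetical helper: criteria is a plain dict such as {'35': 'D', '55': 'CVS'}.
    """
    return [msg for msg in fix_log
            if all(msg.get(tag) == value for tag, value in criteria.items())]

fix_log = [
    {'35': 'V', '34': '1'},
    {'35': 'D', '34': '1', '55': '123.HK'},
    {'35': 'D', '34': '4', '55': 'CVS'},
]
cvs_orders = filter_msgs(fix_log, {'35': 'D', '55': 'CVS'})
print(len(cvs_orders))  # 1
```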

Maybe I want to filter out all Heartbeats (MsgType 0):

business_msgs = [msg for msg in fix_log if msg.get('35') != '0']
for msg in business_msgs:
    print(msg)
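
Heartbeats are just one of FIX's session-level (administrative) messages. To drop them all, you could test membership in a set of MsgType values. The set below lists the common ones (Heartbeat, Test Request, Resend Request, Reject, Sequence Reset, Logout, Logon). Treat it as a starting point that may not be exhaustive for every FIX version. `business_only` is a hypothetical name.

```python
# Common session-level MsgType (tag 35) values; may not be exhaustive for every FIX version.
ADMIN_MSG_TYPES = {
    '0',  # Heartbeat
    '1',  # Test Request
    '2',  # Resend Request
    '3',  # Reject
    '4',  # Sequence Reset
    '5',  # Logout
    'A',  # Logon
}

def business_only(fix_log):
    """Drop session-level messages, keeping application messages such as orders and fills."""
    return [msg for msg in fix_log if msg.get('35') not in ADMIN_MSG_TYPES]

fix_log = [{'35': 'A'}, {'35': '0'}, {'35': 'D', '55': 'CVS'}, {'35': '8'}]
print(business_only(fix_log))  # [{'35': 'D', '55': 'CVS'}, {'35': '8'}]
```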

Timestamps in the Log

Some FIX Engine logs put a timestamp at the start of each line, followed by another separator. In the example extract below, they’ve even used the same separator as the one that replaces SOH!

20090323-15:40:29|8=FIX.5.0SP2|9=142|35=V|34=1|49=SENDER|56=TARGET|52=20090323-15:40:29|264=0|265=0|262=1|263=1|268=5|269=0|269=1|269=b|269=c|269=B|146=5|55=1|55=2|55=3|55=4|55=5|10=062|
20090323-15:40:29|8=FIX.5.0SP2|9=145|35=D|34=4|49=ABC_DEFG01|52=20090323-15:40:29|56=CCG|115=XYZ|11=NF 0542/03232009|54=1|38=100|55=CVS|40=1|59=0|47=A|60=20090323-15:40:29|21=1|207=N|10=139|
20221123-23:04:59.132|8=FIX.5.0SP2|9=129|35=D|34=1|49=SENDER|52=20221123-23:04:59.132|56=TARGET|11=11223344|21=1|38=123|60=20221123-23:04:59.132|40=1|54=1|55=123.HK|59=0|10=061|

This is easily handled by splitting each line once, on the first |, and passing everything after it to the get_fix_msg_as_dict() function. The log timestamp itself is discarded.

def parse_fix_log(file_path):
    with open(file_path, 'r') as file:
        # split('|', 1)[1] drops the leading timestamp; blank lines are skipped
        fix_msg_list = [get_fix_msg_as_dict(line.rstrip().split('|', 1)[1]) for line in file if line.strip()]
    return fix_msg_list
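
If the log timestamp is worth keeping, one option is to store it in the message dictionary under a non-numeric key, so it can't clash with a FIX tag. The sketch below uses a hypothetical 'log_time' key. It parses the value with datetime.strptime, assuming the YYYYMMDD-HH:MM:SS(.sss) format seen in the extract. `parse_fix_time` and `get_fix_msg_with_log_time` are hypothetical names. The resulting datetimes are naive: the extract doesn't say which timezone the log uses, though FIX SendingTime (52) is UTC by convention.

```python
from datetime import datetime

def parse_fix_time(value):
    """Parse a timestamp such as '20090323-15:40:29' or '20221123-23:04:59.132'."""
    fmt = '%Y%m%d-%H:%M:%S.%f' if '.' in value else '%Y%m%d-%H:%M:%S'
    return datetime.strptime(value, fmt)

def get_fix_msg_with_log_time(line, sep='|'):
    """Split off the log's own timestamp and keep it under a hypothetical 'log_time' key."""
    log_time, fix_msg = line.rstrip().split(sep, 1)
    fields = fix_msg.rstrip(sep).split(sep)
    msg = {tag: val for tag, val in (field.split('=', 1) for field in fields)}
    msg['log_time'] = parse_fix_time(log_time)
    return msg

line = ('20221123-23:04:59.132|8=FIX.5.0SP2|9=129|35=D|34=1|49=SENDER|'
        '52=20221123-23:04:59.132|56=TARGET|11=11223344|21=1|38=123|'
        '60=20221123-23:04:59.132|40=1|54=1|55=123.HK|59=0|10=061|')
msg = get_fix_msg_with_log_time(line)
print(msg['log_time'])  # 2022-11-23 23:04:59.132000
print(msg['35'])        # D
```

The same parse_fix_time() could serve as a sort key, e.g. sorted(fix_log, key=lambda m: parse_fix_time(m['52'])), to order messages by SendingTime.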

Finally

I like this example. It shows how easily Python can turn a raw data set into structured data that you can filter for exactly what you want, all without needing extra libraries or imports. In a Live Service situation, this kind of thing is invaluable.



1. In fact, you could probably do it in fewer lines if you squished the two functions together, but I think this is more readable.

This post is licensed under CC BY 4.0 by the author.