CoinTossX: An open-source low-latency high-throughput matching engine

We deploy and demonstrate the CoinTossX low-latency, high-throughput, open-source matching engine with orders sent using the Julia and Python languages. We show how it can be deployed for small-scale local desktop testing and discuss a larger-scale, but still locally hosted, configuration with multiple traded instruments and multiple clients managed concurrently. We then demonstrate a cloud-based deployment using Microsoft Azure, with large-scale industrial and simulation research use cases in mind. The system is exposed and interacted with via sockets using UDP SBE message protocols and can be monitored using a simple web browser interface using HTTP. We give examples showing how orders can be sent to the system and market data feeds monitored using the Julia and Python languages. The system is developed in Java with orders submitted as binary encodings (SBE) via UDP protocols using the Aeron Media Driver as the low-latency, high-throughput message transport. The system separates the order-generation and simulation environments, e.g. agent-based model simulation, from the matching of orders, data feeds and various modularised components of the order-book system. This ensures a more natural and realistic asynchronicity between events generating orders, and the events associated with order-book dynamics and market data feeds. We promote the use of Julia as the preferred order submission and simulation environment.


Introduction
A complete study of the market microstructure of the Johannesburg Stock Exchange (JSE) is not possible without access to their matching engine 1. Studying market microstructure is challenging due to the various changes in the market, regulation and technology. However, most of the current literature focuses on analyzing existing exchanges and the building of agent-based models. The importance of order matching engines in the trading infrastructure makes these systems of interest not only to computer scientists but also to computational finance and risk management, while the non-linear impact of event-driven processes relating to order matching may provide an impenetrable calibration boundary for agent-based models attempting to empirically relate temporal dynamics with specific agent behaviours [1,2,3].
A trade matching engine is the core software and hardware component of an electronic exchange. It matches up bids and offers to complete trades. Electronic order matching was introduced in the early 1980s in the United States to supplement open outcry trading 2. Before this, stocks were traded on exchange floors and transaction costs were high. Failures in these systems increased as the frequency and volume on the electronic networks increased.
1 A matching engine is a component of an exchange that matches buy and sell orders according to the rules of the exchange.
Modern matching engines are fully automated and use one or several algorithms to allocate trades among competing bids and offers at the same price. They typically support different order types and have unique APIs or use standard ones 3 . As it pertains to trading, latency 4 directly influences the amount of time it takes for a trader to interact with the market.
Traditionally, to achieve low latency, high-frequency trading has required powerful server hardware appropriately networked in a data center, scaled to accommodate worst-case network traffic scenarios on the busiest trading days. These trading systems must be resilient in the face of network or power failures, requiring expensive redundant hardware as well as offsite data retention [4]. For example, firms would use co-location, fibre-optic network cables, optimized hardware architectures and other technology to get as close to zero latency as possible [5]. On the other hand, cloud-based software solutions benefit by being resilient and offering cost-effective, easy scaling, an advantage that is not offered by traditional trading systems. However, the biggest challenge for latency in a cloud-based environment, and one of the greatest barriers to building high-frequency trading systems in this environment, is the fact that hardware is not co-located within a data center [4]. That said, a combination of the low latency of traditional matching engines and the resiliency, scalability and availability of cloud-based environments is something that is yet to come about 5.
A low-latency, high-throughput 6 matching engine does not exist for academics to further their understanding in this field [6]. CoinTossX provides an environment for the application of agent-based modelling experiments which would otherwise be prohibitive to undertake in real financial markets due to cost, complexity and other factors. It was for this reason that CoinTossX was developed and why its applications are studied further here. Here we hope to provide an easy-to-use, openly available, realistic, real-time, simulated trading system that is straightforward to set up on multiple platforms. Hence, this paper benefits institutions, academics and others seeking to take advantage of an open-source, resilient, scalable matching engine simulator that may avoid a variety of conflicts of interest related to research or insights derived from commercial alternatives.
This paper is structured as follows: Section 2 and Section 3 outline the existing literature on the topic. Section 4 is dedicated to an outline of CoinTossX. More specifically, Section 4.1 gives a detailed description of the structure and software construction. Section 4.3 presents the capabilities and features of CoinTossX. Section 4.4 gives the details of the extensive tests conducted to ensure the system meets the stringent latency and throughput requirements. Supplementary to the testing framework, Section 5 shows the results of the simulation of a simple Hawkes process for generating large volumes of market and limit orders. Section 6 ends with some concluding remarks and, lastly, Appendix A provides instructions for users to deploy and use the application on their local machine or on a remote server.
2 A method of communication between professionals on a stock exchange or futures exchange, typically on a trading floor.
3 Vendors include Connamara Systems, Cinnober (acquired by Nasdaq), Aquis Technologies (A2X), SIA S.p.A., Nasdaq, Match-Trade, MillenniumIT (JSE), GATElab Ltd (acquired by London Stock Exchange), Eurex, LIST, Stellar Trading Systems, Quodd, Baymarkets, Market Grid, ARQA Technologies, Kappsoft, Thesys Technologies.
4 Latency refers to the ability of a system to handle data messages with minimal delay.
5 There is however a strong argument to place the Order Management System (OMS), which decides what to trade (the selection of parent orders and execution strategies), in the Cloud, while retaining the Execution Management System (EMS), which implements child orders, in close proximity to the matching engine. Here we are moving the matching engine into the Cloud.
6 High throughput refers to the ability of a system to process a very high volume of data messages.

Market simulation for testing
Simulating the entire financial market ecosystem for trade strategy testing and risk management is appealing because of the mechanistic complexity of the market structure and costs and the nonlinear feedbacks and interactions that multiple interacting agents bring to the market ecosystem. This type of simulation can be both financially costly and computationally expensive; yet it appears tractable, and subsets of the ecosystem are used for system verification, e.g. in vendor-provided test market venues.
The rapid evolution of software and hardware and the increasing need to reduce transaction costs, system failures and transaction errors has led to important changes in market structure and market architecture. These changes have supported the rise of electronic, algorithmic, and high frequency trading. The specifics are unique to each and every market and regulatory environment. For example, in the South African context the JSE uses the MillenniumIT trading systems following the approach of the London Stock Exchange with the BDA broker-dealer clearing system [7] with rules and regulations set by the Financial Markets Act (2012), the JSE rules and directives, and the Financial Intelligence Centre (FIC) Act (2001). However, CoinTossX is fully configurable. Although implemented and tested here using the published and publicly available JSE market rules and test-cases, it can be configured for a rich variety of market structures.
Many researchers developing trading strategies do not have access to vendor provided testing environments, while those that do often do not want to expose their strategies to competitors in shared testing environments.
However, sufficiently realistic multi-agent simulation environments remain elusive, particularly for low-liquidity and collective-behaviour-based risk-event scenarios [8], because a key component remains the realism of the underlying trading agents and their interactions within test markets.
Most agent-based simulation environments and models are oversimplified to the extent of not being relevant for realistic science and risk management. Some examples include using a global calendar time to sequentially order trading events, or providing intentional (and sometimes unintentional) market clearing events that synchronise price discovery and information flow with agent interactions in terms of a unique global time: the loss of high concurrency and cohesion in favour of tight coupling for computational convenience.
There are many examples in the context of South African markets. Nair [9] prototyped a simple matching engine using a single stock and adopting the standard order types and mechanisms associated with a continuous double auction market as specified by the JSE. Although works like this seem to provide a simplified but realistic framework demonstrating the overall principles of coupling a matching engine with agent-based modeling of a stock market, such frameworks are unlikely to recover realistic dynamics or lead to useful insights when interrogated at the level of asynchronous but high-concurrency order-book events and dynamics. High concurrency and a careful management of the concept of time [1] seem a prudent requirement related to both the stylised facts of the market, as argued for in many South African market examples [8], but more importantly, for realistic strategy testing and risk management.
Here like-for-like message delays, order-book rules, and both asynchronous order matching as well as reactive, asynchronous, and high-concurrency agent (for the rules and behaviours) and actor (for the computational representations and causation models) generation can affect model outcomes and estimation. It is the asynchrony and order-splitting at the agent level that can dominate the emergence of auto-correlations and cross-correlations in various traded assets because there is no single calendar-time-related mechanism that generates equilibrium prices, synchronises information and order flow, or co-ordinates machine-time events with sequential calendar time [1].

Realistically simulating high-concurrency
The LMAX Exchange (London Multi Asset Exchange) [10] is an FX exchange with an ultra-low-latency matching engine. Thompson et al. [10] developed the Disruptor ring buffer for inter-thread concurrency as an alternative to storing events in queues or linked lists. This was in response to the problem that linked lists could grow and increase the garbage in the system, causing significant costs to latency and jitter 7. At the heart of the Disruptor mechanism sits a pre-allocated bounded data structure in the form of a ring buffer. The Disruptor pre-allocates memory in a cache-friendly manner, and this memory is shared with the consumer. The system was also developed on the JVM and can process up to 25 million messages per second on a single thread with latencies lower than 50 nanoseconds [10].
Recently, Addison et al. [4] implemented a simple foreign exchange (FX) trading system and deployed it to cloud environments from multiple cloud providers (Amazon Web Services, Microsoft Azure and Oracle Cloud Infrastructure), recording network latency and overall system latency in order to assess the capability of public cloud infrastructure in performing low-latency trade execution under various configurations and scenarios. They conclude that sufficiently low latency and controlled jitter can be achieved in a public cloud environment to support security trading in the public cloud [4]. More specifically, they demonstrate the ability to achieve sub-500 microsecond round-trip latency, and therefore conclude that it is currently feasible to build a production low-latency, high-frequency trading system in the cloud.
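To make the idea of a pre-allocated, bounded ring buffer concrete, the following is a minimal single-threaded sketch in Python. It is not the LMAX Disruptor implementation (which relies on sequence counters, memory barriers and cache-line padding on the JVM); the class name `RingBuffer` and its methods are hypothetical and only illustrate how slots are allocated once and then reused, so that no new nodes are created as events flow through.

```python
from typing import Any, List, Optional


class RingBuffer:
    """Bounded ring buffer whose slots are allocated once and then reused."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots: List[Optional[Any]] = [None] * capacity  # pre-allocated storage
        self._mask = capacity - 1  # index = sequence & mask (cheap modulo)
        self._head = 0  # sequence number of the next event to read
        self._tail = 0  # sequence number of the next event to write

    def try_publish(self, event: Any) -> bool:
        """Write an event; return False (without blocking) if the buffer is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = event
        self._tail += 1
        return True

    def try_consume(self) -> Optional[Any]:
        """Read the oldest unread event; return None if the buffer is empty."""
        if self._head == self._tail:
            return None
        event = self._slots[self._head & self._mask]
        self._head += 1
        return event
```

Because the capacity is fixed, a slow consumer shows up as back-pressure on the producer (a failed `try_publish`) rather than as unbounded memory growth and additional garbage.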
For research in finance, economics and computer science, a realistic, flexible, high-performance matching engine deployable to different environments is important across a wide range of fields. In particular, given the new level of realism that such software provides, agent-based computational finance and financial models in general may show the greatest potential in terms of the insights into complex systems that can be gained from their application. In addition, a further layer of causation is introduced into the modelling framework by keeping the complex set of rules of the environment, system and architecture separate and independent from the modelling and decision-making processes at the agent level. In this way more emphasis can be placed on top-down actions and states, a potential step towards hierarchical causality [11].
On this note, CoinTossX is a simulation environment and so the task of producing realistic simulated market dynamics, comparable to those observed in empirical investigations, is left to the user(s). For this purpose the two popular methods usually adopted are mutually exciting Hawkes processes (see Section 4.4) and agent-based models (ABMs) [12].
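As an illustration of the first of these methods, the sketch below simulates a univariate, self-exciting Hawkes process with an exponential kernel using Ogata's thinning algorithm. It is a deliberate simplification of the mutually exciting (multivariate) case and is not the simulation code used with CoinTossX; the function name and the parameters `mu` (baseline intensity), `alpha` (jump size) and `beta` (decay rate) are assumptions chosen for this example.

```python
import math
import random
from typing import List


def simulate_hawkes(mu: float, alpha: float, beta: float,
                    horizon: float, seed: int = 0) -> List[float]:
    """Event times on [0, horizon] for intensity mu + sum(alpha * exp(-beta * (t - t_i)))."""
    if not (mu > 0 and 0 <= alpha < beta):
        raise ValueError("need mu > 0 and 0 <= alpha < beta for a stationary process")
    rng = random.Random(seed)
    events: List[float] = []
    t = 0.0
    excitation = 0.0  # contribution of past events to the intensity at time t
    while True:
        upper = mu + excitation  # valid bound: intensity only decays between events
        wait = rng.expovariate(upper)  # candidate inter-arrival time
        excitation *= math.exp(-beta * wait)  # decay the excitation to the candidate time
        t += wait
        if t > horizon:
            return events
        # Thinning step: accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises the intensity


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, seed=42)
    print(len(times), "events; first few:", [round(x, 3) for x in times[:5]])
```

In an order-flow setting, each simulated event time could be mapped to the submission of a market or limit order, with the self-excitation producing the clustering of arrivals that a homogeneous Poisson process lacks.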
ABMs provide a bottom-up approach to modelling the actions and interactions of autonomous agents with the aim of assessing their effect on a complex system. Proponents of agent-based models argue that financial markets exhibit many emergent phenomena, and that such phenomena are usually attributed to the interactions and relationships between the agents that make up the system. Successfully calibrating ABMs to financial time series [13] can allow for inference about the factors determining the price behaviour observed in the real world, provided that parameter estimates are sufficiently robust.
7 Jitter is the deviation from true periodicity of a presumably periodic signal, often in relation to a reference clock signal. This variation in the time between data packets arriving is caused by network congestion.
In recent years, ABMs became popular as a tool to study macroeconomics: specifically, the impact of trading taxes, market regulatory policies, quantitative easing, and the general role of central banks. ABMs can also play an important role in analysis of the impact of the cross-market structure [12]. Of particular interest is the class of models described by Lussange et al. [12], who outline a computational research study of collective economic behavior via agent-based models, where each agent would be endowed with specific cognitive and behavioral biases known to the field of neuroeconomics, and at the same time autonomously implement rational quantitative financial strategies updated by machine learning and reinforcement learning (RL).
ABMs are not without their pitfalls [14,12,13]: computational cost, validation and calibration challenges, a bias towards inductive decision making, endemic parameter degeneracies, and their tendency to focus on mechanistic behaviours and rules, rather than interaction dynamics and learning. Despite these challenges, their potential to link the micro-level rules of investor behaviour with the macro-behaviour of asset prices in real markets is compelling, in part due to the apparently expansive amount of data that is collected from financial markets. However, a key concern remains the scientific cost, or impact, of losing the realism of high-concurrency adaptive interactions between strategic agents in a sufficiently reactive framework, for the computational convenience of time-synchronised, bottom-up, rule-based approaches. Here we have decided to first focus on the matching engine framework, and separate it entirely from the agent generation framework; this may ensure that high-concurrency and low-latency features of real markets are not lost.

CoinTossX
CoinTossX is an open-source, high-frequency, low-latency, high-throughput matching engine for simulating the JSE [6,5,15] (or any market that has well-established rules and test cases). The software was developed with Java and open-source libraries and is designed to maximize throughput, minimize latency, and accommodate rapid development of additional functionality. It can be configured for multiple clients, stocks and trading sessions (continuous trading, opening auction, closing auction, intraday auction and volatility auction). The software has been configured to replicate the rules and processes of the JSE. It may allow traders, organizations and academic institutions to test market structure, fragility and dynamics without the cost of live test trading. The software can provide a platform to study price formation in stock exchanges and the interplay between regulators, market structure and dynamics [6]. This work also addresses some aspects of the unavailability of data and direct data-feed access from industry, by providing a framework that can be compared to recorded transaction data arising from the actual market system interfaces.
The system requirements we implemented were obtained from the JSE's publicly available technical documentation and test cases [16,17,18]. The eight main components of the simulator are [6]:
1. Stocks are objects for which clients can send orders and limit order books can be constructed and kept track of.
2. Clients are computer algorithms that send order events to the simulator. Clients will send order events to the trading gateway (refer to Section 4.2 for more detail).
3. The trading gateway receives the client request, validates the request and then sends it to the matching component to be processed. It sends updates to the client to indicate the status of the event.
4. The matching engine processes the events from the trading gateway. It manages one or more limit order books (LOBs). If there is an update to the LOB, it sends updates to the market data gateway.
5. The market data gateway receives updates from the matching engine component and sends market data updates to all connected clients.
6. The website receives updates from the market data gateway. It displays the LOBs for each security and allows the user to configure the stocks and clients.
Figure 1: A high-level diagram of the relationship between the components of CoinTossX in terms of end-user functionality [6].
A website is required to monitor and configure the stocks and clients. The relationships between these components are also summarised in Figure 1. The client sends a login request message to the trading gateway, which then validates the client. If the client is already logged in or the username or password is invalid, it responds with a reject code message. The client can also send a log-out request to the trading gateway. In this case the trading gateway removes the client from the list of connected clients and sends a response to the client to indicate whether the log-out was successful. The trading gateway logs out clients when it shuts down. When there is a change in the trading session, the website sends a message to the trading gateway, which then sends the message to the matching engine. Clients can send an order request or order message to the trading gateway, which then validates the message. If the message is valid, it then sends the message to get matched. Valid order types are market, limit, hidden, stop and stop limit. Each order has a time in force (TIF) value, which cannot be amended, that determines how long an order is active until it is executed, deleted or expired (whichever comes first). The engine matches the orders using the matching algorithms based on the trading session. Market data updates are sent to the market data gateway to indicate the changes in the LOB of each stock. Additionally, the website and all clients receive market data updates and internal messages (to monitor the application) from the gateway. A user can create, update and delete clients and stocks. A user can view the limit order book for each stock as a bar chart of all active orders. Tables will show the details of these active orders. A user can start, stop and configure the parameters of the testing framework.
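The login and log-out handling described above can be sketched as a small piece of session state. This is an illustrative Python model only: the class `GatewaySessions` and the response codes are hypothetical and are not taken from the CoinTossX source, whose gateway is written in Java and exchanges SBE messages rather than strings.

```python
from typing import Dict, Set

# Hypothetical response codes; the actual reject codes are not given in the text.
ACCEPTED = "ACCEPTED"
REJECT_ALREADY_LOGGED_IN = "REJECT_ALREADY_LOGGED_IN"
REJECT_INVALID_CREDENTIALS = "REJECT_INVALID_CREDENTIALS"


class GatewaySessions:
    """Tracks which configured clients are currently logged in."""

    def __init__(self, credentials: Dict[str, str]) -> None:
        self._credentials = dict(credentials)  # configured clients: username -> password
        self._connected: Set[str] = set()

    def login(self, username: str, password: str) -> str:
        """Validate a login request and return a response code."""
        if username in self._connected:
            return REJECT_ALREADY_LOGGED_IN
        if username not in self._credentials or self._credentials[username] != password:
            return REJECT_INVALID_CREDENTIALS
        self._connected.add(username)
        return ACCEPTED

    def logout(self, username: str) -> bool:
        """Remove the client from the connected list; report whether it was logged in."""
        if username in self._connected:
            self._connected.remove(username)
            return True
        return False

    def shutdown(self) -> None:
        """Log out every connected client, as the gateway does when it shuts down."""
        self._connected.clear()
```

A client that is not configured never appears in the credentials map, so its login request is rejected in the same way as a wrong password.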

Architecture
This software is open source and was built to run on different operating systems as well as on a single server or on multiple servers. Implementation in Java means that CoinTossX can be deployed on different hardware configurations (due to the JVM). The goal was to build a matching engine that could achieve low-latencies using industry standard technology. This ensures that the software is portable and can be deployed confidently. The high level architecture of CoinTossX is summarised in Figure 2.
A single market event goes through the following process. A simple binary encoding (SBE) message 9 will be sent to the trading gateway. The trading gateway will forward the message to the matching engine which will then process it. It will send a message back to the trading gateway to indicate the status of the message. The trading gateway will forward the message back to the client. The matching engine will send an update to the market data gateway if there is a change in the limit order book. The market data gateway will forward the updates to all connected clients and will forward some of the updates to the web event listener. The web event listener saves each event to the file system. The website then reads and displays the data from the file system.
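The appeal of a fixed-width binary encoding for the messages in this pipeline can be illustrated with Python's `struct` module. The sketch below imitates the general shape of an SBE message (a small header followed by fixed-offset, little-endian fields); it is hand-rolled for illustration and is not produced by the SBE tooling. The field layout, the `NewOrder` type and the template and schema identifiers are hypothetical and do not reflect the actual CoinTossX message schema.

```python
import struct
from typing import NamedTuple

# Hypothetical layout; "<" means little-endian with no padding between fields.
HEADER = struct.Struct("<HHHH")  # block length, template id, schema id, version
BODY = struct.Struct("<IQcqI")   # client id, order id, side, price, quantity

TEMPLATE_ID = 1  # assumed identifier for a "new order" message
SCHEMA_ID = 1    # assumed schema identifier
VERSION = 0      # assumed schema version


class NewOrder(NamedTuple):
    client_id: int
    order_id: int
    side: bytes    # b"B" for buy, b"S" for sell
    price: int     # integer price in minimum currency units
    quantity: int


def encode(order: NewOrder) -> bytes:
    """Pack the header and body into a single fixed-size byte string."""
    return HEADER.pack(BODY.size, TEMPLATE_ID, SCHEMA_ID, VERSION) + BODY.pack(*order)


def decode(buffer: bytes) -> NewOrder:
    """Read the fields back directly from their fixed offsets."""
    block_length, template_id, _schema_id, _version = HEADER.unpack_from(buffer, 0)
    if template_id != TEMPLATE_ID or block_length != BODY.size:
        raise ValueError("unexpected message template or block length")
    return NewOrder(*BODY.unpack_from(buffer, HEADER.size))
```

Every message of this type occupies the same number of bytes (`HEADER.size + BODY.size`), so decoding needs no parsing or string conversion, in contrast to a text-based message.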
The communication between the website and the file system is done by transmitting data over a network using only User Datagram Protocol (UDP) 10 as opposed to Transmission Control Protocol (TCP). TCP is a heavyweight protocol because it requires three packets to set up a connection and requires a connection to be set up between two applications before data can be transmitted 11. UDP, on the other hand, was chosen for being a lightweight protocol, as it does not check the connection or the order of the messages and does not require a connection to be set up before data is transmitted. Data is transmitted irrespective of whether the receiver is ready to receive the message or not. UDP may, however, be unreliable because the sender does not know if the message was delivered. Messages that are received (read as a byte stream) will always be in the order in which they were sent.
Modularisation is achieved by designing each of the above components independently of one another while allowing communication between them to occur only via exposed ports using these high-speed SBE message protocols. This functionality was introduced by assigning each component an IP address and port on which to listen. These ports are specified in the "properties" files in the root project directory. Furthermore, each client/user may submit orders from a remote server. So if CoinTossX is deployed to the cloud, one should ensure that the virtual machine allows for these types of inbound port communications.
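The connectionless nature of UDP, and the idea of a component listening on an assigned address and port, can be demonstrated with plain sockets. The following sketch uses Python's standard `socket` module on the loopback interface; it does not use Aeron or any CoinTossX port configuration, and the function name `udp_round_trip` is hypothetical.

```python
import socket


def udp_round_trip(payload: bytes, timeout: float = 1.0) -> bytes:
    """Send one datagram to a local listener without any connection set-up."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        receiver.settimeout(timeout)     # a lost datagram would otherwise block forever
        address = receiver.getsockname() # the (ip, port) pair the "component" listens on
        sender.sendto(payload, address)  # no handshake and no delivery acknowledgement
        data, _source = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_round_trip(b"order-message"))
```

The sender receives no confirmation of delivery; any reliability or ordering guarantees have to come from the messaging layer built on top of UDP.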
After an extensive comparison of message transport software, the Aeron media driver library was chosen for supporting the above protocols, being an efficient and reliable UDP unicast, UDP multicast, and IPC message transport for communicating between all the components. Each component has its own media driver to shovel events to and from the component. Aeron is designed to be the highest throughput with the lowest latency possible of any messaging system [19] (see Figure 3).
9 The JSE uses text messages between the clients and their matching engine, which is inefficient since characters take up more memory and are slower to transmit across the network. From the comparisons made in [6], the SBE message protocol was found to be the fastest way to encode and decode messages, and had the greatest throughput.
10 The JSE uses TCP for their trading gateway and UDP for their market data gateway.
11 Nonetheless, TCP can be reliable because if a message is not received, it will try multiple times to deliver the message. TCP will drop the connection if there are multiple timeouts.
Given this modular design, each component and client may be started and run on separate, independent servers. As mentioned, components communicate with each other through the media driver via UDP SBE message packets. Therefore, it is in principle possible to build an industrial matching engine by providing each component with its own high-performance server.
The matching engine is the component that relies most heavily on the latency and throughput of the system and is therefore designed with these features in mind whilst adhering to the JSE matching engine's architecture. The software was designed such that different matching logic algorithms are in separate Java classes. This allows the logic to be changed to test any variations of the matching logic. The matching engine adopts the price-time priority algorithm during the continuous trading session. That is, for multiple orders resting at the same price, a market order will be matched with the order having the earliest submission time. The trading session execution times and monitoring were not implemented in the components as this would reduce the throughput and increase the latency.
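Price-time priority can be sketched with a heap keyed on price and then on arrival sequence. The example below models only the sell side of a book and an incoming market buy order; the class `AskBook` and its methods are hypothetical and are not the data structures used in the CoinTossX matching engine.

```python
import heapq
from typing import List, Tuple


class AskBook:
    """Sell side of a limit order book with price-time priority."""

    def __init__(self) -> None:
        # Heap entries are [price, arrival sequence, remaining quantity, order id].
        # Comparison is decided by price, then by the unique arrival sequence.
        self._heap: List[list] = []
        self._sequence = 0

    def add_limit_sell(self, order_id: str, price: int, quantity: int) -> None:
        heapq.heappush(self._heap, [price, self._sequence, quantity, order_id])
        self._sequence += 1

    def match_market_buy(self, quantity: int) -> List[Tuple[str, int, int]]:
        """Fill a market buy; return (resting order id, price, traded quantity) fills."""
        fills: List[Tuple[str, int, int]] = []
        while quantity > 0 and self._heap:
            best = self._heap[0]  # lowest price; earliest arrival among equal prices
            traded = min(quantity, best[2])
            fills.append((best[3], best[0], traded))
            best[2] -= traded     # quantity is not part of the ordering key
            quantity -= traded
            if best[2] == 0:
                heapq.heappop(self._heap)
        return fills  # any unfilled remainder of the market order is dropped here


if __name__ == "__main__":
    book = AskBook()
    book.add_limit_sell("a", 101, 5)
    book.add_limit_sell("b", 100, 5)
    book.add_limit_sell("c", 100, 5)
    print(book.match_market_buy(12))
```

In the usage example, orders "b" and "c" rest at the same best price, and "b" is filled first because it arrived earlier; only then does the market order move on to the next price level.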
Limit and market orders submitted during a call auction are not matched immediately. These orders are matched at the end of the call auction at a single price using a price discovery process. The Volume Maximizing Auction Algorithm finds the price that will match the largest number of buy and sell orders. More specifically, during both the continuous trading and auction call sessions, the matching engine processes orders as follows. The filter and uncross algorithms run each time the Best Bid or Offer (BBO) changes or every 30 seconds. The filter algorithm uses the heuristic Hill Climber search/optimization algorithm 12 to find the optimal volume of hidden limit orders that can be executed. The search will filter out hidden limit orders with minimum execution size (MES) constraints that are not eligible. After the filtering, specific rules are used to select the orders and price to be executed in the crossed region.
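The volume-maximising step of the auction can be illustrated as follows. This is a simplified sketch under stated assumptions: only limit orders are considered, hidden orders and MES constraints are ignored, and ties in matched volume are broken by taking the lowest candidate price, which is an arbitrary choice for this example rather than the JSE rule. The function name `uncrossing_price` is hypothetical.

```python
from typing import List, Optional, Tuple

Order = Tuple[int, int]  # (limit price, quantity)


def uncrossing_price(bids: List[Order], asks: List[Order]) -> Optional[Tuple[int, int]]:
    """Return (price, matched volume) with maximal executable volume, or None if no cross."""
    best: Optional[Tuple[int, int]] = None
    candidate_prices = sorted({p for p, _ in bids} | {p for p, _ in asks})
    for price in candidate_prices:
        demand = sum(q for p, q in bids if p >= price)  # buyers willing to pay this price
        supply = sum(q for p, q in asks if p <= price)  # sellers willing to accept it
        volume = min(demand, supply)
        if volume > 0 and (best is None or volume > best[1]):
            best = (price, volume)  # strict ">" keeps the lowest price on ties
    return best


if __name__ == "__main__":
    bids = [(102, 10), (101, 20), (100, 30)]
    asks = [(99, 15), (100, 10), (101, 25)]
    print(uncrossing_price(bids, asks))
```

For the book in the usage example, the executable volumes at prices 99, 100, 101 and 102 are 15, 25, 30 and 10 respectively, so the auction uncrosses at 101 with a matched volume of 30.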
As in the JSE, the matching engine uses native message protocols 13 to communicate with clients as opposed to FIX 14 (Financial Information eXchange), FAST 15 (FIX Adapted for STreaming), ITCH 16 or OUCH 17. The matching engine component stores the active orders in memory since storing the active orders on disk would increase the I/O and reduce performance. The data structure for storing the LOB is designed to have a low memory overhead (lists, hash tables, trees, B-tree, B+tree) and be efficient in searching, updating and deleting orders [6]. More importantly, the low latency is achieved through the efficient use of CPU cache. The simulator only uses main memory without the use of virtual memory 18. The time taken for data to move from the CPU to main memory is reduced by using caches that contain copies of frequently used data from main memory.
12 The Hill Climber algorithm is an optimization technique that iteratively searches for the solution to a problem by changing one element in each iteration.
13 Message protocols are the methods by which the exchange communicates with market participants. Native protocols are custom-built protocols.
14 FIX is an open-standard, non-proprietary protocol that is used by buy and sell firms, trading platforms and regulators to transmit trade data. The protocol allows organizations to easily communicate domestically and internationally with each other.
15 Greater numbers of market participants, and therefore volume, increased the network latency and led to market participants not receiving updates in an acceptable time. The FAST protocol reduces latency by encoding and compressing the data before transmission.
16 ITCH is used to publish market data only.
17 OUCH is used to place, amend and cancel orders.
18 Virtual memory is space on the hard disk which is used by an operating system when it requires more memory. The time to fetch data from virtual memory is very slow and it is therefore not used.
The website was developed using Spring Boot and Apache Wicket and only has permission to read from the file system. The original design had the event listener and website in the same Java process. When the website was used or paused because of garbage collection, it affected the receiving and saving of events. Therefore this logic was split into a separate component, the web event listener. The listener could keep the received events in memory or save them to the file system. Saving the data in memory would be fast, but would require an unknown maximum memory setting. Therefore the data needs to be saved to the file system. Using the MapDB library 19, an off-heap hashmap is used to save the events to memory-mapped files. An off-heap hashmap stores data outside the Java heap space and is not affected by garbage collection. Off-heap data is suited for storing data larger than the current memory and allows sharing of data between JVMs. Memory-mapped files allow Java programs to read and write files using only memory while the operating system reads and writes to the file system. This significantly improves performance. The entire file or a part of the file can be loaded into memory. The values in memory will still be written to the file system even if the JVM crashes. The web event listener has read and write permissions. It saves events to the file system but receives events faster than it can save them to file. This issue is solved with the Disruptor by having two threads: one to receive and store events, and the other to save events to the file system [10].
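The memory-mapped file mechanism mentioned above can be illustrated with Python's standard `mmap` module. This sketch is not MapDB and does not reproduce its off-heap hashmap; the record layout (a sequence number and a quantity) and the function names are assumptions made for the example. It only shows how a program writes into a memory view while the operating system is responsible for persisting the bytes to the file.

```python
import mmap
import os
import struct
import tempfile
from typing import List, Tuple

RECORD = struct.Struct("<QI")  # hypothetical event: sequence number, quantity


def write_events(path: str, events: List[Tuple[int, int]]) -> None:
    """Write fixed-size records through a memory-mapped view of the file."""
    if not events:
        raise ValueError("an empty file cannot be memory-mapped")
    size = RECORD.size * len(events)
    with open(path, "wb") as handle:
        handle.truncate(size)  # pre-size the file so it can be mapped
    with open(path, "r+b") as handle, mmap.mmap(handle.fileno(), size) as view:
        for index, event in enumerate(events):
            RECORD.pack_into(view, index * RECORD.size, *event)  # plain memory write
        view.flush()  # ask the operating system to write dirty pages to disk


def read_events(path: str) -> List[Tuple[int, int]]:
    """Read the records back through a read-only mapping."""
    with open(path, "rb") as handle, \
            mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
        return [RECORD.unpack_from(view, offset)
                for offset in range(0, len(view), RECORD.size)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        file_path = os.path.join(directory, "events.bin")
        write_events(file_path, [(1, 100), (2, 250)])
        print(read_events(file_path))
```

A second process, such as a website with read-only permission, could map the same file with `mmap.ACCESS_READ` and see the events without going through the writer's heap.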

Clients
Important to the design of this trading system are the clients who send orders to the trading gateway and receive updates from the website and market data gateway. Each client is assigned input and output URLs for both the Native Gateway and Market Data Gateway. These URLs specify the IP addresses (in this case localhost, the user's local machine) to and from which messages will be sent and received, as well as the ports on which these components are listening. Therefore each client takes up a number of threads on the machine it runs on, which can be the same as or different from the machine running the other components of the matching engine. So, having clients send orders to the server remotely would free up hardware and improve performance on the matching engine server.
It is also important to note that each client here is a unique trader having its own unique ID, password, ports and login credentials. However, as was done in the testing framework, a single client may still be used to simulate multiple traders. This would be preferable in the case where the client(s) and other components run on the same machine, thereby limiting the computational resources required for simulations. In that case, the client is simply a tool/object for submitting orders to the gateway. In this way one can have a self-contained simulation framework that is independent from the actual implementation or sending of orders by, for example, defining an ABM in Julia with each agent holding a reference to the client object that sends orders.

19 MapDB is an open-source, embedded Java database engine and collection framework. It provides Maps, Sets, Lists, Queues, Bitmaps with range queries, expiration, compression, off-heap storage and streaming.
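The idea of agents sharing a reference to one client object can be sketched as follows. This is an illustrative Python sketch only: the `Client` and `Agent` classes, the `submit_order` method and the order fields are hypothetical, and the client merely records orders rather than encoding them and publishing them to the gateway.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical stand-in for a gateway client; records orders instead of sending them."""
    client_id: int
    sent: list = field(default_factory=list)

    def submit_order(self, trader_id, side, price, volume):
        self.sent.append((trader_id, side, price, volume))


@dataclass
class Agent:
    """A simulated trader that submits orders through a shared client."""
    trader_id: int
    client: Client

    def act(self, rng):
        side = rng.choice(["Buy", "Sell"])
        price = rng.randint(25000, 25057)  # illustrative price range
        self.client.submit_order(self.trader_id, side, price, 100)


rng = random.Random(0)
client = Client(client_id=1)
agents = [Agent(trader_id=i, client=client) for i in range(5)]
for agent in agents:
    agent.act(rng)
```

Because the agents only see the `submit_order` interface, the simulation logic does not change if the client is later replaced by one that sends real messages.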
The total number of client-stock pairs that can be supported is limited only by the performance capabilities of the server on which the matching engine is running (see section 4.4 for hardware recommendations and performance results with differing numbers of client-stock pairs). The limits on the number of clients that can be supported by each stock, however, still need to be explored and measured with respect to different hardware configurations.

Functionality
When the user starts CoinTossX, through the website, three main screens can be switched between and displayed. The stock screen shows the stocks configured, the trading session that is active for each stock and a button to view the limit order book (LOB) of the stock. In the limit order book for each stock a bar chart is generated to display the active orders. The market data shown on the LOB pages for each stock are snapshots representing the current state of the limit order book. Tables show the details of the bids, offers, trades and all submitted orders. All data on these pages can be exported. The Hawkes configuration page (see Section 4.4) allows the user to change the values of the Hawkes input data before running the simulation. The simulation page allows the user to stop and start the warmup process and the Hawkes simulation. It shows the status of each client and the active trading session. The clients screen shows the clients that are configured and allows the user to create, update and delete clients. Clients not configured will not be able to log in to the trading gateway.

In order for a client to submit orders, a login request, with the relevant username and password, must first be sent to the trading gateway. Thereafter the client can submit orders by publishing order messages to the trading gateway via UDP ports. Similarly, a log-out request must be sent when the client logs out. Traders cannot submit orders that are smaller than the LOB's tick size, which controls the smallest order size. The allowable order types that are not conditioned on time are [16]:
1. Market orders (MO) - contains the quantity of shares to trade, but not the price, and executes against each order in the LOB on the contra side until it is fully filled. Orders submitted during the auction call will remain in the LOB until uncrossing is done. All market orders that are not filled will expire.
2. Limit order (LO) - displays the quantity and price, and may execute against a trade or expire based on its time-in-force (TIF).
3. Hidden orders (HO) - allows traders to submit orders which are completely hidden from the market (price and volume). These have a lower priority than all visible active orders. These orders can execute against other visible and hidden orders. Hidden limit orders that are submitted during an auction call are rejected. The quantity of a hidden order must meet the minimum reserve size (MRS) 20 . Each hidden order also has a minimum execution size (MES) 21 . After a trade executes against a hidden order, if the remaining quantity is less than the MES (> MRS) then the order will expire. If the remaining quantity is greater than the MRS and MES then the order will only expire when it executes or based on its TIF (whichever comes first). Hidden orders may only be submitted during the continuous trading session.

4. Stop order and stop limit order (SO & SL) - a stop order is a market order with a stop price. These orders do not enter the order book (remain unelected) until their stop price is reached. When the stop price is reached, the stop order becomes a market order. Similarly, a stop limit order is a limit order with a stop price. These orders do not enter the order book until their stop price is reached. When the stop price is reached, the stop limit order becomes a limit order. Stop and stop limit buy orders will be elected if the last traded price is equal to or greater than the stop price. Similarly, stop and stop limit sell orders will be elected if the last traded price is equal to or less than the stop price. An incoming stop or stop limit order may be immediately elected on receipt if the stop price has already been reached. These orders will also only be elected at the end of the execution of an order.
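The election rule for stop and stop limit orders described above can be written as a short Python sketch. The function names and the string labels for sides and order types are hypothetical and chosen for illustration only.

```python
def is_elected(side, stop_price, last_traded_price):
    """Return True if a stop or stop limit order should be elected."""
    if side == "Buy":
        # Buy stops are elected once the market trades at or above the stop price.
        return last_traded_price >= stop_price
    if side == "Sell":
        # Sell stops are elected once the market trades at or below the stop price.
        return last_traded_price <= stop_price
    raise ValueError(f"unknown side: {side}")


def elected_type(order_type):
    """Order type that a stop (SO) or stop limit (SL) order becomes on election."""
    return {"SO": "MO", "SL": "LO"}[order_type]
```

For example, `is_elected("Buy", 25040, 25040)` is true, so an incoming buy stop whose stop price has already been reached would be elected on receipt.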
Some of the above orders can also have a time-in-force (TIF) which cannot be amended after the order is placed.
The following time-in-force options are available [16]:
1. At the opening (OPG) - only accepted during the opening auction and are used to direct orders only towards this session. OPG orders that are not filled during the uncrossing will expire at the end of the opening auction. If there is no opening auction scheduled, OPG orders will be rejected.

2. Good for auction (GFA) - orders of this type that are submitted will be parked 22 until the next auction and are injected at the start of the auction. GFA orders that are not filled during the uncrossing will be parked for the next auction and are only removed when they are filled or cancelled. If there is no auction session scheduled, GFA orders will be rejected.
3. Good for intraday auction (GFX) - these orders are parked until the next intraday auction and are injected at the start of the auction. As opposed to GFA orders, GFX orders that are not filled during uncrossing will expire at the end of the intraday auction. If there is no intraday auction scheduled, GFX orders will be rejected.

20 The minimum order quantity for orders to qualify as hidden limit orders.
21 The minimum quantity of the hidden limit order which is permitted to execute.
22 While orders are parked they cannot be executed.

4. At the close (ATC) - submitted orders of this type are parked until the next closing auction. They are injected at the start of the auction and if they are not filled, will expire at the end of the closing auction. If there is no closing auction scheduled, ATC orders will be rejected.
5. Day (DAY) - orders that expire at the end of the trading day. If an order does not specify a TIF, it will default to DAY.
6. Immediate or cancel (IOC) - can be partially or fully filled. The unfilled remainder of an IOC order that is only partially filled is cancelled immediately after execution. IOC orders are rejected during auction calls.
7. Fill or kill (FOK) - these orders differ from IOC orders in that they are either fully filled or expired. Partial fills are not allowed.
8. Good till cancel (GTC) -can remain in the order book for a maximum of 90 calendar days or until the order is filled or canceled.
9. Good till date (GTD) -as opposed to GTC orders, these orders remain in the order book until the order is filled, cancelled or a specified expiration date is reached (maximum active time is 90 days). GTD orders do not allow for the specification of an expiration time (only date).
10. Good till time (GTT) -remain in the order book until the specified expiry time is reached in the trading day. GTT orders that have an expiry time during an auction will not expire until uncrossing is finished.
11. Closing price cross (CPX) -these orders are parked until the start of the closing price cross session. Unexecuted CPX orders are expired at the end of the closing price cross session. Stop and stop limit orders are not allowed to have a TIF of type CPX.
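The difference between the IOC and FOK options in the list above can be made concrete with a small Python sketch. The function `fill_quantity` is hypothetical, and `available_qty` is assumed to be the contra-side quantity that the incoming order could execute against at acceptable prices.

```python
def fill_quantity(tif, order_qty, available_qty):
    """Return (filled, cancelled_or_expired) for an incoming IOC or FOK order."""
    if tif == "IOC":
        # Fill whatever is available; the remainder is cancelled.
        filled = min(order_qty, available_qty)
    elif tif == "FOK":
        # All or nothing: partial fills are not allowed.
        filled = order_qty if available_qty >= order_qty else 0
    else:
        raise ValueError(f"unsupported TIF for this sketch: {tif}")
    return filled, order_qty - filled
```

With 60 shares available, an IOC order for 100 shares fills 60 and cancels 40, while a FOK order for 100 shares fills nothing and expires in full.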
Valid combinations of order types and TIF are shown in Table 1 below.

Lastly, CoinTossX also allows for multiple trading session types which correspond to those of the JSE. The trading sessions below and the rules adopted by each are configurable in the data/tradingSessionsCron.properties file. The times shown are those of the JSE, provided for context [16]. All trading session start and end times can be specified by the user through the cron expressions in the data directory. During the auction call sessions orders are not matched immediately; rather, they are matched at the end of the call auction at a single price using a volume-maximizing price-discovery process. This process finds the price that will match the largest number of buy and sell orders.
1. Start of trading (07:00 - 08:30) - no orders can be submitted or executed during this session. Traders will be able to cancel, but not submit, orders during this session.

2. Opening auction call session (08:30 - 09:00)
3. Continuous trading session (09:00 - 16:50) - the system will continuously match incoming orders against those in the order book according to the price-time priority execution rule.
4. Volatility auction call session (triggered) - this session will only trigger when a stock's circuit breaker tolerance level has been breached. Volatility auction call sessions last for a scheduled period of 5 minutes. The orders accumulated during this session will be executed at the uncrossing based on the volume maximizing algorithm.
- no orders can be submitted or executed during this session. Traders will be able to cancel, but not submit, orders during this session.
8. Closing price cross session (17:05 - 17:10) - trading will only take place at the closing price that was published during the closing price publication session.
9. Post close session (17:05 - 18:15) - traders will be able to cancel, but not submit, orders during this session.
10. Halt (manually invoked) - a halt session may be activated by the user during which time no order requests may be executed. Traders will however be able to cancel orders.
11. Halt and close (manually invoked) - the behaviour is the same as the halt session except closing price calculations will be performed.
12. Pause (manually invoked) - no executions will take place during the pause session. Traders will be able to submit, amend or cancel orders during this session; however, market orders will expire at the end of the session.
13. Re-opening auction call (manually invoked) - the user may manually invoke the re-opening auction call session when resuming from a manual trading halt or a trading pause.
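The volume-maximizing price discovery used at the end of the auction call sessions can be illustrated with a minimal Python sketch. It considers limit orders only, and it breaks ties by taking the lowest price; that tie-break is an assumption made for this sketch, not a rule taken from the text, and the function name `uncrossing_price` is hypothetical.

```python
def uncrossing_price(bids, asks):
    """Find the single price that maximises matched volume.

    bids and asks are lists of (price, quantity) limit orders.
    Returns (price, volume), or None if the book does not cross.
    """
    candidates = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best = None
    for price in candidates:
        demand = sum(q for p, q in bids if p >= price)  # buyers willing to pay this price
        supply = sum(q for p, q in asks if p <= price)  # sellers willing to accept it
        volume = min(demand, supply)
        # Strict comparison keeps the lowest price among ties (assumed tie-break).
        if volume > 0 and (best is None or volume > best[1]):
            best = (price, volume)
    return best


bids = [(102, 300), (101, 200), (100, 500)]
asks = [(99, 200), (100, 300), (101, 400)]
result = uncrossing_price(bids, asks)  # (100, 500)
```

In this example both 100 and 101 would match 500 shares, so the tie-break decides the outcome; a production engine would typically apply further criteria at this point.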

Testing framework
CoinTossX has been successfully deployed locally as well as to remote servers such as Microsoft Azure, CHPC and servers provided by the TW Kambule Mathematical Sciences Laboratories (using 4 re-purposed legacy TACC Ranger blades, see Table 4). Deployment to high-performance compute solutions such as UCT HPC was found to be infeasible for a number of reasons. First, facilities such as UCT HPC often perform computations by relying on an MPI approach using one of a variety of job management systems, where the submission of "jobs" to "worker nodes" is highly constrained by the preferences of the systems administrators managing many different use cases. The worker nodes are not equivalent to virtual machines; rather, they receive tasks from the "head node" to be executed in parallel. Simplistically this means that, for example, web interfaces will not be easily accessible from worker nodes due to the system and network architecture. More generally, this impacts any client-worker interactions when agents interact via clients through some centralised architecture (here the matching engine, but this can be any interaction landscape). In general, such systems are poorly suited for real-time and reactive use cases. This is because, by design, many high-performance computing facilities, such as UCT HPC or CHPC, are not high-throughput facilities. They are typically not well suited for large-scale, high-concurrency, low-latency market and agent-based simulation problems that necessarily need to be highly reactive 23 in nature. We believe that this is an important design perspective that is often not well considered when designing advanced market (or social science) simulators.

Unit testing
The functionality of the software was tested using the test cases made available online by the JSE. The requirements were taken from [16]. Testing of the trading sessions was restricted to the continuous and intraday auction trading sessions using only the DAY TIF. Unit tests were implemented to cover the functional requirements of the software, while individual throughput and latency performance tests were implemented in conjunction with the Java Microbenchmark Harness 24 (JMH) to test methods whose performance was critical. These tests, covering the majority of the application, provide a safety net to allow changes to be made to the code without breaking existing functionality.
The outputs of these tests were not compared to the JSE or another exchange as the data is not made available by the industry. The JSE's test environment is also closed and does not provide realistic order-book dynamics. This paper focuses mainly on the results of latency and throughput tests; however, the types and results of the extensive list of tests performed can be found in Sing and Gebbie [6].

23 Reactive is used here in the broad sense of system architectures that require the fast propagation of data changes and relationships within a system with many clients, but require high coherence with low coupling.
24 JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.

Order-flow testing
Matching engine integrity was evaluated using unit tests and, instead of an agent-based approach, simulations and tests aimed at understanding throughput and latency were carried out with an 8-variate marked Hawkes process [20,21]. This provides a flexible framework to simulate a market data feed with varying throughput, with full control over the trade and quote conditional intensities. The software and Hawkes client processes were deployed on one server, which affected the performance of all components. The mutually exciting processes correspond to 8 different order types that are considered in the testing framework. The testing framework only considers basic aggressive and passive market and limit orders as in Large [22]:

Ask at or above best ask

Table 3: Order event types used for the 8-variate Hawkes process in the testing framework simulation following the approach of Large [22].

The simulation is conducted using the intensity-based thinning algorithm as introduced by Lewis and Shedler [23] and modified by Ogata [24], which is used to define the times at which orders arrive. Each thinning algorithm submitting large numbers of orders may be thought of as a client. So, in the case of the testing framework, a single client is associated with each stock and is meant to represent a large group of traders investing in that stock. The prices and volumes are generated based on the order type in a fairly random manner.
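A minimal Python sketch of Ogata-style thinning for a multivariate Hawkes process is given below. It makes several assumptions that are not taken from the text: the kernels are exponential with a single decay rate `beta`, marks are omitted, the parameter values are illustrative, and a 2-variate process is used in place of the 8-variate one to keep the example short. The function name `simulate_hawkes` is hypothetical.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times and types on (0, horizon] by thinning.

    mu[i] is the baseline intensity of type i; alpha[i][j] is the jump in
    the intensity of type i caused by an event of type j; beta is the
    exponential decay rate (assumed common to all kernels).
    """
    rng = random.Random(seed)
    dim = len(mu)
    events = []  # list of (time, type)

    def intensity(i, t):
        excitation = sum(
            alpha[i][j] * math.exp(-beta * (t - s)) for s, j in events if s <= t
        )
        return mu[i] + excitation

    t = 0.0
    while True:
        # With exponential kernels the intensities only decay until the next
        # event, so the total intensity just after time t is an upper bound.
        lam_bar = sum(intensity(i, t) for i in range(dim))
        t += rng.expovariate(lam_bar)  # candidate arrival time
        if t > horizon:
            break
        lams = [intensity(i, t) for i in range(dim)]
        u = rng.random() * lam_bar
        cumulative = 0.0
        for i, lam in enumerate(lams):
            cumulative += lam
            if u < cumulative:  # accept and attribute the event to type i
                events.append((t, i))
                break
        # If no type was selected the candidate is rejected (thinned).
    return events


events = simulate_hawkes(
    mu=[0.5, 0.5], alpha=[[0.3, 0.1], [0.1, 0.3]], beta=1.0, horizon=50.0
)
```

Each accepted `(time, type)` pair would correspond to one order arrival of the given event type; the intensity evaluation here is quadratic in the number of events, which is acceptable for illustration but not for long simulations.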
First, a maximum LOB depth $M = 10$ is specified. A lower limit for the price at which buy limit orders are generated is set to $L_b = 25000$. Similarly, an upper limit for the price at which sell limit orders are generated is set to $H_s = 25057$. The LOB is initialised with an initial limit buy order at $I_b = 25034$ and an initial limit sell order at $I_s = 25057$. Thereafter prices and volumes are generated according to a random normal distribution within these bounds. As a result, bids and asks can cross each other, which is not realistic. Nonetheless, its purpose is merely to demonstrate the application's ability. The VWAP is used to calculate the price for an aggressive buy/sell trade (not the same as the execution price) that executes against the LOB. That is, if an aggressive trade affects the first $k$ highest/lowest levels of the order book, then the price is calculated as the volume-weighted average of the prices over those $k$ levels.

For the next section, test scenarios were created to test the performance of the software by evaluating the impact of multiple client-stock pairs. That is, each stock is assigned a unique client who sends high volumes of orders. By increasing the number of stocks and clients after each subsequent run, the volume of messages being processed increases as well, during which time the throughput per second and latency are automatically recorded. These performance tests have been conducted on multiple computers, all with different operating systems and hardware specifications. Here the performance results are documented by running the application on the two machines listed in Table 4. The average start-up time of the web application is approximately 50 seconds.

Although not presented here, limit order book storage testing was also conducted on the WITS Mathematical Science Support provided server hardware and demonstrated the ability to store thousands of orders at each price point. The design of the LOB also allows orders to be added and removed easily.
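The VWAP pricing of an aggressive trade described above can be sketched in Python. Writing $p_i$ and $v_i$ for the price and traded volume at level $i$ (symbols introduced here for illustration), the price is $\sum_{i=1}^{k} p_i v_i / \sum_{i=1}^{k} v_i$. The sketch assumes that $v_i$ is the volume actually executed at each level; the function names and the example book are hypothetical.

```python
def walk_book(book_side, trade_volume):
    """Consume volume level by level, best price first.

    Returns the (price, volume) actually taken at each affected level.
    """
    taken = []
    remaining = trade_volume
    for price, volume in book_side:
        if remaining <= 0:
            break
        fill = min(volume, remaining)
        taken.append((price, fill))
        remaining -= fill
    return taken


def vwap(levels):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(v for _, v in levels)
    if total_volume == 0:
        raise ValueError("no volume traded")
    return sum(p * v for p, v in levels) / total_volume


asks = [(25040, 100), (25041, 200), (25045, 500)]  # illustrative ask side
taken = walk_book(asks, 250)                        # aggressive buy of 250
price = vwap(taken)                                 # about 25040.6
```

Here the trade takes 100 shares at 25040 and 150 at 25041, so it affects the first $k = 2$ levels.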

Latency tests
The latency was tested and visualized using the HdrHistogram library 25 . With every run, each client submits approximately 110 000 orders. The time it takes to process all these orders is then measured and compared between machines. To reproduce Figure 4 simply re-run the Hawkes simulations for the system to write latency and throughput results to file. Figure 4 shows the latency results, in nanoseconds, for runs with differing numbers of active client-stock pairs on each machine. The latency increases as the number of client-stock pairs increases, since each client is associated with multiple threads running processes in parallel. Due to there being only 4 CPUs on the Azure VM, running more than 6 clients simultaneously was found to be infeasible. For this reason the user should ensure that the hardware on the server is capable of supporting the desired number of clients. For the high spec Wits Server machine the minimum and maximum latency (measured up to 10 clients) at the 90th percentile is 106 ns and 248 ns, respectively. Similarly, for the medium spec machine (with a maximum of 6 clients) the minimum and maximum latency at the 90th percentile is 123 ns and 393 ns, respectively. Therefore, on a high end machine one can expect sub-250 ns latency (with 10 clients), while with a medium spec machine one can expect sub-400 ns latency (with 6 clients) 26 .

Figure 5 considers the scenario where high volumes of orders are submitted to the trading gateway. One million orders are submitted from a single client and compared across the two machines. For the Wits server, the latency is 735 ns at the 90th percentile, although its average latency is significantly lower. On the other hand, the Azure server has higher latencies on average, with a latency of 964 ns at the 90th percentile.

Table 5 shows the throughput per second as the number of client-stock pairs increases on each machine.
To reproduce the data in Table 5 simply re-run the Hawkes simulations for the system to write latency and throughput results to file. It should be noted that each client waits for market data updates before calculating the next order to send. The client also waits a few nanoseconds as part of the Hawkes simulation. These delays reduce the throughput of orders sent. The results show that as the number of client-stock pairs increases, the throughput decreases. The most significant decrease in throughput occurs when more than 4 clients and stocks are used. Beyond 6 clients the hardware configuration of the Azure VM was found to be insufficient. In total, the high spec machine was shown to be capable of processing more than one million orders in less than a 19 minute simulation period with low latencies. On the other hand, the medium spec machine was capable of processing 450000 and 560000 orders in simulation periods of approximately 44 minutes and 2 hours for 4 and 6 client-stock pairs, respectively.
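The totals quoted above can be converted into rough average order rates with simple arithmetic. The Python sketch below only divides the stated order counts by the stated simulation periods; the results are aggregate averages over a whole run, not the per-second throughput figures of Table 5, and the figure for the high spec machine is a lower bound because the text gives "more than" one million orders in "less than" 19 minutes.

```python
def orders_per_second(orders, minutes):
    """Average number of orders processed per second over a run."""
    return orders / (minutes * 60)


runs = {
    "high spec (lower bound)": (1_000_000, 19),
    "medium spec, 4 client-stock pairs": (450_000, 44),
    "medium spec, 6 client-stock pairs": (560_000, 120),
}
rates = {label: orders_per_second(*run) for label, run in runs.items()}
# Roughly 877, 170 and 78 orders per second, respectively.
```

These averages include the time clients spend waiting for market data updates, so they understate what the matching engine itself can sustain.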

Simulation results
The results in this section relate to the simulation for a single stock using a simple Hawkes process as a proof of concept. Simulation results are contained in three separate .csv files for each security: one for market orders only, another for limit orders and the last for a snapshot of the limit order book at the end of the simulation. This data is only printed to file at the end of the simulation once all logged-in clients have logged out. The format of these files is shown in snippets 6 and 7 below. The submission times are given in UTC. TradId's correspond to the OrderId's of the limit orders against which they were executed.

microseconds and its matching engine has a latency of 50 microseconds [25].

Conclusion
CoinTossX is a low-latency, high-throughput stock exchange. It is configurable and allows users to view the limit order book in real time as well as for multiple clients to connect and send orders to the exchange. The exchange supports multiple stocks and a variety of different trading sessions. Here Hawkes processes are used to provide a simple but robust simulation of order arrivals for infrastructure testing. The CoinTossX website can be enhanced to analyze the data that is processed. The exchange provides a realistic platform for the exploration of agent-based models. The software was designed so that different matching logic algorithms are in separate Java classes. This allows the logic to be changed to test any variations of the matching logic or market rules. Hence, additional work can be done to change the matching rules on the engine to test the impact of rule and regulation changes on the limit order book and its dynamics. Since the components are de-coupled, their implementation language can be continually and easily changed to support the latest frameworks [5]. Future work will be aimed at better understanding the interaction dynamics from increasingly realistic approaches to point-process based simulations, e.g. with trading clients based on Hawkes processes using models such as those of Bacry and Muzy [26] and Zheng et al. [27], to investigate the interactions of limit-order and market-order trading agents through a realistic

Acknowledgement
We

Below are the instructions for deploying CoinTossX to a user's local machine or to a virtual machine in a cloud environment. These instructions apply to Windows, Linux and OSX operating systems. CoinTossX is a Java web application and is built using the Gradle build tool. The user need not install Gradle or change the version of Gradle installed on their system, as the project uses the Gradle wrapper to download and run the required Gradle version a . Currently CoinTossX is compatible with Java version 8 and Gradle version 6.7.1. The application can be started from the command line; however, it is recommended that the user make use of a Java IDE such as Eclipse or IntelliJ IDEA to automate the start-up process and simplify deployment.

Prerequisites
1. JDK version 8 (see Installation section for more detail).
2. A computer with 4 or more CPU cores and a sufficient amount of RAM (ideally 4 cores and 32 GB, but can be less depending on the number of clients and stocks used).
3. A Java IDE such as Eclipse or IntelliJ IDEA.
4. The user may be required to run commands using Command Prompt (Windows) or Bash (Linux and OSX).

Additional resources
• Tutorial for building a Java project with Gradle
• Tutorial for building a simple Java web application
• Introduction to Spring Boot
• Introduction to Apache Wicket
• Tutorial for setting up a student Azure account
• Instructions for the installation of Oracle JDK 8 and configuration of the system environment variables on Linux
• Calling Java in Julia, Python and R

Installation
If the correct version of Java is already installed and configured correctly, the user can skip step 1 below.
1. CoinTossX is currently only compatible with version 8 of Java. Therefore, the user should install version 8 of either the Oracle or Open JDK (Java Development Kit) b . For simplicity, it is assumed that Windows and OSX users will install the Oracle JDK while Linux users will install the Open JDK.
• Windows - After installing Oracle JDK 8 using the link above, if not done so automatically by the Java install wizard, ensure that JAVA_HOME is set and that the java executable is on the path environment variable. To set/add Java to the system's JAVA_HOME and path environment variables, go to Settings > Advanced System Settings > Environment Variables > System Variables. Then add a new variable called JAVA_HOME which points to the relevant Java distribution (e.g. C:\Program Files\Java\jdk1.8.0_271). Thereafter append a new entry to the path environment variable by adding %JAVA_HOME%\bin.
• Linux - Installation of Open JDK 8 is done by executing the commands below:
    sudo apt-get update
    sudo apt-get install openjdk-8-jdk
After this, if the correct version of Java is still not being used, the user can switch to the correct version using:

The next set of instructions provides guidelines for the deployment of CoinTossX. By this point the user will have cloned the repository to a location of their choosing for either local or remote deployment. Depending on the operating system, different deployment configurations need to be employed, hence the multiple configuration and deployment files. The files with the ".properties" extension define the required ports for each component as well as user-specific path definitions. To match the location to which the repository was cloned, the user has to configure the local.properties file (for Linux) or the windows.properties file (for Windows) to correspond with the user's directories. For remote deployment the user has to configure the remote.properties file with the corresponding paths on the remote server. The path variables which need to be configured are:
• MEDIA_DRIVER_DIR pointing to the Aeron Media Driver. This folder will only be created after the application is started. For Linux users it is recommended that the default path (/dev/shm) is not deviated from in order to achieve optimal performance. Otherwise the path can be amended as follows:
Note that if the user deviates from the default media driver path, they have to make the same change in the build.gradle file.
• SOFTWARE_PATH pointing to the start-up folder that will be created upon deployment. This path can be set to any valid path on the user's machine and may be given any name.
• DATA_PATH pointing to the data folder within the software path. For example:

Before the application can be started, a few system settings need to be changed to ensure that network performance and system memory are utilised correctly. Firstly, the receive and send UDP buffer sizes/limits need to be configured as follows.

Last is the running of the application. Users deploying the application to Microsoft Azure, CHPC, Wits Server or any other server may choose to do so locally or remotely. Remote deployment requires that the user specify the above paths to correspond with those of the remote server. The instructions below demonstrate both local and remote deployment. For users deploying remotely, one must first ensure that SSH is enabled on the server and that port 22 is open for the transfer of files. Additionally, the username, IP address and password fields in the deploy_remote.gradle file should match those of the server.

Usage
To configure the trading sessions to be fired during the simulation refer to the tradingSessionsCron.properties file found in the data directory. The usage/syntax of the cron expressions within that file is as follows. A cron expression is a string consisting of six or seven fields, separated by white space, that describe individual details of the schedule:
A few examples, provided in tradingSessionsCron.properties, are shown on the right.

Each component of the system as well as other actions can be started independently via the shell scripts. The list of all the runnable shell scripts can be found in the deploy/scripts directory. Each shell script simply executes the Java byte code of a "main" method whose class is specified in the build.gradle file of the corresponding module. For example, the startAll.sh script starts each component consecutively (equivalent to clicking the "Start" button on the Hawkes simulation web page). Similarly, stopAll.sh stops all the components (equivalent to clicking the "Shut Down" button on the Hawkes simulation web page). Furthermore, any other actions on the website can also be done from the command line. To start the Hawkes simulation for a single client and stock simply run ./startHawkesSimulation.sh 1 1. This starts the simulation for client 1 and stock 1 by submitting the client ID and stock ID as the first and second argument, respectively.
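The six-or-seven-field structure of a cron expression mentioned above can be illustrated with a short Python sketch. It assumes the Quartz-style field order (seconds, minutes, hours, day of month, month, day of week, optional year); whether CoinTossX uses exactly this order should be checked against tradingSessionsCron.properties. The example expression is hypothetical and is not taken from that file, and `split_cron` is an illustrative name.

```python
FIELD_NAMES = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",  # year is optional
]


def split_cron(expression):
    """Split a cron expression into named fields (assumed Quartz-style order)."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        raise ValueError("a cron expression must have six or seven fields")
    return dict(zip(FIELD_NAMES, fields))


# Hypothetical schedule: 08:30:00 on weekdays, e.g. the start of an opening auction.
opening = split_cron("0 30 8 ? * MON-FRI")
```

The sketch only names the fields; it does not validate their contents or compute firing times.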
The list of clients that can be activated can be found in the data/ClientData.csv file. Consider, for example, the first client shown below.
From the website the user can: view/add/edit/delete clients, view the simulation status of clients and stocks, start the Hawkes simulation, edit the Hawkes simulation parameters, view/extract Hawkes simulation data as well as view a snapshot of the limit order books of each stock. With regards to the output of results, after all clients have submitted an end of trading session message, the orders along with their submission times are written to file and stored in the deploy/data directory. At the end of each Hawkes simulation the HdrHistogram latency results are written to a text file in the same directory.