sDNA: 3-d spatial network analysis for GIS, CAD, Command Line & Python


Abstract
Spatial Design Network Analysis (sDNA) is a toolbox for 3-d spatial network analysis, especially street/path/urban network analysis, motivated by a need to use network links as the principal unit of analysis in order to analyse existing network data. sDNA is usable from QGIS & ArcGIS geographic information systems, AutoCAD, the command line, and via its own Python API. It computes measures of accessibility (reach, mean distance/closeness centrality, gravity), flows (bidirectional betweenness centrality) and efficiency (circuity) as well as convex hull properties, localised within lower- and upper-bounded radial bands. Weighting is flexible and can make use of geometric properties, data attached to links, zones, matrices or combinations of the above. Motivated by a desire to base network analysis on route choice and spatial cognition, the definition of distance can be network-Euclidean, angular, a mixture of both, custom, or specific to cyclists (avoiding slope and motorised traffic). In addition to statistics on network links, the following outputs can be computed: geodesics, network buffers, accessibility maps, convex hulls, flow bundles and skim matrices. Further tools assist with network preparation and calibration of network models to observed data.
To date, sDNA has been used mainly for urban network analysis both by academics and city planners/engineers, for tasks including prediction of pedestrian, cyclist, vehicle and metro flows and mode choice; also quantification of the built environment for epidemiology and urban planning & design.

Motivation and significance
Spatial Network Analysis is the special case of network analysis in which nodes have positions in physical space and, optionally, links between them have geometry. Within the literature on transportation and urban design, the term is used interchangeably with street/path/urban network analysis; in this paper we follow that tradition but do not exclude other applications. Current spatial network analysis makes extensive use of closeness, betweenness and reach measures taken from network analysis [6][7][8][9][10][11][12]. Despite widespread use, however, limited consideration has been given to how spatial network analysis varies from the parent discipline of network analysis. Aspatial measures are 'spatialised' in four ways. Firstly, by using geometric properties of links to compute the network distances, howsoever defined, upon which these measures depend. Secondly, by restricting computation of measures to a local spatial neighbourhood surrounding each point (see [13] for further discussion of this technique). Thirdly, additional network measures can explicitly incorporate space, for example, straightness/circuity [14,15]. Finally, it is common to focus on the link rather than node as primary unit of analysis, as links (being 1- rather than 0-dimensional) in some sense occupy more space: this is physically true in the case of many real world networks e.g. road systems. This gives rise to a dual representation [16] in which links (or parts thereof) are encoded as nodes, and nodes as links between them.
https://doi.org/10.1016/j.softx.2020.100525
2352-7110/© 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Broadly speaking, the measures produced in spatial network analysis quantify either accessibility, flow or efficiency; many of these are also considered to be measures of centrality. Such measures can be used to predict multiple phenomena including but not limited to transport behaviours, land use change and health (see Section 4). The discipline suffers from a lack of common terminology to describe the same underlying concepts, so we define terms here:
• In the urban planning domain, a large volume of spatial network analysis is labelled as space syntax following [17]. The two terms are not quite congruent as the latter tradition focuses on analysis of isovists, axial and convex space; however these objects are often processed as spatial networks, hence the overlap of interests.
• In modern spatial network analysis, local spatial neighbourhoods are typically used to restrict the scope of analysis around each link. These can variously be interpreted as catchment areas, maximum trip lengths or network buffers; in sDNA we adopt the term radius to refer both to the distance defining the size of the neighbourhood, and the neighbourhood itself.
• The quantity of network within a given radius is an important measure of accessibility referred to as reach, or sometimes density (whether or not divided by radius length or area).
• Network distance is often referred to as weight, impedance, cost, or in the space syntax literature, depth. We prefer to reserve 'weight' for importance of origins and destinations, avoid 'impedance' and 'depth' as the physical analogies are not accurate, and avoid 'cost' except for financial costs e.g. of fuel or public transport. Instead, we describe distance between two points, noting that distance can be defined in many ways depending on the choice of metric. We distinguish between an analytical metric, which defines the measures computed in analysis, and a radial metric, which defines the locality of analysis.
• Common metrics include crow-flight-Euclidean (straight line distance between two points), network-Euclidean (Euclidean distance measured along the network), and angular (cumulative absolute directional change along the route). The literature is sometimes ambiguous on whether Euclidean refers to network- or crow-flight-Euclidean; unfortunately we have been guilty of this ourselves, so to set the record straight: where sDNA literature refers to Euclidean metrics, this means network-Euclidean unless otherwise stated. Space syntax literature also uses the term 'metric' to refer exclusively to the network-Euclidean metric, so it bears emphasizing that sDNA differs in this respect, using 'metric' in the mathematical sense of any definition of distance.
• Closeness measures accessibility, usually the inverse mean distance of a given link or node on the network to all other links/nodes within a localised radius. Space syntax literature also uses the term integration, although this is not unambiguous, as integration and closeness have each been defined both as inverse sum of distances and as inverse mean. Although the original formulation of closeness [18] used the sum formulation, sum of distance does not allow valid accessibility comparisons between networks of varying size, or localised analysis on a single network with varying density, hence the use of inverse mean in more recent work. Inverse sums are still used in some definitions of integration but it is likely an error to interpret them as a measure of accessibility, e.g. [19].
• Gravity models combine reach and mean distance into a single measure of accessibility. If reach measures quantity of access, and mean distance measures quality of access, gravity models measure both (with at least one parameter required to specify the relative importance of each).
• Betweenness is the sum total of shortest paths from everywhere to everywhere (possibly subject to a weighting and/or maximum trip length) which traverse a given link or node. In space syntax literature this is known as choice.
• Most analysis relies on shortest paths through the network, also known as geodesics, which vary with the choice of metric.
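To make the terms reach, mean distance and radius concrete, the following is a minimal sketch on a toy node-based network. It is not sDNA's implementation: sDNA works on links rather than nodes, and the function names and the toy data here are hypothetical, chosen only for illustration. A radius-limited Dijkstra search finds everything within the radius of an origin; reach is then the count of what was found, and mean distance the average network distance to it.

```python
import heapq

def distances_within_radius(graph, origin, radius):
    """Dijkstra search from origin, keeping only nodes at distance <= radius."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reach_and_mean_distance(graph, origin, radius):
    """Reach = number of other nodes within radius; mean distance = their average distance."""
    dist = distances_within_radius(graph, origin, radius)
    others = [d for node, d in dist.items() if node != origin]
    reach = len(others)
    mean_distance = sum(others) / reach if reach else float("nan")
    return reach, mean_distance

# Hypothetical toy network: a chain a - b - c - d with lengths 100, 200 and 400
toy = {
    "a": [("b", 100.0)],
    "b": [("a", 100.0), ("c", 200.0)],
    "c": [("b", 200.0), ("d", 400.0)],
    "d": [("c", 400.0)],
}

print(reach_and_mean_distance(toy, "a", 300.0))
```

Closeness in the inverse-mean sense is then simply the reciprocal of the mean distance, which is why it remains comparable across radii that capture different amounts of network.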
Comparable open source tools for spatial network analysis include:
• MIT Urban Network Analysis (UNA) [20] plugin for ArcGIS and RhinoCAD. The ArcGIS version of UNA requires the proprietary ESRI Network Analyst, in addition to the base ESRI ArcGIS software [21], in order to run. UNA does not actually compute statistics for network links, but rather for a point or polygon buildings layer with a 2-d network used to deduce connections between these. It computes reach, gravity, betweenness, closeness, straightness, redundancy index and paths, and the wayfinding index; all of which can be based on a crow-flight-Euclidean as well as network-Euclidean radius.
• DepthmapX [22], a stand-alone space syntax tool, originally intended for analysis of axial lines in urban spaces and since updated to road centre lines by dividing each link into one or more straight line segments. It computes 2-d depth, integration and choice within a network-Euclidean or topological radius. To overcome the problem that choice of segments to use to represent a link is arbitrary, road network analysis including DepthmapX has often used link segments weighted by length [23]; however, given evidence that link density is also important in urban networks [24], there is a case to be made for software which treats links as primary both in weighting and analysis [25].
• The Place Syntax Tool (PST) [26] computes reach, closeness and betweenness using network-Euclidean, angular or other metrics, within a radius defined in the same way or alternatively using a crow-flight-Euclidean radius, for either straight link segments or optionally a point layer representing origins/destinations.
Both the network theory and software of sDNA differ from competing approaches by treating network links (or user chosen subdivisions thereof), rather than straight link segments or buildings, as the primary spatial unit of analysis. We define a network node as any point where it is possible to travel in 3 or more directions, and a link as the connection between two adjacent nodes, or between a single node and a dead end (Fig. 1). In contrast to other approaches, a curved link which would be treated as a large number of individual straight line segments in DepthmapX or PST is handled as a single unit in sDNA. Reducing the number of units in this manner can lead to large increases in speed on road networks, as the time complexity of betweenness computation is proportional to the square of network density.
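The node and link definitions can be illustrated with a short sketch that classifies polyline endpoints by how many polylines meet there. This is an illustration only, not sDNA code: the function names are hypothetical, polylines are reduced to (start, end) endpoint pairs, and the link count assumes the network contains no closed loop made up solely of 2-way points.

```python
from collections import Counter

def classify_endpoints(polylines):
    """Classify endpoints by the number of polyline ends meeting at each point."""
    degree = Counter()
    for start, end in polylines:
        degree[start] += 1
        degree[end] += 1
    nodes = {p for p, k in degree.items() if k >= 3}      # travel possible in 3+ directions
    dead_ends = {p for p, k in degree.items() if k == 1}  # a link may terminate here
    mid_link = {p for p, k in degree.items() if k == 2}   # polyline break inside a single link
    return nodes, dead_ends, mid_link

def count_links(polylines):
    """Each mid-link break joins two polylines into one link (assumes no isolated loops)."""
    _, _, mid_link = classify_endpoints(polylines)
    return len(polylines) - len(mid_link)

# Hypothetical T-junction at J; the arm towards C is drawn as two polylines meeting at M
polylines = [("J", "A"), ("J", "B"), ("J", "M"), ("M", "C")]
nodes, dead_ends, mid_link = classify_endpoints(polylines)
print(nodes, dead_ends, mid_link, count_links(polylines))
```

Here four polylines form three links, because the two polylines meeting at M are parts of the same link.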
An additional motivation driving development of sDNA is the desire to base network analysis on route choice and spatial cognition grounded in human behaviour [29,30]. This is reflected in a wide variety of distance metrics available to use in sDNA (with even more available in its proprietary relative sDNA+ [31]).

Software description
sDNA is an ArcGIS/QGIS/AutoCAD plugin and Python/command line tool for 2-d and 3-d network analysis. sDNA computes various measures of reach, including weight, junction and link count, and length; mean distance (the inverse of closeness), bidirectional betweenness (weighted either by product or a variant of the Huff [32] model without distance decay, as the latter can be handled using multiple radii), circuity [14,15] and geometric properties of the convex hull of the radius. Crucially, these can all be localised within lower- and upper-bound network-Euclidean radii and weighted by user defined expressions based on link geometry, zoning systems, origin-destination matrices or network attached data (the latter giving a means to import building data via GIS join). Once a localised network around each link has been defined, different distance metrics are available for its analysis: angular (least directional change), topological (fewest junctions), custom, cyclist-specific (based on slope and aversion to motorised traffic) and an angular-Euclidean mixture. (A greater variety of distance metrics and radius types, including 'hybrid' metrics based on user defined expressions incorporating angular change, height change, distance and custom data, and Monte Carlo randomization to handle individual preferences and analysis of regular grids, are available in the related proprietary software sDNA+ [31].) Networks can be read in 3-d, and turn angles are measured in 3-d so include turns 'uphill' and 'downhill' (Fig. 2); this is necessary in 3-d analysis, otherwise it is possible for a geodesic to 'cheat' angular analysis by turning in the 3rd dimension without accumulating angular distance. Where analytical geodesics exceed the size of the radius, sDNA can detect and correct 'problem routes' [13]. sDNA also includes tools for network preparation, provides a user-friendly means to use R for statistical inference and prediction from network data, and can output shapes of geodesics, accessibility maps, network buffers, convex hulls and flow bundles in addition to networks and skim matrices.
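The point about measuring turn angles in 3-d can be shown with a small sketch that sums the absolute angle between successive segments of a polyline. The function name is hypothetical and this is not sDNA's code; it only illustrates why a turn 'uphill' accumulates angular distance in 3-d but would be invisible if the same polyline were measured in plan view.

```python
import math

def cumulative_angular_change(points):
    """Sum of absolute turn angles, in degrees, between successive segments."""
    total = 0.0
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        u = [b - a for a, b in zip(p0, p1)]
        v = [b - a for a, b in zip(p1, p2)]
        nu = math.sqrt(sum(c * c for c in u))
        nv = math.sqrt(sum(c * c for c in v))
        if nu == 0.0 or nv == 0.0:
            continue  # skip any triple that contains a zero-length segment
        cos_angle = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
        total += math.degrees(math.acos(cos_angle))
    return total

flat_turn = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]    # right-angle turn on level ground
uphill_turn = [(0, 0, 0), (1, 0, 0), (2, 0, 1)]  # straight in plan, then climbing at 45 degrees
plan_view = [(x, y) for x, y, _ in uphill_turn]  # the same route with height discarded

print(cumulative_angular_change(flat_turn))
print(cumulative_angular_change(uphill_turn))
print(cumulative_angular_change(plan_view))
```

The plan-view version of the uphill route reports no angular change at all, which is the 'cheat' that 3-d angle measurement closes off.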
In practice, each link in a network is represented by one or more polyline objects in GIS or CAD (Fig. 1); our guiding principle is to compute statistics for the midpoint of each polyline and take them to be representative of the polyline (with midpoint defined by Euclidean or angular metric as appropriate). Where polylines form only a partial link, two approaches are available to the user: (1) to combine these into a single polyline using sDNA's Prepare tool; (2) to split the polyline into multiple parts as desired and compute statistics for each part separately, giving increased spatial accuracy albeit with more computational cost. If, in the latter case, the user is still interested in weighting the analysis by link, the 'Link' weighting type will automatically apply partial weighting to partial links.
A further feature currently unique to sDNA is the extension of dual representation [16] by considering each direction on a link as a separate node, thereby allowing links with asymmetric distance metrics. At urban and wider scale this is useful for one-way streets; at smaller pedestrian scale it can be used for escalators (Fig. 3) [33].
The issue arises of how to handle polylines which fall only partly within the radius of analysis. sDNA offers two modes [25]:
• Discrete space, in which the entire polyline is included/excluded depending on whether its midpoint falls within the radius;
• Continuous space, in which a partial polyline is included up to the point where the radius cuts the polyline, with metric distance along the polyline reduced as appropriate (partial polyline geometry is computed), and weighting scaled down as a function of partial length. This requires slightly more compute time, but is particularly useful in handling cases where long polylines would, in discrete space mode, create significant irregularities in analysis of small radii.
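The difference between the two modes can be sketched for a single polyline as follows. This is a simplified illustration under stated assumptions, not sDNA's algorithm: the function name and parameters are hypothetical, the polyline is assumed to be entered from one end only, and distance is network-Euclidean.

```python
def included_length(dist_to_near_end, length, radius, continuous):
    """Length of a polyline counted as inside the radius.

    dist_to_near_end: network distance from the origin to the nearer end of the polyline.
    """
    if continuous:
        # Continuous space: include the part up to where the radius cuts the polyline
        remaining = radius - dist_to_near_end
        return max(0.0, min(length, remaining))
    # Discrete space: all or nothing, decided by the polyline's midpoint
    midpoint_dist = dist_to_near_end + length / 2.0
    return length if midpoint_dist <= radius else 0.0

# A 1000 m polyline whose near end is 100 m from the origin
for radius in (400.0, 700.0):
    print(radius,
          included_length(100.0, 1000.0, radius, continuous=False),
          included_length(100.0, 1000.0, radius, continuous=True))
```

In discrete mode the included length jumps from nothing to the whole polyline as the radius passes the midpoint, whereas in continuous mode it grows smoothly with the radius; this is the irregularity that long polylines introduce at small radii.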
A further consequence of polylines occupying more than a single point in space is that they will exhibit self-closeness and self-betweenness, properties which do not arise in aspatial networks [29]. The former arises because the average distance between all pairs of points on the polyline is nonzero; for a polyline of length $L$ the mean distance is in fact
$$\frac{1}{L^2}\int_0^L\!\int_0^L |x-y|\,dx\,dy = \frac{L}{3}.$$
Self-betweenness of a polyline is derived by considering the mean betweenness of all points $x$ on the line of length $L$, with origin weight $W_o$ and destination weight $W_d$ both distributed evenly over its length. Each point $x$ is passed in one direction only by geodesics from origins $o : 0 < o < x$ with total weight $\frac{x}{L} W_o$, to destinations $d : x < d < L$ with total weight $\frac{L-x}{L} W_d$. The self-betweenness contribution in both directions is therefore
$$2 \cdot \frac{1}{L}\int_0^L \frac{x}{L} W_o \, \frac{L-x}{L} W_d \, dx = \frac{W_o W_d}{3}.$$
Additionally, the polylines which form the endpoints of a geodesic will themselves experience some degree of betweenness: on average, considering geodesics from all points on the polyline to one of its ends, an arbitrary point on the line will be passed by half of them.
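Both self-closeness and self-betweenness can be checked numerically. The sketch below uses a midpoint rule to approximate the mean distance between two points placed uniformly on a line of length L, which is the standard result L/3, and the mean two-way betweenness of points on the line. The betweenness part assumes product weighting of origin and destination weights (W_o times W_d), which gives W_o W_d / 3; the function names are hypothetical.

```python
def mean_pair_distance(length, n=200):
    """Midpoint-rule estimate of the mean |x - y| for x, y uniform on [0, length]."""
    h = length / n
    xs = [(i + 0.5) * h for i in range(n)]
    total = sum(abs(x - y) for x in xs for y in xs)
    return total / (n * n)

def mean_self_betweenness(length, w_origin, w_dest, n=200):
    """Mean over points x of the two-way flow passing x, assuming product weighting."""
    h = length / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        origins_before = (x / length) * w_origin        # origin weight lying before x
        dests_after = ((length - x) / length) * w_dest  # destination weight lying after x
        total += 2.0 * origins_before * dests_after     # factor 2 covers both directions
    return total / n

L = 90.0
print(mean_pair_distance(L), L / 3)
print(mean_self_betweenness(L, 2.0, 3.0), 2.0 * 3.0 / 3)
```

Both estimates converge on the closed-form values as n grows, independent of the length chosen for the betweenness case.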
sDNA includes a network preparation tool which can detect and/or fix the following, while preserving data attached to the network: (1) near miss connections, where link endpoints are almost but not quite coincident. This is essential for interface with some GIS which will display differing points as coincident if they fall below tolerance settings; (2) polylines which form only a partial link; (3) disconnected portions of the network; (4) duplicate links; (5) traffic islands (intended to correct features in some road data which encode spurious angular change).
Following feedback from urban design/transport partners, sDNA was updated to incorporate the Learn and Predict statistical modelling tools allowing for creation of predictive models based on network characteristics. As accessibility statistics are often strongly cross-correlated, a regression method robust to multicollinearity is required; for this reason ridge regression is employed with n-fold cross-validation to tune the ridge penalty, and bootstrapping to increase stability of results. A range of weighting schemes allow minimization of absolute or relative error terms, or a mixture of both. Model fit is reported both using weighted r² and the GEH statistic popular in transport planning ([34], section 3.2.7).
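For readers unfamiliar with the GEH statistic, a minimal sketch follows. It uses the usual textbook form, the square root of 2(M − C)² / (M + C) for a modelled flow M and an observed count C, conventionally both hourly volumes. The function name is hypothetical and this is not the code of sDNA's Learn and Predict tools.

```python
import math

def geh(modelled, observed):
    """GEH statistic comparing a modelled flow with an observed count."""
    if modelled + observed == 0:
        return 0.0  # both flows zero: treated here as a perfect match
    return math.sqrt(2.0 * (modelled - observed) ** 2 / (modelled + observed))

# The same absolute error of 20 matters less on a busy link than on a quiet one
print(geh(120, 100))
print(geh(1020, 1000))
```

Because the error is scaled by the size of the flows, GEH sits between an absolute and a relative error measure, which is why it suits networks whose links carry very different volumes.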
Table 1 shows the primary analysis outputs computed by sDNA version 4. Table 2 shows additional outputs of analysis, and Table 3 the outputs of the network preparation, learn and predict tools.

Architecture
Architecture is shown in Fig. 4. The network backend uses parallelization to process multiple origins simultaneously. Typical usage is via the QGIS or ArcGIS toolbox; a more limited interface is provided for AutoCAD due to its more limited data handling capabilities. As most sDNA users are not themselves programmers, an instance of R-portable [35] is included in the distribution to make use of relevant R libraries, and this is called from the front end tools without users needing to program in R. The sDNA source includes a suite of automated system level tests which are run from the Debug configuration of the Visual Studio project.

Illustrative examples
Betweenness can be used to approximate a transport model (see [30] for discussion); in the case of motor vehicles, angular betweenness can proxy a model based on travel time [36,37] as straight routes through cities tend to have priority and therefore offer quicker travel times to drivers. Fig. 5 shows angular betweenness with 8 km radius (maximum geodesic length) for the road network of Cardiff, the capital city of Wales, with 479,000 inhabitants living in a 75 km² urban area. The network data is derived from Open Street Map (OSM) and contains approximately 23,000 links, displayed here using standard GIS tools. The following command was used for the network analysis:
    d:\example> sdnaintegral.py --im "net=network" --om "net=outputs/output_ang" "metric=ANGULAR;radii=2000,8000"
Fig. 6 shows an example of using 3-d properties of links in analysis by displaying cyclist roundtrip metric per unit length (ignoring motorised traffic). This metric is derived from calibration in [38] and can be used in any of sDNA's accessibility and flow computations. The following command was used to compute metrics for individual links only (setting t = 0 to ignore traffic, and afterwards computing HybridMetricForward/Length):
    d:\example> sdnaintegral.py --im "net=network" --om "net=outputs/output_crt" "metric=CYCLE_ROUNDTRIP;linkonly;t=0"
Fig. 7 shows the importance of correct use of one-way link data in vehicle transport models. Without it, dual carriageways are not correctly modelled (with the further implication for cycling models based on these vehicle flows, that one side of a dual carriageway is always empty and hence forms an attractive cycle route). The following command was used to produce the corrected version (using the sdnaoneway and wt fields on the input network, the latter to restrict flows to both directions between a single origin/destination pair):
    d:\example> sdnaintegral.py --im "net=oneway_prep" --om "net=outputs/output_oneway" "metric=ANGULAR;weight=wt;oneway=sdnaoneway"
Fig. 8 shows the difference between using continuous versus discrete space in a betweenness analysis weighted by length. Under discrete space mode, the long link generates high betweenness even though the radius of analysis is shorter than the link.
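The command lines quoted in this section share a common shape: an input map, an output map, and a semicolon-separated configuration string. For batch work it can be convenient to assemble them programmatically. The helper below is a hypothetical sketch, not part of sDNA's Python API; it uses only the options that appear in the commands quoted in this section and builds the argument list without running anything.

```python
def build_sdna_args(script, input_net, output_net, config):
    """Assemble an argument list in the shape of the commands quoted in the text.

    config maps option names to values; a value of None emits a bare flag
    such as 'linkonly'.
    """
    parts = []
    for key, value in config.items():
        parts.append(key if value is None else f"{key}={value}")
    return [
        script,
        "--im", f"net={input_net}",
        "--om", f"net={output_net}",
        ";".join(parts),
    ]

# Rebuilds the cyclist roundtrip example from the text
args = build_sdna_args(
    "sdnaintegral.py",
    "network",
    "outputs/output_crt",
    {"metric": "CYCLE_ROUNDTRIP", "linkonly": None, "t": 0},
)
print(args)
```

Such a list could be passed to `subprocess.run` where sDNA is installed; whether the script must be invoked through a Python interpreter depends on the local installation, so that step is left out here.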
Further examples, along with data and a batch file of command line calls to sDNA to generate output, are provided as supplementary material. These include a distance map from a single origin, different betweenness types at different radii, mean angular and mean Euclidean distance, a 'flow bundle' of flows through a specific link, geodesics between specific origins/destinations, convex hull and network radius output.
Table 4 compares compute times for angular and network-Euclidean betweenness on the above network, between sDNA and the competing software described in Section 1. Note that each tool has a unique feature set and for many tasks these will take precedence over compute times as a basis for choice of tool. For OSM links, which may either be straight or curved, this comparison is a test not only of code efficiency but also of a theoretical approach which allows processing of curved links as a single unit. To provide a comparison based on processing speed alone, a simplified network is also tested in which all links are straightened (except where corners are required to preserve topology) and pre-broken into straight segments. For the straight line network tested, PST and DepthmapX are faster than sDNA while UNA is slower. For the OSM network in which many links are curved, the advantage of treating each link as a single entity becomes apparent as DepthmapX and PST, without this feature, must break the set of links into a set of straight segments 7-10 times the size before processing. DepthmapX in particular shows much longer compute times for curved links. The performance of PST benefits from highly optimised code (and like sDNA, parallel execution); its speed falls between those of sDNA Discrete and Continuous for network-Euclidean analysis at a 3 km radius, but is slower than either for angular 3 km. For a 6 km radius this

Academic applications of sDNA have included predicting flows and mode choice in transport networks: vehicle [39], metro [40], pedestrian [30,33,[41][42][43][44]] and bicycling [29,38]. Spatial network analysis shows especial promise as a technology for modelling slow, active and sustainable modes of transport, for which traditional zone-based transport models have been unable to model details of trips within single zones, and feedbacks between transport and land use, in a cost-effective manner. This has allowed answering questions on the effect of
infrastructure on active travel choices, and application of results in the public realm. Much of this has been through the customised forms of betweenness analysis offered by sDNA; however, the more novel convex hull based metrics we introduced have also been shown to have high correlation with flows of pedestrians [42]. Further applications include land use [45] and planning [46][47][48]: GIS arguably provides a lingua franca bridging the gap between planning and transport disciplines, and improving spatial network analysis capability within this context has furthered study of the relationship between accessibility and land use, and research on quantification of urban plan quality. Another substantial field of application is health informatics, where sDNA has allowed quantification of built environment characteristics in epidemiological models, sometimes on a 'big data' scale (sDNA outputs are used in UK Biobank [49]), enabling new research into the effects of built environment on health [50][51][52][53][54] ([54] won a Royal Town Planning Institute award for research excellence in 2019, and [55] was shortlisted for the same). [50] showed a link between convex hull statistics and social cohesion. A final, unexpected development is the use of sDNA in archaeology [56]. In all the above fields, we hope that by standardizing on the network link as a fundamental unit of analysis, in contrast to segments or axial lines used in previous research (which can be defined in multiple ways), more repeatable results might be obtained in the long term.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: The authors are entitled to receive a small share of revenue from the proprietary sister project to sDNA, sDNA+. AC reinvests all revenue.

Fig. 1. Distinction between Links and Polylines in sDNA. Dots illustrate where polylines end for the purposes of this diagram. In practice sDNA does not need a point (node) layer but only requires polylines as inputs. No lines in this diagram overlap.

Fig. 4. sDNA software architecture. Components shown in bold represent front-ends/interfaces exposed to the user.

Fig. 6. Cyclist roundtrip metric per unit length (excluding deterrence from motorised traffic) in city of Cardiff. Links with high metric on slopes illustrate use of 3-d properties of links. Displayed in ArcScene with vertical exaggeration = 5.

Fig. 7. Illustration of correct use of one-way links. Ignoring one-way data (left), angular betweenness flows used to approximate vehicle traffic will fail to correctly use dual carriageway. Correct flows shown on right.

Fig. 8. Comparison of discrete and continuous space modes: in discrete space mode the long link appears to generate high levels of betweenness, as it exceeds the length of the radius. In continuous space a partial link is considered. In this case the long link has been drawn in unrealistic zigzag fashion in order to include a long link without the diagram exceeding the page width; however, similar ratios of link length are common in real spatial networks.

Table 1
Primary network analysis outputs (formatted as polylines) from sDNA software.
a Denotes output duplicated over multiple radii.

Table 2
Further analytical outputs from additional software in the sDNA toolbox.
a Denotes output duplicated over multiple radii.

Table 3
Outputs of Prepare, Learn and Predict tools.

Table 4
Comparison of times to compute betweenness with different software, for Cardiff model on Intel i7-4810MQ, 2.8 GHz, 4 cores, 8 threads, 32 GB RAM. net-Euc = network-Euclidean; n/a = not applicable (UNA does not offer this output); n/t = not tested.
and uptake outside of academia. Prior to open source release, 1400 active installations were known to be in use; the sDNA QGIS plugin alone now records over 15,000 downloads with a rating of 4/5 stars. The user base can generally be considered experienced in using GIS or CAD software to load or input and display data, and perhaps experienced modellers and/or statisticians, but not necessarily having programming experience. sDNA has been used to deliver numerous sustainable transport projects