Geo-Engineering Data: Representation and Standardisation


David Geoffrey Toll

School of Engineering, Durham University, UK


The paper addresses current issues of representation of geo-engineering data using XML (eXtensible Markup Language) and the evolution of data standards in this field. Suggestions are made for ways to bring greater consistency to the common areas of the schemata that have been proposed, particularly focussing on ground investigation or “borehole” data.

It is proposed that “Borehole” should be the agreed name for a borehole/hole used to identify soil or rock strata that can be represented as Intervals and that “Interval” should be the agreed name for a soil or rock layer that can be defined by a top and base depth in a one dimensional borehole. A set of tags is proposed to represent the properties of “Borehole” and “Interval” objects.

Keywords: geo-engineering data; extensible markup language (XML); data exchange; World Wide Web.


The paper addresses current issues of representation of geo-engineering data using XML (eXtensible Markup Language) and the evolution of data standards in this field. The concept of creating a geotechnical version of XML was first proposed by Mete Oner and the World Wide Web of Geotechnical Engineers in 1998. Since then there have been a number of initiatives to develop representation schemes, both for geo-engineering and for geo-science data. It is likely that the development of standard forms of representation will continue for some time. The great power of XML is its built-in extensibility, which will allow data standards to continue to evolve to meet the needs of geo-engineering professionals.

XML allows simple text files to be 'marked up' by including 'tags' within the file. These tags can be recognised by an XML-compliant web browser. XML is being widely adopted by web developers for producing the next generation of web-based materials. XML is a more generic form of mark-up language than HTML (Hyper-Text Markup Language), which has been the main language used on the World Wide Web. HTML is purely a display language in which the tags define how the text is formatted for display within a web browser. XML allows the tags to be user defined. This means that the tags can be used to give meaning to the contents of a file; for instance, data can be marked up using ... tags to indicate that all data between these tags relates to borehole information.
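As a minimal sketch of how user-defined tags give meaning to data, the following Python fragment reads a small marked-up file using only the standard library. The tag names used here (Borehole, Interval, depthTop, depthBase, description) and the sample values are illustrative assumptions, not tags taken from any of the schemata discussed in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up data; tag names and values are assumptions for illustration.
SAMPLE = """
<Borehole>
  <name>BH1</name>
  <Interval>
    <depthTop>0.0</depthTop>
    <depthBase>1.5</depthBase>
    <description>Firm brown CLAY</description>
  </Interval>
  <Interval>
    <depthTop>1.5</depthTop>
    <depthBase>4.0</depthBase>
    <description>Dense grey SAND</description>
  </Interval>
</Borehole>
"""


def read_intervals(xml_text):
    """Return (top, base, description) for every Interval in the borehole."""
    root = ET.fromstring(xml_text.strip())
    return [
        (
            float(interval.findtext("depthTop")),
            float(interval.findtext("depthBase")),
            interval.findtext("description"),
        )
        for interval in root.iter("Interval")
    ]


intervals = read_intervals(SAMPLE)
```

Because the tags carry the meaning, the program does not need to know anything about the layout of the file beyond the tag names themselves.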

It will be possible to use XML tags in order to search for files on the World Wide Web using XQuery. This will make web-based searching much more productive and focused, rather than the keyword searching options that are currently available. However, if different data standards are adopted by different countries, the facility of being able to search easily for data anywhere in the world will be nullified.
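The difference between tag-based and keyword searching can be sketched as follows. XQuery itself is not available in the Python standard library, so this example substitutes the limited XPath subset supported by ElementTree. The file names, tag names and contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents keyed by file name; in practice these would be
# files retrieved from the web.
DOCS = {
    "site_a.xml": "<Borehole><name>BH1</name>"
                  "<Interval><description>CLAY</description></Interval></Borehole>",
    "site_b.xml": "<Borehole><name>BH7</name>"
                  "<Interval><description>SAND</description></Interval></Borehole>",
}


def find_boreholes_with(description, docs):
    """Return (file name, borehole name) for boreholes with a matching Interval."""
    hits = []
    for filename, text in docs.items():
        root = ET.fromstring(text)
        # Predicate selects Interval elements whose <description> child
        # has exactly the requested text.
        if root.findall(f".//Interval[description='{description}']"):
            hits.append((filename, root.findtext("name")))
    return hits
```

A call such as `find_boreholes_with("SAND", DOCS)` matches only where the word appears as an Interval description, not wherever it happens to occur in the file. The search only works across sources if they all use the same tag names, which is the argument for a common standard.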

For this reason, the three international geo-engineering societies (International Society for Soil Mechanics and Geotechnical Engineering (ISSMGE), International Association for Engineering Geology and the Environment (IAEG) and International Society for Rock Mechanics (ISRM)) have formed a Joint Technical Committee, JTC2. JTC2 will oversee the development of an internationally agreed form of representation of geo-engineering data that can be used to store such data on the World Wide Web and transfer data between computer systems. This will ensure that geo-engineering data is stored in the same format anywhere on the web. The remit of the JTC2 committee covers:

Ground investigation (“borehole”) data

Commonly-used laboratory test data

Commonly-used in situ test data

Data about geotechnical entities such as foundations, retaining structures, slopes, dams, embankments and tunnels

There are other benefits to having an internationally agreed data standard apart from allowing data to be made available on the World Wide Web. XML files can also be used for data exchange between organisations and computer systems. It could also be used as a file format for importing or exporting data to or from other software packages such as databases, GIS systems or analysis packages (Toll, 2001). It is hoped that developers of geo-engineering software will see the benefits of reading their data from a standard file format, rather than each analysis package having its own file format. This would mean that the same file structure could be used for a slope stability analysis, a retaining wall analysis, a finite element analysis and so on.

This paper reviews the current initiatives for defining standard forms of representation of geo-engineering data. The data structures of some of these schemes are identified. Suggestions are made for ways to bring greater consistency to the common areas of the schemata that have been proposed, particularly focussing on ground investigation or “borehole” data.



DIGGS (Data Interchange for Geotechnical and Geo-environmental Specialists) is a collaboration between the Federal Highway Administration (FHWA), United States Environmental Protection Agency (US EPA), US Army Corps of Engineers, US Geological Survey (USGS), Eastern Federal Lands Highway Division (EFLHD) and a number of Departments of Transport in the USA, funded through the Transportation Pooled Fund. The UK Highways Agency is also a collaborator. DIGGS brings together existing standards developed by the Association of Geotechnical and Geoenvironmental Specialists (AGS) in the UK, the Consortium of Organizations for Strong-Motion Observation Systems (COSMOS) and the University of Florida, Department of Civil Engineering (Styler et al, 2007). The existing schemes have been described by Swift et al (2004), AGS (2005), McVay et al (2005) and Chandler et al (2006).


The International Organisation for Standardization (ISO) released draft versions of Technical Specifications 14688: Part 3 and 14689: Part 2 in 2004 (ISO, 2004 a,b). These were prepared by ISO Technical Committee ISO/TC 182, Geotechnics, Subcommittee SC1, Geotechnical investigation and testing. They describe an XML schema for exchanging data about the description of soils and rocks. Both draft standards reached the Committee stage (i.e. preparatory work complete) at the end of 2004, but have not progressed since.

GeotechML

This represents work by the Author’s group at Durham University, UK. Preliminary work has been done on ground investigation data, foundations, retaining walls and slopes. Demonstration style sheets are available for borehole logs and retaining walls and a number of Java programs have been developed for displaying ground investigation data, foundations and slopes. Toll and Cubitt (2003) have described how geotechnical entities (e.g. foundations, retaining walls and dams) could be represented in GeotechML. Toll and Shields (2003) described a scheme for ground investigation data.


This is an XML data structure for representing geotechnical asset data. The work was funded by the UK Highways Agency. It has been used to represent data from earthworks, including defects and problems. It supports field input of data using a Pocket PC.

eEarth

This European funded project links the Geological Surveys of the Netherlands, Germany, United Kingdom, Czech Republic, Lithuania and Poland together with Geodan of the Netherlands and Golder Associates of Italy (Tchistiakov et al, 2005). The project aims to increase availability, use and distribution of the European digital Earth subsurface data by providing cross-boundary access for the digital geo-environmental collections in different EU languages and to develop cross-border geo-information services based on public geodata stored in the national geodatabases. The project is now completed and the website provides a portal for access to borehole information across Europe.


XMML (eXploration and Mining Markup Language) is aimed at geoscience and exploration information. The XMML implementation is based on Geography Markup Language (GML). XMML itself will be standardised through the IUGS Commission on Geoscience Information. The primary sponsors of the project are Fractal Technologies, CSIRO and the Minerals and Energy Research Institute of Western Australia. Other partners are from Australian government institutions and mining companies and the British Geological Survey.

GeoSciML

The GeoSciML project operates under the auspices of the CGI working group on Data Model Collaboration. GeoSciML has the short-term goal of representing geoscience information associated with geologic maps and observations, as well as being extensible in the long-term to other geoscience data. The borehole model for GeoSciML has been adopted from XMML.

ATC10

ISSMGE Asian Regional Technical Committee (ATC10) on Urban Geo-informatics is an initiative to bring together geo-engineering data from the major urban centres in Asia and to deal with methods for compiling geo-information.


ISRM Commission on Case Histories in Rock Engineering (CCHRE) plans to collect and document case records for landslides, rock structures, earthquake engineering etc. It is now planning an initiative to develop a Database on Geohazards and Rock Engineering.


NEES (Network for Earthquake Engineering Simulation) has been developing a data model for representing centrifuge test data (NEES, 2006). The model is based on work by Peng and Law (2005) and a geotechnical data model by Bardet/Swift/Kutter/Wilson, extending the proposal by Kutter et al (2002). NEES is funded by the US National Science Foundation. The focus is on representing the experimental facilities and metadata associated with experimental research.

International Journal of Geo-Engineering Case Histories

This is a new on-line journal whose mission is to report case histories, but which also has an interest in archiving geotechnical data in electronic form.

SlopeSML

This is work done at Istanbul Technical University, Turkey to define an XML schema for storing case histories of slopes (Hatipoglu, 2003).

RocProp

RocProp is a database implemented in XML that contains information on intact rock parameters and rock type descriptions. Parameters included in the database include rock type, density, modulus of elasticity, modulus of rigidity, Poisson’s ratio, compressive and tensile strengths. The development of RocProp was jointly funded by Rocscience Inc. and the Lassonde Institute, University of Toronto. The first version of RocProp was described by Turichshev (2002) but further enhancements have been added since.


TUNCONSTRUCT is a European project funded under Framework 6 that promotes the development and implementation of European technological innovation in underground construction, with 41 partners from 11 European countries. One of its aims is to promote the exchange of information by providing a web-based data system that can supply relevant and reliable information throughout the lifetime of an underground facility. Visualization of geological data using augmented virtual reality will also be implemented.


The Construction Industry Research and Information Association (CIRIA) in the UK have carried out a review of electronic file formats for the exchange of geotechnical information used in transportation schemes. The work was funded by the UK Highways Agency and had a Steering Group with broad representation from the UK construction industry. As a result of the review they have identified a need to develop data transfer formats for geo-asset data and construction data.


The first proposal for a geotechnical data structure was given by McPhail (2001) within the Geotechnical XML (GML) project. (Note that GML is now used as the name for Geography Markup Language.) McPhail suggested splitting data into Office/Field/Laboratory objects (Figure 1). This approach has not been adopted by other systems that have developed subsequently.


Figure 1. Geotechnical XML structure (McPhail, 2001)


Toll and Shields (2003) proposed a data structure for ground investigation data (Figure 2) based on the AGS data exchange format (AGS, 1999). This also included higher level objects for representing Entities, Construction and Monitoring as proposed by Toll and Cubitt (2003). The ground investigation part of the hierarchy has many similarities with the COSMOS or AGSML structures proposed later (Swift et al, 2004; AGS, 2005; Chandler et al, 2006), as would be expected since the starting point in all cases was the existing AGS data exchange format.


Figure 2. GeotechML structure for ground investigation data (Toll and Shields, 2003)


The entire data structure is not shown in Figure 2 to avoid clutter. The Layer object is further refined by Material and Constituent objects that provide a structured representation of the soil or rock description (Toll and Shields, 2003).

ISO (2004a,b) proposed forms of representation for soil and rock descriptions (Figure 3). This therefore forms a detailed structure for lower level objects of a larger geo-engineering hierarchy. For instance, these Soil/Rock objects could be used for soil and rock descriptions attached to the Layer object proposed by Toll and Shields (similar to their use of Material). The PrincipalFraction and SecondaryFraction objects are similar to the Constituent object used by Toll and Shields.


Figure 3. ISO structure for soil and rock descriptions (ISO, 2004a,b)


The eEarth project deals with representing borehole data from the databases of European Geological Surveys. The main data structure is very simple (Figure 4). The top level object is Borehole and this is divided into Intervals (equivalent to Layer in Figure 2). The data describing each object is shown in Figure 4. The rockname, stratigraphy, lithology and genesis codes describe each layer (or Interval).


Figure 4. eEarth structure for borehole data (Jellema et al, 2004)


The data structure proposed by AGS for ground investigation data is shown in Figure 5. This is similar to the Toll and Shields proposal in terms of structure but used the 4-digit AGS codes rather than giving the objects more meaningful names. The AGSML structure was developed as a Geography Markup Language (GML) application and adopted the convention of alternating element-property-element links in the hierarchy. In the AGSML structure, projects, holes, hole information, laboratory testing were defined as properties to link the elements SiteInvestigation, Proj, Hole, Geol etc. Note that the Geol element is equivalent to Layer in the Toll and Shields proposal.
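The alternating element-property-element convention can be illustrated with a short sketch. It relies on the usual GML naming style, in which element (object) names begin with an upper-case letter and property names with a lower-case letter. The element names follow the text above, but the exact property names (projects, holes, holeInformation) are assumptions made for illustration and may differ from those in the published AGSML schema.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; property names are assumed, not taken from AGSML.
SAMPLE = """
<SiteInvestigation>
  <projects>
    <Proj>
      <holes>
        <Hole>
          <holeInformation>
            <Geol/>
          </holeInformation>
        </Hole>
      </holes>
    </Proj>
  </projects>
</SiteInvestigation>
"""


def alternates(element, expect_object=True):
    """True if object and property names alternate at every level below element."""
    is_object = element.tag[0].isupper()
    if is_object != expect_object:
        return False
    # Children must be of the opposite kind to their parent.
    return all(alternates(child, not expect_object) for child in element)


root = ET.fromstring(SAMPLE.strip())
follows_convention = alternates(root)
```

A structure that nests one element directly inside another, such as a Hole placed immediately within a Proj, would fail this check.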


Figure 5. AGSML data structure (AGS, 2005)

The DIGGS data structure is shown in Figure 6 (Styler et al, 2007). The part of the data structure dealing with ground investigation (Hole and the hierarchy below this) was largely constructed by combining the AGSML and COSMOS structures, but using more meaningful object names than were used in AGSML. Figure 6 does not show the child objects below Sample (all the laboratory test information) because of lack of space, but this lower structure is similar to AGSML. The FoundationGroup object and the structure below it draw on the University of Florida scheme (McVay et al, 2005) and represent piled foundations.


Figure 6. DIGGS data structure (DIGGS, 2006; Styler et al, 2007)


It should be noted that DIGGS have adopted the AGSML definition of Hole as a “Hole or Location Equivalent”. It can be used to represent a conventional borehole, but can also be used for trial pits, CPT probes or as a location for monitoring data etc. The implications of this are discussed later.


The JTC2 committee is interacting with the projects described earlier to ensure that the schemes under development are consistent. Consistency within the geo-engineering field is essential. Ideally, geo-engineering schemes would also be compatible with other geo-science schemes under development (such as eEarth and GeoSciML). While the lower (more detailed) parts of any data structures are likely to diverge (since the amounts and types of data stored will be different for each interest group), it would be helpful if the high level objects were compatible.

The following discussion relates only to “borehole” or ground investigation data. This is the most developed area of geo-engineering data representation and also has common elements with geo-science initiatives.

The common elements between a number of schemata are Borehole (or Hole/Borings) and Interval (or Layer/Geol). In the geo-engineering schemata (e.g. Figure 2, Figure 5, Figure 6) the Borehole or Hole objects are child objects of Project (or Proj). Exploration geologists (XMML) use the concept of Site (i.e. a geo-spatial entity rather than an organisational entity). However, general geo-science organisations (such as Geological Surveys) treat boreholes as independent entities. They are assigned locational and ownership tags, but are not necessarily grouped in any organisational or geographical way.

It is suggested that Borehole should be the agreed name for a borehole/hole used to identify soil or rock strata that can be represented as Intervals (see later for definition of Interval). This would help to separate Boreholes (that can be represented by a conventional borehole log) from Trial pit or Trial trench data (which could have a 2D representation of the faces) or a CPT probe (or similar) that has a quantitative profile with depth (Note: these different types of investigation are all represented by Hole in the AGSML and DIGGS schemata). This separation will be helpful for users that are searching for a particular form of data. A Borehole could still be defined as an abstract Hole type, but should be given the full name for compatibility with other schemes.

Borehole as an object should also be complete in itself. It should contain sufficient geo-spatial and organisational data (location, ownership etc) to allow it to stand alone. Schema developers should avoid having such data inherited from a parent object (Project or Site) that would be lost if the Borehole data were removed from its original structure.
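The risk described above can be made concrete with a short sketch. Where a schema does hold ownership or locational data at Project level, one way to avoid losing it is to copy the inherited values into the Borehole when it is extracted. The tag names (Project, Borehole, owner, country) and values below are hypothetical and are not drawn from any of the schemata reviewed.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical project file in which owner and country are held at Project level.
PROJECT = """
<Project>
  <owner>A.N.Owner</owner>
  <country>UK</country>
  <Borehole><name>BH1</name></Borehole>
  <Borehole><name>BH2</name><owner>Another Owner</owner></Borehole>
</Project>
"""

INHERITED = ("owner", "country")  # assumed tags that a Borehole would otherwise inherit


def extract_standalone(project, name):
    """Return a copy of the named Borehole carrying any data inherited from Project."""
    for borehole in project.findall("Borehole"):
        if borehole.findtext("name") == name:
            standalone = copy.deepcopy(borehole)
            for tag in INHERITED:
                # Only fill in values the Borehole does not already define itself.
                if standalone.find(tag) is None and project.find(tag) is not None:
                    ET.SubElement(standalone, tag).text = project.findtext(tag)
            return standalone
    raise KeyError(name)


project = ET.fromstring(PROJECT.strip())
bh1 = extract_standalone(project, "BH1")
bh2 = extract_standalone(project, "BH2")
```

A schema in which every Borehole carries these tags itself, as recommended here, would make this copying step unnecessary.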

It is suggested that Interval should be the agreed name for an entity that can be defined by a top and base depth in a one-dimensional borehole. This might be a defined soil or rock stratum identified by boring, or from a length of rock core, that has a common geo-engineering, lithological or stratigraphic description. The term Layer is reserved for an entity that can be represented in two (or three) dimensions (i.e. an Interval is a 1D representation of a Layer at a specific location).
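The definition of Interval can be expressed as a simple data type. The sketch below is one possible reading of that definition: the field names, the requirement that the base lies below the top, and the continuity check are assumptions added for illustration and are not part of the proposal itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A 1D stretch of a borehole with a common description."""
    depth_top: float
    depth_base: float
    description: str

    def __post_init__(self):
        # Assumed rule: depths increase downwards, so the base must be deeper.
        if self.depth_base <= self.depth_top:
            raise ValueError("depth_base must be greater than depth_top")

    @property
    def thickness(self):
        return self.depth_base - self.depth_top


def is_continuous(intervals):
    """True if each interval starts exactly where the previous one ends."""
    ordered = sorted(intervals, key=lambda i: i.depth_top)
    return all(a.depth_base == b.depth_top for a, b in zip(ordered, ordered[1:]))


log = [
    Interval(0.0, 1.5, "Firm brown CLAY"),
    Interval(1.5, 4.0, "Dense grey SAND"),
]
```

A Layer, by contrast, would need a 2D or 3D geometry and could not be captured by a pair of depths alone.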

A comparison is made between the tags used for Borehole and Interval data by eEarth, DIGGS and XMML in Tables 1 and 2. The final column shows a proposed compromise set of tags that would provide sufficient information for most users. Adopting these tags would not prevent schema developers from adding additional tags to provide extra or more detailed information.

Of particular concern is the locational data. XMML makes full use of GML and defines a single entity “begin” which is a GML (Geography Markup Language) “point” construct (using 3D coordinates). In comparison, eEarth and DIGGS provide facilities to define coordinates and ground levels independently and in different forms. It is proposed that there is a single “location” entity that would use a GML point construct defining the reference coordinate system used and the 3D coordinates of the borehole (at the ground surface). This data would be sufficient for most purposes. There is no objection to schema developers providing additional coordinate/level definitions to suit different purposes, provided the default “location” data is provided.
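A single “location” entity of this kind might be written and read as sketched below. GML defines the point element as gml:Point with a gml:pos child, and that spelling is used here. The srsName value is a placeholder assumption, since the appropriate coordinate reference system identifier depends on the data, and the coordinates are example values only.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def make_location(srs_name, x, y, z):
    """Build a location element holding one 3D GML point."""
    location = ET.Element("location")
    point = ET.SubElement(location, f"{{{GML_NS}}}Point", {"srsName": srs_name})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{x} {y} {z}"
    return location


def read_location(location):
    """Return (srsName, (x, y, z)) from a location element."""
    point = location.find(f"{{{GML_NS}}}Point")
    coords = tuple(float(v) for v in point.findtext(f"{{{GML_NS}}}pos").split())
    return point.get("srsName"), coords


# "urn:example:crs:local-grid" is a hypothetical identifier, not a registered CRS.
loc = make_location("urn:example:crs:local-grid", 100.1, 150.2, 23.5)
xml_text = ET.tostring(loc, encoding="unicode")
```

Keeping the coordinate system and all three coordinates in one construct means that a reader needs only this element to place the borehole, whatever additional coordinate or level definitions a particular schema adds.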



Table 1. Borehole tags

Description | eEarth | DIGGS | XMML | Proposed XML tag
Borehole Identifier | borehole id | gml:id | gml:id | gml:id
Description | nameBorehole | gml:description | gml:description | gml:description
Borehole Name | nameBoreholeShort | gml:name | gml:name | gml:name
Country | country | | | country
Language | language | | | language
Owner of data | OwnerOrg | roles | | <roles role="Owner" organisationOrIndividual = "A.N.Owner">
Purpose | purpose | purpose | | purpose
Status | | status | | status
Location | gml:point srsName | geodeticCoordinateSystem (Project level); (Project level) | begin | <location> <gml:point srsName = "urn:EPSG:geographic <gml:pos>100.1 150.2 23.5 </gml:pos>
 | levelReference | referenceDatumDescription; geodeticVerticalDatum (Project level); localVerticalDatum (Project level) | |
 | levelGroundSurface | GroundLevelGeodetic Elevation; referenceLocationGeodetic Elevation | |
Geometry | | holeGeometry | end | <holeGeometry> <gml:pos>170 90 34</gml:pos> 90 20</gml:pos>
Drilling method | drillingMethod | type | drillMethod | drillMethod
Start date of drilling | drillingYear | dateTimeStart (HoleConstruction level) | |
End date of drilling | | dateTimeEnd (HoleConstruction level) | |
Depth at start of recorded drilling | drillingStartPoint | depthTop (HoleConstruction level) | begin | depthStart
Depth at end of drilling | depthFinal | depthBase (HoleConstruction level) | end | depthEnd
Hole diameter (at top) | | | collarDiameter | diameter
Logger | creator | roles | | <roles role="Logger" organisationOrIndividual = "A.Logger">
Date when logged | date | | | dateLogged
Specification for logging | standard | specificationID | | specificationID
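To indicate how the proposed tags might fit together, the following sketch assembles a Borehole record from a selection of them. The tag names are those in the proposed column of Table 1. The way they are nested, the use of gml:id as an attribute, and all of the values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def make_borehole(gml_id, name, fields, roles):
    """Build a Borehole element from simple tag/value pairs and role assignments."""
    borehole = ET.Element("Borehole", {f"{{{GML_NS}}}id": gml_id})
    ET.SubElement(borehole, f"{{{GML_NS}}}name").text = name
    for tag, value in fields.items():
        ET.SubElement(borehole, tag).text = str(value)
    for role, who in roles.items():
        # One roles element per role, following the form shown in Table 1.
        ET.SubElement(borehole, "roles",
                      {"role": role, "organisationOrIndividual": who})
    return borehole


# All values below are invented examples.
bh = make_borehole(
    "BH1",
    "Borehole 1",
    {
        "country": "UK",
        "purpose": "Ground investigation",
        "drillMethod": "Rotary cored",
        "depthStart": 0.0,
        "depthEnd": 25.0,
    },
    {"Owner": "A.N.Owner", "Logger": "A.Logger"},
)
xml_text = ET.tostring(bh, encoding="unicode")
```

Additional tags required by a particular schema could be passed in the same way without changing the common core.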



Joint Technical Committee 2 of ISSMGE, IAEG and ISRM is interacting with a number of projects developing XML schemata for geo-engineering data to ensure that the schemes under development are consistent. Consistency within the geo-engineering field is essential. Ideally, geo-engineering schemes would also be compatible with other geo-science schemes under development.

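It is proposed that:

Borehole should be the agreed name for a borehole/hole used to identify soil or rock strata that can be represented as Intervals.

Interval should be the agreed name for a soil or rock layer that can be defined by a top and base depth in a one-dimensional borehole.

A set of tags is also proposed to represent the properties of Borehole and Interval objects. The proposal provides a compromise set of tags that would provide sufficient information for most users. Adopting these tags would not prevent schema developers from adding additional tags to provide extra or more detailed information.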


  1. AGS (1999) Electronic Transfer of Geotechnical and Geoenvironmental Data (3rd Edition), Association of Geotechnical and Geoenvironmental Specialists, Beckenham, Kent.
  2. AGS (2005) Electronic Transfer of Geotechnical and Geoenvironmental Data using XML data format, Association of Geotechnical and Geoenvironmental Specialists, Beckenham, Kent.
  3. Chandler, R.J., P.M. Quinn, A.J. Beaumont, D.J. Evans, and D.G. Toll, (2006) Combining the Power of AGS and XML: AGSML the Data Format for the Future, Proc. GeoCongress 2006 (eds. D.J. DeGroot, J.T. DeJong, J.D. Frost, L.G. Baise), Reston: American Society of Civil Engineers, pp. 1-6.
  4. DIGGS (2006) Data Interchange for Geotechnical and Geoenvironmental Specialists, Technical Manual Draft Version 0.8 July 24, 2006.
  5. Hatipoglu, B. (2003) The Development of an Integrated and Intelligent Design Environment for the Investigation of Slope Stability Problems, PhD Thesis, Istanbul Technical University.
  6. Styler, M., M. Hoit and M. McVay (2007) Deep Foundation Data Capabilities of the Data Interchange for Geotechnical and Geoenvironmental Specialists (DIGGS) Mark-up Language, Electronic Journal of Geotechnical Engineering (Ibid).
  7. ISO (2004a) Electronic Exchange of Data of Identification and Description of Soil, Technical Specifications 14688: Part 3, Geneva: International Organisation for Standardization.
  8. ISO (2004b) Electronic Exchange of Data of Identification and Description of Rock, Technical Specifications 14689: Part 2, Geneva: International Organisation for Standardization.
  9. Jellema, J., A. Tchistiakov, D. Lowe, R. Bowie, H. Preuss, T. Stych, D. Capova, T. Mardal, and J. Belickas (2004) The Inventory on Data Models and Lithology Standards, eEarth Work Package 4, delivery 4.1, Volume 1. Final report.
  10. Kutter, B.L., D.W. Wilson and J.P. Bardet (2002) Metadata Structure for Geotechnical Physical Models (and Simulations?).
  11. McPhail, J.D. (2001) Electronic Storage and Interchange of Geotechnical Data.
  12. McVay, M., M. Hoit, E. Hughes, T. Nguyen, and P. Lai (2005) Development of a Web Based Design, and Construction Bridge Substructure Database, Presented at 84th TRB Annual Meeting, January, 2005, Washington, D.C.
  13. NEES (2006) NEES Data Model, June 20, 2006.
  14. Peng, J. and K. H. Law (2005) Reference Data Models for Supporting the Network for Earthquake Engineering Simulation (NEES), Proc. ASCE Int. Conf. Computing in Civil Engineering, Cancun, Mexico, July, 2005.
  15. Swift, J., J. Bobbitt, C. Roblee, J. Futrelle, S. Tiwana, A. Peters, J. Castro, M. Ali, F. Nasir, A. Javed, Y. Khan, and C. Stepp (2004) Cosmos/Peer Lifelines Geotechnical Virtual Data Center, COSMOS Workshop 1, 10/15/04.
  16. Tchistiakov, A., J. Jellema, H. Preuss, T. Hernandez Diaz, B. Cannell, J. Passmore, T. Mardal, D. Capova, J. Belickas and V. Rapsevicius (2005) eEarth: Bridging the Divided National Geo-Databases via Multilingual Web Application, 10th International Symposium on Information and Communication Technologies in Urban and Spatial Planning and Impacts of ICT on Physical Space, Vienna University of Technology, February, 2005.
  17. Toll, D.G. (2001) Computers and Geotechnical Engineering: A Review, in Civil and Structural Engineering Computing: 2001 (ed. B.H.V. Topping) Stirling, Scotland: Saxe-Coburg Publications, pp 433-458.
  18. Toll, D.G. and A.C. Cubitt (2003) Representing Geotechnical Entities on the World Wide Web, Advances in Engineering Software 34, pp 729-736.
  19. Toll, D.G. and R. Shields (2003) A Web-Based Data Format for Ground Investigation Data, Electronic Journal of Geotechnical Engineering.
  20. Turichshev, A. (2002) A Web-accessible Database for Intact Rock Properties and a XML Data Format for Intact Rock Properties, M.A.Sc. thesis, Department of Civil Engineering, University of Toronto.


© 2007 ejge