On-site geometric calibration of RPAS mounted sensors for SfM photogrammetric geomorphological surveys

The application of structure from motion (SfM) photogrammetry for digital elevation model (DEM) and orthophoto generation from visible imagery enjoys ever-growing popularity in geomorphological research. Photogrammetry experts, however, urge that a rigorous approach is a prerequisite for reliable results — a requirement that may conflict with real-world surveys. We present a method that unites the two disciplines, using the example of a challenging SfM photogrammetric survey at a Scottish river. Using simultaneous geometric pre-calibration of a multi-sensor remotely piloted aircraft system (RPAS), the method facilitates time-efficient topography mapping and the integration of other wavelengths to create orthophotos providing additional surface information. The approach utilizes an on-site 3D structure (for example, a building) as a calibration object, by extracting coordinates of natural features from lidar scans and sensor imagery. We assess the workflow with specialized calibration software (VMS) and widely applied commercial SfM photogrammetric software (AM), using a DJI Phantom optical and a Workswell thermal sensor. We achieved calibration accuracies below one-third (optical) and one-quarter (thermal) of a pixel. Subsequently, we transfer the sensor parameters to pre-calibrate the SfM application and compare the results to a self-calibrated workflow. In a systematic experiment using the optical river survey dataset, we assess the effectiveness of pre-calibration, oblique imagery, scale variation and masking to mitigate systematic DEM errors. Opposing trends emerge between the calibration strategies: decreasing network complexity (i.e., fewer flying heights/view angles) improves pre-calibrated but compromises self-calibrated scenarios. Pre-calibrating (VMS) imagery from a single height (30 m nadir) yielded the best results.
This finding could have implications for geomorphological surveys, in which single-scale datasets are widespread practice, despite the literature's urge towards more complex imaging networks. The self-calibrated results legitimize this insistence: the same dataset resulted in pronounced dome-shaped DEM distortion, indicating systematic errors, whereas additional flying heights and angles significantly improved the results.


| INTRODUCTION
The rise of user-friendly photogrammetric applications (Chandler, 1999) over the last 30 years and the availability of off-the-shelf remotely piloted aircraft systems (RPAS) have led to a democratization of photogrammetry. Integration of computer vision and machine learning into photogrammetric workflows has opened this technology to a wider range of users. What used to be an expensive technology reserved for skilled experts has evolved into a tool requiring little prior knowledge (James et al., 2019).
Additionally, geomorphological applications (e.g., Eltner et al., 2021) often adopt several sensors with complementary spectral properties for orthophoto generation to be used for surface mapping and image classification. Thermal imagery can provide valuable information about surface properties but is less suitable for mapping topography because of its low dynamic range and image resolution (Javadnejad et al., 2020; Maes et al., 2017). The different reflective properties result in inconsistent levels of contrast between visible and thermal images: for example, in overgrown areas borders appear less pronounced, making image matching challenging. There are manifold examples of use cases for multi-sensor RPAS: vegetation indices are used for ecosystem (e.g., Antarctic moss: Lucieer et al., 2012) or crop (e.g., vineyards, beets, forest: Maes et al., 2017; Pádua et al., 2020) monitoring, and Erenoglu et al. (2017) provide a further example.

Several recent publications (e.g., Eltner & Sofia, 2020; Peppa et al., 2019; Remondino et al., 2017) have pointed out that a certain level of expertise and process understanding is required to apply SfM photogrammetry appropriately. The ease of access to this technology allows results to be generated quickly and can create the illusion that the produced products are meaningful. In this context, authors such as James et al. (2019) and Remondino et al. (2017) question the reliability of some recent studies in the field of geomorphology. As a consequence, recent photogrammetric publications have highlighted the importance of rigorous approaches and data quality assessment.
These articles propose guidelines and suggestions for good practice in the application of SfM photogrammetry in geomorphology (e.g., Eltner & Sofia, 2020; James et al., 2019). Survey design and imaging network geometry play a major role in this. Classical photogrammetry from manned aircraft applied pre-calibrated metric sensors with high geometric stability. Nadir imaging (i.e., cameras facing vertically downwards from a uniform flying height) was common practice. However, such a classical approach does not directly translate to working with RPAS. The rigorous application of non-metric, off-the-shelf imaging sensors in modern SfM photogrammetry applications is founded on the incorporation of self-calibrating bundle adjustment (also referred to as 'on-the-job self-calibration'). This process simultaneously determines the internal (lens and sensor geometry) and external (position and orientation) sensor parameters. The predominant approach to determine the external sensor parameters uses ground control points (GCPs), that is, visible features of known coordinates. Typically, these are placed targets or permanent landmarks whose position is acquired with centimetre accuracy using survey equipment, for example global navigation satellite system (GNSS) receivers (real-time kinematic (RTK) or post-processed) or total stations. A sufficient number, distribution and accuracy of GCPs are key to mitigate errors in scaling, rotation and translation (Carbonneau & Dietrich, 2017). Alternative approaches apply direct georeferencing using the orientation and corrected (RTK or post-processed) GNSS position of the sensor at the time of image acquisition (Carbonneau & Dietrich, 2017). The bundle adjustment can reach its limits if the geometric networks are insufficiently rigorous, for example with exclusively parallel or nadir view directions and flat surface geometry (Griffiths & Burningham, 2019).
Unsatisfactory bundle adjustment has been demonstrated to result in systematic camera calibration errors that cause dome- or bowl-shaped digital elevation model (DEM) deformations (e.g., James et al., 2020; Sanz-Ablanedo et al., 2020).
When non-metric sensors are applied, a rigorous survey network design can mitigate systematic error. Therefore, data acquisition must be optimized for convergent image geometry by including sufficient overlap and variation in view angle and height to achieve a reliable self-calibrating bundle adjustment (Cramer et al., 2017;Przybilla et al., 2015;Wackrow & Chandler, 2008).
Previous research suggests inclusion of oblique imagery to strengthen the network (e.g., Harwin et al., 2015; James & Robson, 2014); however, there are certain application scenarios in which it is not feasible. For example, nadir-only designs are applied in bathymetric through-water SfM photogrammetry to minimize light refraction angles at the water surface (e.g., Javernick et al., 2014; Slocum et al., 2020; Woodget et al., 2015). This optical offset can counteract robust bundle adjustment. The use of nadir-only surveys is ideal for orthophoto generation and has a history in classic airborne photogrammetry (Cramer et al., 2017). Time constraints can be another decisive aspect. Adding oblique imagery to surveys can multiply the flight and processing time (Meinen & Robinson, 2020), making it less economically viable for commercial providers. Some RPAS (e.g., fixed wing) feature permanent sensor mounts and may thus be restricted to nadir surveys. For these reasons, single-scale nadir surveys remain the quasi-standard for environmental mapping (Griffiths & Burningham, 2019).
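The refraction effect that motivates nadir-only bathymetric designs can be illustrated with the simple small-angle correction used in through-water SfM studies such as Woodget et al. (2015): for near-nadir view angles, the apparent (image-derived) water depth is multiplied by the refractive index of clear water (approximately 1.34). A minimal sketch, assuming this simple multiplicative form (the function name and sample depths are ours):

```python
# Small-angle refraction correction for through-water SfM depths.
# Assumption: the simple correction true_depth = 1.34 * apparent_depth
# applied for near-nadir imagery (cf. Woodget et al., 2015).
REFRACTIVE_INDEX_WATER = 1.34

def correct_depth(apparent_depth_m: float) -> float:
    """Scale an apparent (SfM-derived) water depth to true depth."""
    return REFRACTIVE_INDEX_WATER * apparent_depth_m

# Example apparent depths in metres (invented for illustration):
depths = [0.20, 0.55, 1.10]
corrected = [round(correct_depth(d), 3) for d in depths]
print(corrected)  # [0.268, 0.737, 1.474]
```

This correction only holds where refraction angles stay small, which is exactly why oblique imagery is avoided over water.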
An alternative strategy, if survey or terrain proves too challenging for the self-calibrating bundle adjustment, is to reduce the number of variables by decoupling interior from exterior camera parameters (Cramer et al., 2017). In such a pre-calibrated workflow, the bundle adjustment only solves external parameters while the camera model remains fixed.
Although numerous publications have highlighted its potential, there is a general lack of studies investigating methods and effectiveness of sensor pre-calibration for geomorphological applications (Oniga et al., 2018). Specialized photogrammetric pre-calibration approaches are often not suitable for geomorphological applications, for several reasons. Most critically, the inherent geometric instability of low-cost commercial sensors can invalidate long-term calibration.
Vibrations, temperature and pressure changes potentially affect the sensor geometry and may rule out transport between calibration laboratory and field site (e.g., Cramer et al., 2017;Elias et al., 2020;Sanz-Ablanedo et al., 2020). Therefore, we assume it is critical to perform the calibration on-site, immediately before or after each survey.
In situ sensor calibration often applies portable calibration frames or 2D checkerboards (Griffiths & Burningham, 2019). However, when the calibration is carried out at survey scale (i.e., same distance as survey height) to maintain focus settings, as emphasized by Lichti et al. (2008), such calibration structures may not be suitable for all sensors due to the feature size. Moreover, analogous to self-calibration on flat surfaces (Sanz-Ablanedo et al., 2020), the geometry of the calibration structure is critical for building a strong image network by reducing the correlation of the camera parameters (Remondino & Fraser, 2006). Samper et al. (2013) and Oniga et al. (2018) have demonstrated that 3D calibration structures result in approximately 50% higher accuracies compared to 2D planar calibration fields. Oniga et al. (2018) created a 3D test field on a lawn and a façade that allows calibration at scale. However, the calibration and survey sites were spatially and temporally separated. Harwin et al. (2015) set up a calibration field on-site and included some degree of three-dimensionality by placing survey targets on tripods. The downside of portable calibration structures with survey targets is the workload resulting from the required set-up and surveying for every repetition.
In the case of multi-temporal surveys this can be especially time demanding.
Additional challenges arise from the application of multi-sensor systems. Few publications investigate pre-calibration of thermal sensors (Conte et al., 2018) and none of the approaches attempts simultaneous calibration of multi-sensor systems. Using the same structure for visible and thermal sensors would be the most efficient, but their dynamic properties are not necessarily compatible and hence require suitable targets (Conte et al., 2018). When investigating thermal sensors, previous studies have applied active targets (e.g., light bulbs) (Luhmann et al., 2013) or passive targets (e.g., holes in an aluminium plate (Bison et al., 2012) or black velvet and silver heat protection foil (Westfeld et al., 2015)), subsets of the survey imagery (Conte et al., 2018), 2D calibration planes (Bison et al., 2012; Westfeld et al., 2015) and 3D calibration frames (Eltner et al., 2021; Luhmann et al., 2013).
Most of these approaches use short sensor-object distances and calibration structures that cannot be scaled up for a calibration on survey scale. Moreover, the previous approaches do not account for (field-) time-efficiency or cannot be performed on-site. The approach presented by Senn et al. (2020) meets the criteria but has not yet been applied to pre-calibrate a survey.

| Aim and objectives
The overarching research reported in this paper is one such geomorphological case where the requirements of the survey do not readily allow for sufficient self-calibration. We conducted a multi-temporal SfM photogrammetric survey using a multi-sensor RPAS to monitor geomorphic changes induced by artificially added in-channel log jams, designed to help restore habitat for Atlantic salmon (Salmo salar), on the River Gairn in Aberdeenshire within the Cairngorms National Park, Scotland. We derived topographical information from visible imagery and used orthophotos to map the surfaces of the survey site. Supplementary thermal orthophotos provide a valuable addition to aid surface classification.
A pre-calibration method aimed at geomorphological applications was described by Senn et al. (2020) and was designed as an applicable addition that can fit into restricted fieldwork schedules. It overcomes the shortcomings of previous, typically lab-based, calibration workflows, most prominently scale, workload and suitability for different sensors. The pre-calibration approach utilizes distinct features on a building present in the survey site instead of survey targets. The calibration dataset is generated by manual localization of features as 3D coordinates from terrestrial laser scans for reference, and 2D image coordinates for the calibration.
Following up on Senn et al. (2020), this paper reports on the effectiveness of sensor pre- and self-calibration, as well as other error mitigation strategies, on the accuracy of results achieved with SfM photogrammetry in a full topographic survey of wetted and dry areas in a river corridor. To this end, we conducted a systematic experiment in which we used all combinations of calibration strategy (pre-calibration in Vision Measurement System (VMS), pre-calibration in Agisoft Metashape (AM) and self-calibration in AM), flight altitudes and viewing angles, as well as masking out error-prone areas. Furthermore, we evaluate a simplified scan set-up for pre-calibration reference data using a single scan instead of a registered point cloud acquired from multiple different perspectives. In addition, we demonstrate the multi-sensor applicability by creating thermal orthophotos.
The objectives of the study were to: 1. assess the impact of the scan set-up and software choice on pre-calibration accuracy; 2. compare the performance of sensor pre- and self-calibration and additional error mitigation strategies in a geomorphological survey scenario; 3. evaluate the applicability of the methods with regard to a real-world survey on the River Gairn.
Ultimately, our aim is to make an informed recommendation for sensor calibration in geomorphological research. In contrast to photogrammetry-centred research, our emphasis is to balance photogrammetric accuracy and geomorphological applicability. We appreciate that factors such as streamlined software implementation (e.g., compatible file formats), software availability and time requirement can be critical in deciding whether to implement additional processing steps to the workflow. The calibration method is specifically designed to minimize on-site time requirement, being aware that the cost-benefit consideration ultimately determines whether a method is adopted or not.

| METHODOLOGY AND DATASETS
The methods are decomposed into sensor pre-calibration and application to the real-world scenario of a geomorphological survey (Figure 1). Visible imagery was captured using the built-in sensor of a DJI Phantom and thermal imagery using a Workswell sensor. The riverbanks are mostly covered by grass or heather and are partly undercut. The riverbed and several exposed bank sections and gravel bars along the stream consist of coarse gravel and cobbles, with one area of exposed granite bedrock at the bend next to cross-section 2.

| Pre-calibration
For the sensor pre-calibration, any 3D structure present on-site can be utilized as a calibration structure. This could be either natural (e.g., boulders or rock formations) or artificial (e.g., bridges, buildings or stone walls) stable structures. In our case study the sensor pre-calibration was carried out using a stone building as the calibration structure (Figure 2). Imagery from a thermal and an RGB sensor taken with varying perspectives and distances serves as the calibration dataset, and terrestrial laser scans provide the reference dataset. Conjugate features clearly visible in both imagery and point cloud were used as calibration targets. The following paragraphs provide more detail on the preparation of the reference data, the preparation of the calibration data and the generation of the sensor parameters. We adopted the close-range photogrammetry software VMS (version 8.8) (Shortis & Robson, 2015) as the calibration benchmark and AM (version 1.7.2) (Agisoft LLC, 2021) as a potentially more applicable and widely adopted alternative. VMS is established as calibration software (James et al., 2020; Shortis & Luhmann, 2018) wherein the photogrammetric procedure is transparent, comprehensible and highly customizable. However, its operation requires precise knowledge and can be somewhat cumbersome, especially when utilizing large datasets.

| Reference dataset (3D feature coordinates)
The reference dataset was derived from a point cloud captured using a Leica ScanStation P40 terrestrial laser scanner operating from a sensor-object distance of 20 m with a resolution setting of 3.1 mm at 10 m. The raw scans were processed using Leica Cyclone (version 9.2.1). The point cloud was not georeferenced and the calibration was performed using a local coordinate system. Feature selection excluded building outlines, corners and window openings, which are often not clearly recognizable in the thermal imagery. We found individual stones to be more clearly recognizable, albeit more challenging to precisely locate in the terrestrial laser scanner point cloud.

| Calibration dataset (2D feature coordinates)
The sensors were used to capture a total of 158 thermal and 101 visible images flying in circular patterns triggering at a set interval of 3 s.
The distances between features and sensors range between 8.2 and 32.4 m for the RGB sensor, and between 20.8 and 37.5 m for the thermal sensor. To ensure the reliability of the calibration by eliminating outliers, a sufficient level of redundancy in the dataset (i.e., number of images and observations) is necessary (Shortis, 2019).
Adding more images does not indefinitely improve the calibration but leads to increasing computational time (Eltner et al., 2016) and manual work. All images were imported and aligned in AM to allow for selection of suitable calibration subsets. An ideal calibration dataset features a convergent network with a variety of distances and perspectives, sufficient overlap, and covers the entire sensor area (e.g., Eltner & Sofia, 2020; Kenefick, 1972; Oniga et al., 2018; Sanz-Ablanedo et al., 2020; Shortis, 2019). This minimizes the parameter correlations and ensures that the calibration accurately represents the physical model (Kenefick, 1972; Shortis, 2019). Accuracies were reported (Table 1) for subsets of 16 images using the multi-scan set-up, and Hieronymus (2012) states that a subset of 8-12 images can be sufficient. Given the increased overlap, and thus redundancy, in the single-scan set-up we concluded that a subset of 16 images provided the ideal dataset size. The feature observations were created by placing markers in AM and exported using a script for the built-in Python console. We deliberately avoided coded targets that could be automatically recognized. Using natural features instead allows the survey to be repeated based on the same terrestrial laser scan (and feature coordinates) without having to repeat the target placement and surveying. Moreover, a higher accuracy was achieved by manually placing markers, rather than using the AM automatic Refine Markers tool.

| Determining calibration parameters
The reference data were split into independent check and control points for validation. The exported 3D reference feature coordinates and the 2D observation coordinates were used to determine the benchmark pre-calibration in VMS by solving the collinearity equations (Brown, 1971) as described in Senn et al. (2020). In parallel, we performed a second pre-calibration in AM by importing the 3D reference feature coordinates to generate camera parameters by self-calibrating bundle adjustment (Figure 1). As suggested by Shortis and Luhmann (2018), the parameters were initially fixed and subsequently released iteratively: first the radial distortion parameters (k1, k2 and k3) and the principal point offsets (xp and yp), then the tangential distortion parameters (p1, p2), and finally the affinity and orthogonality terms (b1, b2). The AM version used does not allow a marker-based calibration; hence the photogrammetric tie points could not be excluded from the calibration, unlike the procedure presented by Senn et al. (2020). We believe a strictly marker-based calibration would facilitate the exclusion of unwanted image-based optimization in the AM software 'black-box' and at the same time allow a better comparability with VMS (Harwin et al., 2015).
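The distortion part of the camera model being solved can be sketched as follows. This is an illustrative Brown (1971)-style model in normalized image coordinates; note that the sign and ordering conventions for the tangential terms p1/p2 differ between software packages, so this is not the exact formulation of either VMS or AM:

```python
def brown_distort(x, y, k1, k2, k3, p1, p2):
    """Apply Brown-style radial and tangential (decentering) distortion
    to normalized image coordinates (x, y).
    Illustrative sketch only: p1/p2 conventions vary between packages,
    and the affinity/orthogonality terms (b1, b2) are omitted here."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + p1 * (r2 + 2 * x * x) + 2 * p2 * x * y
    yd = y * radial + p2 * (r2 + 2 * y * y) + 2 * p1 * x * y
    return xd, yd

# Zero coefficients leave the coordinates unchanged:
print(brown_distort(0.1, -0.2, 0, 0, 0, 0, 0))  # (0.1, -0.2)
```

The bundle adjustment estimates the coefficients (k1, k2, k3, p1, p2, plus principal point and affinity terms) so that observed and reprojected feature coordinates agree.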
We selected suitable values for image accuracy based on Harwin et al. (2015): 0.5 pixels in both software packages and 1 pixel for the tie points in AM. Marker accuracy was set to 3 mm according to the observations provided by the terrestrial laser scans.

| Real-world application: river survey
Having successfully pre-calibrated the sensors, the subsequent step was to transfer the camera parameters to the river survey dataset.
The focus of our application was on DEM generation using visible imagery. Thermal imagery, on the other hand, was utilized for the creation of orthophotos to be applied for water surface detection. The survey was conducted on a 1 km reach of the upper River Gairn, covering a total area of 0.7 km². The area was split into a west and an east section, with separate take-off and landing sites to avoid exceeding the legal flying distances. The optical dataset consists of a total of 922 images from three different flying heights and view angles:
• 30 m nadir, 504 images (overlap: forward 60%, lateral 60%);
• 40 m oblique, 256 images (forward 50%, lateral 40%);
• 90 m nadir, 162 images (forward 80%, lateral 60%).
The raw data were post-processed using Leica Infinity (version 2.4.1). In the first step, the base stations were processed with a baseline from the local OSNet station in the nearby town of Braemar (BRAE), which is located 13 km away. The software achieved sub-millimetre accuracy for the base stations in post-processing. Subsequently, the raw data acquired by the rovers were processed with a baseline from the fixed coordinates of the local base stations. The high measurement accuracy at the base stations ensured a correspondingly high relative accuracy at the reference point (1 mm) and GCP (0.1 mm) measurements, based on the metric 'CQ 3D' provided in the Leica Infinity output. However, this metric appears to be overly optimistic. Typically, the accuracy is estimated as an average over repeated measurements at the same location, which is not always feasible due to time constraints in the field. The reference data were exported from Leica Infinity and imported into R (R Core Team, 2020) for further filtering and quality assessment.

Table 1. Overview of the datasets and resulting metrics of the sensor pre-calibration (multi-scan set-up results from Senn et al., 2020).
Due to a faulty battery not all GNSS reference points could be successfully post-processed; therefore, the spatial distribution is not ideal.
To isolate the effect of different levels of error mitigation, the visible imagery and the GCPs were processed in AM following 12 different predefined cases, as shown in Table 2.
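Since Table 2 is not reproduced here, the 12-case design can only be sketched loosely. Our reading of the case labels (e.g., B1, D2, E3 in the results) is four imagery/masking sets crossed with three calibration strategies; the set descriptions in the comments are assumptions drawn from the surrounding text, not the table itself:

```python
from itertools import product

# Hypothetical reconstruction of the 12-case design matrix.
# Assumed from the text: B uses water masks, D is two-height nadir,
# E is single-scale 30 m nadir; C is inferred as a further imagery set.
case_sets = ["B", "C", "D", "E"]
calibrations = {1: "AM pre-calibration",
                2: "AM self-calibration",
                3: "VMS pre-calibration"}

cases = [f"{s}{n}" for s, n in product(case_sets, calibrations)]
print(len(cases), cases[:3])  # 12 ['B1', 'B2', 'B3']
```

Enumerating the full cross-product like this is a cheap way to guarantee that every combination of error mitigation and calibration strategy is processed consistently.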
Cases B1, B2 and B3 use water masks during the bundle adjustment to remove the effect of light refraction at the water surface. For consistency between cases, we generated a mask for every individual image beforehand. The masks can be generated efficiently from a mesh and do not require centimetre accuracy. For this, a self-calibrating bundle adjustment was run on the full visible dataset. The resulting camera parameters were exported (Appendix Table A1) and plotted as distortion profiles alongside the pre-calibrated results (see Appendix Figure A1).

| Thermal imaging
We implemented an additional self-calibrating bundle adjustment using the thermal sensor to complement the experiment and as a proof of concept. The 'best-case scenario' with imagery from all three heights and angles was used, and no water masks were applied. The retrieved calibration parameters were exported (Appendix Table A1). Thermal imagery is less suitable for DEM creation due to its low resolution and dynamic range, but it has significant potential for orthophoto creation (Maes et al., 2017). We demonstrated the orthophoto generation workflow based on the DEM adopted from the visible dataset (case E3) in AM.
Analogously to the workflow for the visible dataset, we imported the 90 m nadir thermal imagery and fixed the camera parameters to the VMS pre-calibration parameters. The images were then aligned provisionally to aid marker placement on the GCP observations. Subsequently, the images were finally aligned using the highest accuracy setting. At this point the method deviates from the visible workflow: instead of building the dense cloud and deriving the DEM from thermal imagery, we imported the previously created DEM from visible imagery. Finally, we created the orthophotos using Build Orthomosaic.

| Pre-calibration results
The corresponding distortion profiles for radial and tangential distortions can be seen in Figure A1 (Appendix). The pre-calibrated profiles of the RGB sensor followed a similar course for both software packages. In the case of the thermal sensor, less uniform profiles were calculated.
The self-calibrated RGB distortion profiles from processing the different cases of the survey dataset form a highly similar group and run parallel to the AM pre-calibration. The self-calibrated thermal camera parameters show a profile more similar to the AM pre-calibration.
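Radial distortion profiles such as those compared here plot the displacement dr against radial distance r; for a Brown-style model this is dr = k1·r³ + k2·r⁵ + k3·r⁷ in the same normalized units as the coefficients. A minimal sketch with invented coefficients (not the calibrated values from this study):

```python
def radial_profile(r, k1, k2, k3):
    """Radial distortion displacement dr at radius r for a Brown-style
    model: dr = k1*r^3 + k2*r^5 + k3*r^7.
    Coefficients below are illustrative placeholders only."""
    return k1 * r**3 + k2 * r**5 + k3 * r**7

# Sample the profile at normalized radii 0.0 ... 0.5:
radii = [0.1 * i for i in range(6)]
profile = [round(radial_profile(r, -0.1, 0.02, 0.0), 6) for r in radii]
print(profile)
```

Plotting such curves for each calibration (as in Figure A1) makes it easy to see whether two parameter sets describe effectively the same lens geometry even when the individual coefficients differ.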

| DEM error analysis of the river survey dataset
The different processing cases (Table 2) were applied in AM to produce a series of DEMs. The Z-values were extracted from the DEMs using the xy-coordinates of the GCPs and GNSS reference points to assess the height offsets using R (R Core Team, 2020). Figure 4 shows boxplots of the offsets between the GCPs and the DEMs. Overall, the smallest GCP offsets were calculated using the VMS pre-calibration, the AM pre-calibration yielded the largest errors (lower for E1) and the self-calibration is in between the two (with a trend towards higher errors with less error mitigation). The largest errors were found for the D cases that use nadir imagery from two flying heights. Only for the single-scale nadir case E did the AM pre-calibration result in smaller errors at the GCPs.
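The offset computation itself is straightforward; a minimal sketch (the study performed this in R, and the coordinate values below are invented for illustration):

```python
import math

# Hypothetical check-point records: (z from GNSS, z sampled from the
# DEM at the same xy-coordinate). Values are made up for illustration.
pairs = [(231.42, 231.45), (229.88, 229.84),
         (233.10, 233.19), (230.55, 230.50)]

# Signed vertical offsets: positive means the DEM sits above GNSS.
offsets = [dem_z - gnss_z for gnss_z, dem_z in pairs]
mean = sum(offsets) / len(offsets)
rmse = math.sqrt(sum(o * o for o in offsets) / len(offsets))
print(f"mean offset {mean:+.3f} m, RMSE {rmse:.3f} m")
```

The signed mean reveals a systematic bias (e.g., a globally tilted or inflated DEM), while the RMSE summarizes overall vertical accuracy per processing case.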
The reference point classes GRV and ROA are the least susceptible to distortion or noise and are therefore used to evaluate DEM quality (Figure 5). This is reflected in the relatively small offset values and variation of these two dry surface classes compared to the other classes (Appendix Figure A4; Table 3 and Appendix Figure A2). This trend is not only visible in the RMSE values but also in the spread of errors, as indicated by the larger interquartile ranges in the boxplots (Figure 5) and the standard deviations (Appendix Table A2).

Based on the literature, it is to be expected that insufficient sensor calibration leads to systematic errors that result in characteristic spatial error patterns (e.g., James et al., 2020; Sanz-Ablanedo et al., 2020). To detect such DEM distortions, we plotted the z-offsets between GNSS reference points and DEMs of the relevant dry surface classes (GRV and ROA) on a map (Figure 6), as well as between the GCPs and the DEMs (Appendix Figure A5). Red shades indicate DEM elevations higher than the corresponding GNSS measurements, and blue shades correspondingly show lower values. A number of reference points could not be post-processed due to a faulty base station battery; hence the spatial coverage of the reference points is not ideal. Nevertheless, there are signs of tilting or dome-shaped distortion indicating systematic errors. These errors are particularly pronounced in the self-calibrated nadir-only cases D2 and E2 (Figure 6).
To a certain extent, spatial error patterns are also visible in the AM pre-calibrated multi-scale cases B1, C1 and D1. The distribution of errors in the cases that showed larger errors in the boxplots is not random but displays systematic spatial patterns.
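One way to quantify such dome- or bowl-shaped patterns beyond visual inspection is to fit a low-order surface to the check-point residuals; the curvature coefficient then flags doming. This is our illustrative addition, not a step of the published workflow (synthetic residuals, pure-stdlib least squares):

```python
def fit_dome(points):
    """Least-squares fit of dz = a + b*x + c*y + d*(x^2 + y^2) to
    check-point residuals; the sign/magnitude of d flags dome- or
    bowl-shaped DEM distortion. Illustrative sketch: normal equations
    solved by Gaussian elimination. points: iterable of (x, y, dz)."""
    n = 4
    ata = [[0.0] * n for _ in range(n)]
    atz = [0.0] * n
    for x, y, dz in points:
        row = [1.0, x, y, x * x + y * y]
        for i in range(n):
            atz[i] += row[i] * dz
            for j in range(n):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on [A^T A | A^T z].
    m = [ata[i] + [atz[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / m[i][i]
    return beta  # [a, b, c, d]

# Synthetic residual grid with a pure dome (dz = -0.001 * r^2):
pts = [(x, y, -0.001 * (x * x + y * y))
       for x in range(-10, 11, 5) for y in range(-10, 11, 5)]
a, b, c, d = fit_dome(pts)
print(round(d, 6))  # -0.001
```

A significantly non-zero d (relative to its uncertainty) indicates systematic doming; b and c capture a tilt of the kind also visible in the mapped offsets.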
Because of its low errors we selected case E3 as the reference case for further analysis and applied it to calculate pairwise DEMs of difference (DoDs) with all other cases (Figure 7). In addition, we extracted and plotted a set of cross-sections to visualize DEM distortions (Figure 8).
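A DoD is simply the cell-wise difference between two co-registered DEM grids; a toy sketch with made-up rasters (real DEMs would of course need matching extents and resolution):

```python
# DEM of Difference (DoD): cell-wise subtraction of the reference DEM
# (case E3 in this study) from a test DEM. The 2x2 grids below are
# invented elevations in metres, purely for illustration.
ref = [[231.4, 231.5], [230.9, 231.0]]   # reference DEM (e.g., E3)
test = [[231.5, 231.5], [230.8, 231.2]]  # DEM from another case

dod = [[round(t - r, 3) for t, r in zip(trow, rrow)]
       for trow, rrow in zip(test, ref)]
print(dod)  # [[0.1, 0.0], [-0.1, 0.2]]
```

Mapping these differences (as in Figure 7) exposes spatially coherent patterns, e.g. a radially increasing DoD that betrays doming in the test case.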
A clear divide between pre-and self-calibration is evident for all case sets.
The similarities between cases are inherently greater within a calibration strategy and, consequently, using E3 as the reference case creates a certain bias.

| Application of the thermal dataset
We successfully applied the thermal dataset in a pre- and self-calibrated bundle adjustment (self-calibrated parameters in Appendix Table A1 and distortion profiles in Appendix Figure A1). To demonstrate a potential usage scenario, we mapped the thermal imagery of the 90 m nadir dataset onto the visible DEM (case E3). The created thermal orthophoto is shown in Figure 9. The RMSE at the control points was 1.3 pixels, within the range reported for manual target measurements (Fraser, 2018; Geomsoft, 2008; Shortis, 2015; Shortis et al., 1995). The noticeably higher RMSE values of AM compared to the VMS benchmark, despite using the same parameters, illustrate the importance of software choice, an effect likely related to the tie points. Ideally, all tie points would be removed prior to determining the calibration parameters in AM to exclude the influence of its 'black-box' image-matching algorithms (Harwin et al., 2015). However, the applied software version does not allow for a marker-based calibration. To test whether the tie point accuracy could be exploited to decrease the weighting of the tie points in relation to the markers, we ran the AM pre-calibration with different settings (0.4, 1 and 6, based on the literature, and 100). We found no significant effect on the RMSE, and therefore kept the initial value equal to unity. Future research should explore the potential of tie point masking, filtering or marker-to-tie-point conversion to optimize the pre-calibration capabilities of AM, as we see great potential in a single-software solution.
Calibration quality is furthermore reflected in the highly similar radial and tangential distortion profiles of the RGB sensor in both pre-calibration scenarios. However, the distortion profiles of the thermal sensor deviate significantly. The principal point offsets xp and yp (see Table A1) are nearly identical between the two software packages for the visible sensor, while they differ for the thermal sensor. Both RGB and thermal sensors show high correlations between the tangential distortion parameters and the principal point offsets (thermal: 0.91 for p1-xp and 0.86 for p2-yp; visible: 0.87 and 0.72, respectively).
The typically high correlation of these parameters (Shortis, 2019) can indicate over-parametrization, and James, Robson, and Smith (2017)

| Calibration geometry and scan set-up
A central aim of this study was to test how the simplification of the scan set-up influences the quality of the sensor pre-calibration. We found significantly lower RMSE values using the single-scan set-up compared to the multi-scan set-up reported in Senn et al. (2020). We believe that several factors have played a role in this improvement in accuracy. Most importantly, using a single point cloud avoids errors in point cloud registration. The original purpose of the multi-scan set-up was to encircle the calibration structure and thus to avoid bending and incorrect angles in the network. Previous researchers suggest the application of a 3D structure rather than a 2D calibration plane in order to create a stable geometric network (Harwin et al., 2015; Oniga et al., 2018). Our results now suggest that a single scan of two façades provides sufficient 3D structure for a robust calibration. The major advantage of a single scan is that less façade area needs to be covered. Higher overlaps and variation of perspective and scale can be achieved with the same number of images. Consequently, it is easier to include odd angles and varying scales to optimize the convergent image network. The improved geometry reduces the risk of parameter correlation, and outliers can be eliminated more efficiently due to the higher redundancy (Shortis, 2019). At the same time, fewer target features need to be defined and manually digitized. Ultimately, processing time and manual work are limiting factors that have to be balanced with redundancy.
The single-scan set-up offers further advantages that apply especially to thermal sensors. It allows exclusion of north-facing walls, which are never exposed to direct solar irradiation in Scotland and therefore yield imagery with a low dynamic range (a known issue of thermal sensors in SfM applications; Maes et al., 2017), making target features more difficult to recognize. Images acquired under direct solar irradiation provide better contrast, and features can be digitized more accurately.
Changing the scan set-up also required defining and extracting a new reference dataset. We updated the conventions of feature selection based on the lessons learned in the previous approach, where the selection was mainly based on visible imagery and terrestrial laser scanning. We found that building outlines tend to be fuzzy in the thermal imagery, while some stones that are clearly distinguishable in the thermal imagery and terrestrial laser scans cannot be separated from the surrounding mortar in the visible imagery. Overall, the selection convention evolved from corners and edges towards individual bricks, while carefully assessing visibility in all sensors.
The single-scan set-up not only improves calibration accuracy but also reduces the required workload. It cuts the time needed for field scans, point cloud post-processing and registration, and the number of target features to digitize manually. Reducing the complexity increases the applicability, and thus the potential applications, of the approach.
The importance of calibrating at survey scale (i.e., sensor-object distance similar to flying height) has been emphasized previously (Griffiths & Burningham, 2019; Lichti et al., 2008; Roncella & Forlani, 2021). The sensor-object distances applied here are similar to the 30 m nadir flying height; the scales of the higher flying heights are not represented in the calibration. Since depth of field increases rapidly with object distance (approximately with its square, below the hyperfocal distance), we assume that transferring calibration parameters generated at 30 m is more suitable than conventional pre-calibration routines using checkerboards or portable frames at short distances. Calibration objects that are too small would not sufficiently cover the sensor area (Shortis, 2019)
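The scaling of depth of field with object distance can be illustrated with the standard thin-lens approximation. The sketch below uses illustrative small-format RPAS camera values (assumptions, not the parameters of the surveyed sensors):

```python
def dof_approx(u_mm, f_mm, n_stop, c_mm):
    """Approximate total depth of field, 2 * N * c * u^2 / f^2.

    Valid while the object distance u stays well below the hyperfocal
    distance; in that regime depth of field grows with the square of u.
    """
    return 2.0 * n_stop * c_mm * u_mm**2 / f_mm**2

# Illustrative values: 8.8 mm focal length, f/2.8, 0.005 mm circle of confusion
f_mm, n_stop, c_mm = 8.8, 2.8, 0.005
dof_3m = dof_approx(3000.0, f_mm, n_stop, c_mm)
dof_6m = dof_approx(6000.0, f_mm, n_stop, c_mm)
print(dof_6m / dof_3m)  # doubling the distance roughly quadruples depth of field
```

This is why a short-range checkerboard calibration samples a much shallower focus regime than imagery acquired at survey altitude.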

| Application in the geomorphological survey
The dry reference point classes GRV and ROA were not evenly distributed across the survey area, due to gaps in the base station data.
We assume, however, that the clustered distribution, with accumulations of points close to the boundaries (ROA) and the centre (GRV), is sensitive to systematic DEM distortions and thus suitable for the purpose. The reference points alone, however, cannot reveal whether the systematic error pattern shows doming or tilting deformation. This becomes clearer in our second validation dataset: the DoDs and the derived cross-sections provide a better representation of spatial patterns and thus of systematic errors in the camera calibration. The DoDs must nevertheless be assessed carefully, as they represent DEM offsets relative to a reference case (E3) and can thus be biased. Using two independent validation datasets compensates for their respective weaknesses and strengthens results that point in the same direction.
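The two validation strategies can be sketched in a few lines. The example below computes per-cell DoD offsets and z-offset statistics at check points for a synthetic dome-distorted DEM; all arrays and values are illustrative, not the survey data:

```python
import numpy as np

def dod(dem_test, dem_ref):
    """DEM of Difference: per-cell elevation offsets relative to a reference DEM."""
    return dem_test - dem_ref

def z_offset_stats(dem, points):
    """Mean, standard deviation and RMSE of vertical offsets between a DEM
    and GNSS check points, sampled at the nearest cell for simplicity."""
    rows = points[:, 0].astype(int)
    cols = points[:, 1].astype(int)
    dz = dem[rows, cols] - points[:, 2]
    return dz.mean(), dz.std(ddof=1), float(np.sqrt(np.mean(dz**2)))

# Synthetic example: a flat reference surface and a DEM with dome-shaped distortion
n = 101
y, x = np.mgrid[0:n, 0:n]
r2 = ((x - n // 2)**2 + (y - n // 2)**2) / (n // 2)**2
ref = np.zeros((n, n))
domed = ref + 0.2 * (1.0 - r2)  # up to 0.2 m of doming at the centre

diff = dod(domed, ref)
points = np.array([[50, 50, 0.0], [5, 5, 0.0], [95, 95, 0.0]])  # row, col, z_gnss
mean_dz, sd_dz, rmse = z_offset_stats(domed, points)
```

Note how check points near the centre and near the boundaries pick up offsets of opposite sign under doming, which is why the clustered GRV/ROA distribution remains sensitive to this error pattern.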
The systematic design of our experiment allows us to isolate the effects of the individual mitigation measures.

| Water masks
The first element of the error mitigation strategy (applied only in case B) was the water masking. In our scenario, however, we did not find any significant effect of its application: none of the calibration scenarios display significant differences between cases B and C. The rationale was to mask out water bodies, which are assumed to be particularly prone to reconstruction errors (e.g., et al., 2015), or snow cover in thermal imagery (e.g., Webster et al., 2018). Where masking does prove beneficial, our approach can be particularly valuable because the implemented workflow allows fast and efficient generation of a global mask dataset.

| Oblique imagery
The second element of the error mitigation strategy was the inclusion of oblique images. The scenarios using self-calibrated bundle adjustment show the biggest difference between the cases with (B2 and C2) and without (D2 and E2) oblique imagery. Relatively low RMSE values and the spatial error patterns in the DoDs indicate that the best accuracy was achieved when oblique imagery was included. The decline in model quality is also reflected in standard deviations at the dry GNSS reference points that increase by factors of 4 and 8 in the self-calibrated nadir-only cases D2 and E2. A convergent image network proves to be a necessary requirement for solving the self-calibrating bundle adjustment (James & Robson, 2014). These findings agree with previous studies using simulated (James & Robson, 2014; James, Robson, & Smith, 2017) and field (Nesbit & Hugenholtz, 2019) datasets that include oblique view angles. The orthophoto (Figure 9) shows radiometric irregularities related to changing solar irradiation conditions during the data collections.
Future research should investigate the effect of more consistent lighting conditions and of radiometric corrections. We believe that this study can serve as a proof of concept for the use of thermal sensors, but its success also suggests that the approach can be applied to other sensors, for example multispectral and hyperspectral (e.g., Lucieer et al., 2012; Maes et al., 2017).
Potential errors in the pre-calibration or direct georeferencing propagate into the final model when interior and exterior parameters are fixed in the bundle adjustment (Cramer et al., 2000). Without exterior constraints, the bundle adjustment can compensate for an erroneous pre-calibration by shifting camera positions (Cramer et al., 2000; Eltner & Sofia, 2020). The low errors of the single-scale pre-calibrated cases may be an example where fewer constraints (network geometry and exterior parameters) lead to better results.
An alternative could be adaptive camera calibration, which solves a highly constrained bundle adjustment and subsequently removes the constraints to allow the interior parameters to readjust (Zhou et al., 2019). Furthermore, experiments comparing calibration validity over short (between take-off and landing) and long (site revisit) periods would be beneficial.
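The adaptive two-stage idea can be sketched with a deliberately simplified linear analogue (an illustrative toy model, not the actual formulation of Zhou et al., 2019): a first, highly constrained adjustment holds a distortion-like parameter at its pre-calibrated value, and a second adjustment releases that constraint so the parameter can readjust to the data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
a_true, k_true = 1.2, -0.3                      # "interior" parameters of a toy model
y = a_true * x + k_true * x**3 + rng.normal(0.0, 0.01, x.size)

# Stage 1: highly constrained adjustment -- the distortion-like term k is
# fixed at its pre-calibrated value; only the linear term is solved.
k_pre = -0.25
a_stage1 = np.linalg.lstsq(x[:, None], y - k_pre * x**3, rcond=None)[0][0]

# Stage 2: the constraint is released and both parameters readjust together.
design = np.column_stack([x, x**3])
a_stage2, k_stage2 = np.linalg.lstsq(design, y, rcond=None)[0]
```

In the real, nonlinear bundle adjustment, the stage-1 solution additionally supplies the starting point for stage 2, which this linear toy cannot capture.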

| CONCLUSIONS
We have demonstrated an efficient workflow for RPAS-based multi-sensor on-site pre-calibration in geomorphological research. For DEM generation from visible imagery, we found the greatest potential of pre-calibration in single-scale nadir-only surveys. This type of survey design is particularly common in geomorphological applications and can lead to systematic errors if not handled correctly.
Such a dataset (30 m nadir-only) resulted in the largest vertical offsets when processed with a self-calibrated bundle adjustment. When processed with the VMS pre-calibrated camera parameters, however, it produced the smallest errors overall. With regard to geomorphological surveys, pre-calibrated nadir-only single-scale designs can therefore be more efficient in terms of time requirements or area covered.

TABLE A2 Standard deviations of the z-offsets between the DEM and the GNSS reference points. See Figure A3 for visualization as a bar chart.