Detecting boundary errors with spatial random forests
University of New Brunswick
When data are observed at different geographical locations, categorical variables that define region membership may be encountered. However, regional boundaries can be hard to define correctly. For models in which the response depends on both spatial location and region membership, mis-specified region boundaries may seriously degrade model performance, so diagnostics for detecting mis-specified boundaries would be useful. One way to amplify such problems is to simulate extra observations with a method that does not rely on a model but can base its estimates on some (not necessarily all) of the local information. Spatial random forests may be able to do this, and we expected that they could help identify boundary errors. We applied them to simulated data containing both an incorrect and a true version of the boundaries. Diagnostic performance was assessed by looking for unusual patterns in the interpolated values at the extra, unobserved locations. Unfortunately, this approach based on spatial random forests failed to detect the boundary errors.
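As a rough illustration of the kind of simulated data described above, the following minimal sketch generates responses on the unit square under a true two-region boundary while also recording a mis-specified region label. Everything specific here is an assumption made for illustration and is not taken from the study: the vertical boundaries at x = 0.5 (true) and x = 0.6 (recorded), the form of the spatial trend, the size of the region effect, the noise level, and the function names `simulate` and `mislabelled_fraction`. The random forest fitting and interpolation steps are not shown.

```python
import math
import random


def simulate(n=500, true_cut=0.5, wrong_cut=0.6,
             region_effect=2.0, noise_sd=0.3, seed=0):
    """Simulate n points on the unit square (all settings are hypothetical).

    The response depends on location and on the TRUE region, while the
    recorded region label comes from a shifted, mis-specified boundary.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        true_region = int(x >= true_cut)        # membership under the true boundary
        recorded_region = int(x >= wrong_cut)   # membership under the wrong boundary
        trend = math.sin(2 * math.pi * x) + y   # assumed smooth spatial trend
        response = trend + region_effect * true_region + rng.gauss(0, noise_sd)
        rows.append({
            "x": x,
            "y": y,
            "true_region": true_region,
            "recorded_region": recorded_region,
            "response": response,
        })
    return rows


def mislabelled_fraction(rows):
    """Share of points whose recorded region differs from the true region."""
    wrong = sum(r["true_region"] != r["recorded_region"] for r in rows)
    return wrong / len(rows)


data = simulate()
print(f"fraction mislabelled: {mislabelled_fraction(data):.3f}")
```

In this setup only the points in the strip between the two boundaries carry a wrong label, so any diagnostic would have to pick up unusual behaviour concentrated in that strip.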