ReCXTUnet: A Novel Framework for Remote Sensing Image Classification using TransUnet and XGBoost
DOI: https://doi.org/10.58825/jog.2025.19.2.242
Keywords: Remote Sensing Image classification, Deep learning, TransUnet, ResUnet, SwinUnet, XGBoost
Abstract
Remote sensing image classification (RSIC) is crucial for many environmental and urban applications, but it is difficult because remote sensing image data are highly variable and high-dimensional. This paper presents a novel framework that combines Transformer-based U-Net (TransUNet) and eXtreme Gradient Boosting (XGBoost) for RSIC. TransUNet, known for its powerful feature extraction capabilities, captures contextual and spatial information from remote sensing images, while XGBoost improves classification accuracy by efficiently handling high-dimensional data. TransUNet was originally designed for image segmentation rather than classification, and its architecture excels at segmenting complex details within images. In the proposed framework, we adapt TransUNet by adding a classification layer. The fully connected layer of TransUNet serves as the base learner for XGBoost, forming a robust framework for efficient RSIC. This hybrid approach offers multiple benefits. TransUNet preserves complex details and spatial relationships in images, which improves feature representation, and XGBoost provides high predictive accuracy and limits overfitting through its gradient boosting algorithm. The combination addresses challenges in RSIC such as variations in image quality and noise. We evaluated the proposed approach on high-resolution remote sensing images from the RSI-CB 256 and NWPU-RESISC45 datasets. The framework outperformed existing baseline models, attaining a classification accuracy of 91%. The experimental results indicate that the approach not only enhances classification accuracy but also remains robust to variations in image quality and noise.
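The two-stage idea described in the abstract (a feature vector taken from a network layer, then a boosted-tree classifier trained on those vectors) can be illustrated with a minimal, standard-library Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `extract_features` is a hypothetical stand-in for the output of a trained TransUNet's fully connected layer, the images are synthetic, and the booster is plain gradient boosting with depth-1 trees rather than XGBoost itself, which adds regularization and second-order updates.

```python
import math
import random


def extract_features(image):
    # Hypothetical stand-in for the feature vector read from the network's
    # fully connected layer; here just the mean and range of the pixel values.
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def fit_stump(X, residuals):
    # Depth-1 regression tree: choose the (feature, threshold) split that
    # minimizes squared error against the current residuals.
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring feature values
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]


def stump_value(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_boosted(X, y, rounds=10, lr=0.5):
    # Plain gradient boosting for binary labels with logistic loss: each
    # round fits a stump to the negative gradient y - sigmoid(F).
    stumps, F = [], [0.0] * len(X)
    for _ in range(rounds):
        residuals = [yi - 1 / (1 + math.exp(-f)) for yi, f in zip(y, F)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        shrunk = (j, t, lr * lm, lr * rm)  # learning rate folded into leaves
        stumps.append(shrunk)
        F = [f + stump_value(shrunk, x) for f, x in zip(F, X)]
    return stumps


def predict(stumps, x):
    score = sum(stump_value(s, x) for s in stumps)
    return 1 if score > 0 else 0


def make_images(rng, n, low, high, size=4):
    # Synthetic size x size "images" with pixel values drawn from [low, high].
    return [[[rng.uniform(low, high) for _ in range(size)]
             for _ in range(size)] for _ in range(n)]


rng = random.Random(0)
# Two toy scene classes: dark images (label 0) and bright images (label 1).
train = ([(img, 0) for img in make_images(rng, 20, 0.0, 0.2)]
         + [(img, 1) for img in make_images(rng, 20, 0.8, 1.0)])
test = ([(img, 0) for img in make_images(rng, 10, 0.0, 0.2)]
        + [(img, 1) for img in make_images(rng, 10, 0.8, 1.0)])

X_train = [extract_features(img) for img, _ in train]
y_train = [label for _, label in train]
model = fit_boosted(X_train, y_train)

correct = sum(predict(model, extract_features(img)) == label
              for img, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy on the toy data: {accuracy:.2f}")
```

In a real pipeline the feature extractor would be a trained network and the classifier a tuned XGBoost model; the toy classes here are deliberately easy to separate, so the accuracy printed says nothing about the results reported above.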
License
Copyright (c) 2025 Journal of Geomatics

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
