Abstract
This letter provides insights into the effectiveness of the zero-shot, prompt-based Segment Anything Model (SAM) and its updated versions, SAM 2 and SAM 2.1, along with a non-promptable convolutional neural network (CNN), for segmenting solar panels in RGB aerial imagery. The study evaluates these models across diverse lighting conditions, spatial resolutions, and prompting strategies. SAM 2 showed slight improvements over SAM, while SAM 2.1 demonstrated notable improvements, particularly under suboptimal lighting and at low resolution. When prompted with user-defined boxes, the SAMs outperformed the CNN in all scenarios; in particular, user-box prompts proved crucial for achieving reasonable performance on low-resolution data. In addition, at high resolution, automatic prompting with YOLOv9 outperformed user-point prompting by supplying reliable prompts to SAM. At low resolution, SAM 2.1 prompted by user points performed similarly to SAM 2.1 prompted by YOLOv9, highlighting its zero-shot improvements from a single click. On high-resolution imagery with optimal lighting, Eff-UNet outperformed the SAMs prompted by YOLOv9, while under suboptimal lighting, Eff-UNet and SAM 2.1 prompted by YOLOv9 performed similarly. However, SAM is more resource-intensive, and despite SAM 2.1's improved inference time, Eff-UNet remains more suitable for automatic segmentation of high-resolution data. This research details the strengths and limitations of each model and highlights the robustness of user-prompted image segmentation models.
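As a minimal illustration (not from the letter itself), the prompting strategies compared above reduce to simple array formats in the open-source `segment_anything` predictor interface: a box prompt is a pixel-space XYXY array (which a YOLO-style detection can be converted into), and a point prompt is a set of coordinates with foreground/background labels. The example values below are hypothetical.

```python
import numpy as np

def yolo_to_box_prompt(cx, cy, w, h, img_w, img_h):
    """Convert a YOLO-style normalized (cx, cy, w, h) detection into the
    pixel-space XYXY box prompt format SAM's predictor expects."""
    x0 = (cx - w / 2) * img_w
    y0 = (cy - h / 2) * img_h
    x1 = (cx + w / 2) * img_w
    y1 = (cy + h / 2) * img_h
    return np.array([x0, y0, x1, y1])

def click_to_point_prompt(x, y, foreground=True):
    """A single user click becomes a point prompt: an (N, 2) coordinate
    array plus an (N,) label array (1 = foreground, 0 = background)."""
    return np.array([[x, y]]), np.array([1 if foreground else 0])

# Hypothetical solar-panel detection in a 1000 x 1000 pixel tile
box = yolo_to_box_prompt(0.5, 0.5, 0.2, 0.1, 1000, 1000)
coords, labels = click_to_point_prompt(500, 500)

# With an image set on a loaded SamPredictor, these would feed, e.g.:
# predictor.predict(box=box, multimask_output=False)
# predictor.predict(point_coords=coords, point_labels=labels)
```

This sketches only the prompt construction; the model loading, image embedding, and mask post-processing used in the evaluated pipelines are omitted.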
| Original language | American English |
|---|---|
| Article number | 6002505 |
| Journal | IEEE Geoscience and Remote Sensing Letters |
| Volume | 22 |
| DOIs | |
| State | Published - 1 Jan 2025 |
Keywords
- Remote sensing
- SAM 2.1
- Segment Anything Model (SAM) 2
- solar panels
- transfer learning
- you only look once (YOLO)
All Science Journal Classification (ASJC) codes
- Geotechnical Engineering and Engineering Geology
- Electrical and Electronic Engineering