Suspicious human behaviors can be defined by the user; in long-distance imaging they may include, for instance, bending the body while walking, or crawling, as opposed to regular walking. State-of-the-art methods based on convolutional neural networks (CNNs) have generally dealt with "clean" signals, in which the object of interest is relatively close to the camera and is therefore fairly clear and easily distinguished from its surroundings. This makes it easier to capture detailed information about the object and its action. In relatively long-distance imaging (a few kilometers and above), however, additional difficulties arise that degrade performance on these tasks, because the captured videos are likely to be degraded by the atmospheric path, which causes blur and spatiotemporally varying distortions. Both types of degradation may reduce the ability to recognize actions. These effects become more significant as the imaging distance grows and as the objects of interest become smaller in the image. Objects imaged over long distances usually appear relatively small, and hence the range of actions that can be resolved is more limited, particularly under strong atmospheric effects. In this study, we perform action localization by first applying a dedicated optical flow processing step and then using a variant of SSD (Single Shot MultiBox Detector) to regress and classify detection boxes in each video frame that potentially contain an action of interest.
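To make the per-frame detection stage more concrete, the following is a minimal sketch of greedy non-maximum suppression (NMS), the standard post-processing step that SSD-style detectors apply to their scored boxes. It is not the SSD variant used in this study. The corner-based box format, the two thresholds, and the function names `iou` and `nms` are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,    # assumed value, not taken from the study
    score_threshold: float = 0.3,  # assumed value, not taken from the study
) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Discard low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Hypothetical detections for one frame: two overlapping boxes and one distant box.
    frame_boxes = [(10, 10, 30, 50), (12, 11, 31, 52), (100, 40, 120, 80)]
    frame_scores = [0.9, 0.8, 0.6]
    print(nms(frame_boxes, frame_scores))
```

In this hypothetical frame the two overlapping boxes cover the same small object, so only the higher-scoring one survives, while the distant box is kept as a separate detection. For small, distant objects the appropriate overlap threshold may differ from the value assumed here.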