I want to talk about a problem that has been bugging me as a motorcyclist for a while. I usually ride with a helmet cam. For example, I've taken a few motorcycle road trips in Japan and collected hours and hours of footage of the road ahead and a few other angles. However, it is really hard to find a highlight in all that video.

Sometimes you notice something interesting happening on the road. To take one example from my recent trip to Napa, I saw two squirrels fighting on the road as I rode by. (Okay, it was both interesting and scary at the same time; luckily I managed to miss them.) How do I recover these highlights from a long, boring video? The problem is that roads look very similar, and it is very easy to miss the exact moment when you are skimming through the footage.

I first thought GPS might work if I could just remember where it happened. It turns out it's really hard to remember at which corner you saw the fun stuff, and even if you do, synchronizing the video with recorded GPS tracks is usually a long process, even when your helmet cam records a GPS track alongside the video. I also thought about building a hardware button that just records a timestamp, but then I would first need to figure out the right hardware, mount it on the bike, and synchronize it with the video too.

Finally I had a really simple idea: what if I just use my hand to cover the camera? It's simple, easy to do, and now all I need to figure out is how to detect black frames in the video.

Here is an example of how a "marker" looks in the video when you cover the camera with your hand for a second. As long as you are covering the lens, it should produce a very dark frame compared to regular daytime riding footage.

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, dist = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
dark_pixels = np.count_nonzero(dist == 0)
dark_percent = float(dark_pixels) / size * 100  # size = frame width * height

We first convert the frame to grayscale for easier processing, since all we care about is detecting black pixels anyway. Then we run the frame through a threshold filter that maps anything at or below gray level 30 to 0 (perfect black) and anything above it to 255 (perfect white), and finally we count the pixels whose value equals zero.

A grayscale threshold-processed frame
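To make the thresholding concrete, here's a quick toy example (the pixel values are made up, not from actual footage) of what it does to a tiny 2×2 grayscale patch:

import cv2
import numpy as np

# A toy 2x2 grayscale "frame": two dark pixels (<= 30) and two bright ones
gray = np.array([[10, 40],
                 [200, 5]], dtype=np.uint8)
_, dist = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
print(dist)                         # [[  0 255]
                                    #  [255   0]]
print(np.count_nonzero(dist == 0))  # 2 -> 50% of this patch is "black"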

Now we take this snippet and apply a bit more logic: let's say we count a frame as a marker if more than 95% of its pixels are black. We might also get multiple marker frames while the hand is moving in and out of view, so we want to merge nearby marker points; let's say we keep at most one marker per 5 seconds.
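To make the merge rule concrete, here is a quick example with made-up millisecond timestamps: three marker frames from a single cover of the lens collapse into one, while a later cover stays separate.

markers = [10000, 10033, 10067, 70000]  # three frames from one cover, plus one later cover
merged = []
for m in markers:
    # keep a marker only if it is more than 5 seconds after the last kept one
    if not merged or m - merged[-1] > 5000:
        merged.append(m)
print(merged)  # [10000, 70000]

Now we can write out the final code!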

import sys
import math
from datetime import datetime

import numpy as np
import cv2

# Merge markers that are closer together than this into a single marker
MERGE_THRESHOLD_MS = 5000
 
 
def format_time(timestamp):
    """Format a millisecond timestamp as HH:MM:SS.mmm."""
    msec = timestamp % 1000
    parts = [msec]

    secs = math.floor(timestamp / 1000)
    parts.append(secs % 60)

    mins = math.floor(secs / 60)
    parts.append(mins % 60)

    hrs = math.floor(mins / 60)
    parts.append(hrs)

    parts.reverse()
    return "%02d:%02d:%02d.%03d" % tuple(parts)
 
 
def main():
    src = cv2.VideoCapture(sys.argv[1])
    if not src.isOpened():
        print("Error opening file")
        sys.exit(1)  # non-zero exit status on failure
    length = int(src.get(cv2.CAP_PROP_FRAME_COUNT))
    width = src.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = src.get(cv2.CAP_PROP_FRAME_HEIGHT)
    size = width * height  # total number of pixels in one frame
    markers = []
    start_time = datetime.now()
 
    while src.isOpened():
        ret, frame = src.read()
        if not ret:
            break
        idx = int(src.get(cv2.CAP_PROP_POS_FRAMES))
        # Threshold the grayscale frame and measure how much of it is black
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, dist = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
        dark_pixels = np.count_nonzero(dist == 0)
        dark_percent = float(dark_pixels) / size * 100
        frame_time = int(src.get(cv2.CAP_PROP_POS_MSEC))
        # Processing speed, not the video's frame rate
        fps = idx / (datetime.now() - start_time).total_seconds()
        print("\033[0KFrame %d/%d [%s]: %.2f fps, %.2f%% black. %d black frames found.\r" %
              (idx, length, format_time(frame_time), fps, dark_percent, len(markers)),
              end='')
        if dark_percent > 95:
            markers.append(frame_time)
 
    # Keep only the first marker of each cluster of nearby dark frames
    merged_markers = []
    for marker in markers:
        if not merged_markers or marker - merged_markers[-1] > MERGE_THRESHOLD_MS:
            merged_markers.append(marker)
 
    print()
    print("Markers:")
    for marker in merged_markers:
        print("  %s" % format_time(marker))
 
    src.release()
 
 
if __name__ == "__main__":
    main()

To actually run this script, you will need to have opencv-python and numpy installed.
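For example, something like this should do it (detect_markers.py and ride_video.mp4 are just placeholder names for wherever you saved the script and your footage):

pip install opencv-python numpy
python detect_markers.py ride_video.mp4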

One thing I have not figured out yet is how to improve the performance of the script. It currently takes about 5 minutes to process a 26-minute video, and it looks like most of the work (decoding and analyzing) is done on the CPU. I wonder whether moving some of the processing to the GPU would help with the speed, but that's another topic for another time!
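In the meantime, one cheap CPU-side trick would be to analyze only a subsample of frames and pixels: a hand covers the lens for around a second, so checking, say, every 10th frame can't miss a marker, and a nearly all-black frame is still nearly all-black at a quarter of the resolution. Here is a rough, unbenchmarked sketch of that idea (the constants are guesses, and decoding itself still happens for every frame):

import sys
import cv2
import numpy as np

ANALYZE_EVERY_N = 10  # at ~30 fps, a one-second cover still spans ~3 analyzed frames

src = cv2.VideoCapture(sys.argv[1])
while src.isOpened():
    idx = int(src.get(cv2.CAP_PROP_POS_FRAMES))
    if idx % ANALYZE_EVERY_N != 0:
        # grab() advances the stream without retrieving the frame into Python
        if not src.grab():
            break
        continue
    ret, frame = src.read()
    if not ret:
        break
    small = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)  # quarter-resolution copy
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # Count dark pixels directly, skipping the threshold call entirely
    dark_percent = float(np.count_nonzero(gray <= 30)) / gray.size * 100
    if dark_percent > 95:
        print("marker at %d ms" % int(src.get(cv2.CAP_PROP_POS_MSEC)))
src.release()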

And that is the story of how I recovered that squirrel snippet from a 4-hour-long recording!