Day 7 was a "successful failure"

I’ve got a lot of videos of my team dragonboating. A key element for a dragonboat team to be successful is paddling in sync - everybody’s paddle should enter and exit the water at the same time. For quite a while I’ve been wondering how hard it would be to highlight the angles of the paddles in the video, so that it’s easy to see how in sync the paddlers are. Is this a “5 minute” problem once you have the right tools - or a “5 years + a team of engineers” kind of problem?

Well, it turns out it’s not a 5-minute one. I used Cline and let it run wild with Python in a directory with a few sample videos. It confidently applied all kinds of algorithms to analyse the videos and identify the paddles - all of which failed very, very badly!

About the closest we got was using the YOLOv8 model to identify bounding boxes around the paddlers, and then using each bounding box as a filter for a Hough line transform to identify the line of the paddle shaft - which kinda, sorta worked sometimes.
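To make that step concrete (this is an illustrative sketch, not the actual scripts Cline wrote): a Hough line transform votes every edge pixel into (rho, theta) bins, and the strongest bin gives the dominant straight line - in this case, hopefully the paddle shaft within a paddler's bounding box. A minimal pure-NumPy version, run on a synthetic diagonal "shaft" standing in for an edge-detected crop:

```python
import numpy as np

def hough_strongest_line(edges, n_theta=180):
    """Vote every edge pixel into (rho, theta) accumulator bins and
    return (rho, theta_degrees) for the strongest straight line."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |rho|
    thetas_deg = np.arange(n_theta)
    cos_t = np.cos(np.deg2rad(thetas_deg))
    sin_t = np.sin(np.deg2rad(thetas_deg))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)                   # edge-pixel coordinates
    for x, y in zip(xs, ys):
        rhos = np.round(x * cos_t + y * sin_t).astype(int)
        acc[rhos + diag, thetas_deg] += 1        # one vote per theta bin
    rho_i, theta_i = np.unravel_index(acc.argmax(), acc.shape)
    return rho_i - diag, int(thetas_deg[theta_i])

# Synthetic "paddle shaft": a diagonal line inside a 50x50 crop,
# standing in for an edge-detected bounding-box region.
crop = np.zeros((50, 50), dtype=np.uint8)
for i in range(50):
    crop[i, i] = 1

rho, theta = hough_strongest_line(crop)
# theta is the angle of the line's normal, so 135 degrees here
# means the detected shaft runs at 45 degrees.
```

In a real pipeline you'd run an edge detector (e.g. Canny) on each YOLOv8 crop first and use OpenCV's `cv2.HoughLinesP` rather than hand-rolling the accumulator - but as the rest of this post shows, even that doesn't get you far with tree branches in the background.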

Tools I used

  • Cline, with the Claude 4 Sonnet model
  • Python for scripting

What went well

  • I now know that the problem I was trying to solve is a very hard one, which will require dedicated effort. There are no “obvious” out-of-the-box approaches that yield quick results.
  • The ability to include images in the Cline prompt was helpful - often it would claim that an approach was working when it obviously wasn’t. Showing Cline the visual output of the analysis tools it was building helped keep it on track.

What went badly

  • Lots of running around in circles and not much progress - which is probably intrinsic to the problem I was trying to solve
  • The tool was very confident about the success of the approaches it was taking. It would frequently claim that the code it had written was successfully identifying paddles in the video footage, when it was doing no such thing!
  • Writing a whole mess of scripts probably isn’t the best approach to this kind of problem, and I’m pretty sure it’s not how actual data scientists work. Being able to “vibe code” in a Jupyter notebook would probably be a better approach.

What I learnt

  • You can’t easily highlight paddles in footage of people paddling a dragonboat
  • The tools aren’t as good at this kind of exploratory analysis as they are at more typical programming tasks

What I wanted - a picture of a dragon boat team paddling. Their paddles are successfully highlighted in green, because I did the markup of this image by hand.

What I got - a picture of a dragon boat team paddling, with an absolute mess of false positive detections for paddles, most of which are tree branches in the background.

The closest I got to something working properly - a picture of a dragon boating team, where each paddler is highlighted with a bounding box generated by the YOLOv8 computer vision model. Within the bounding boxes, a few paddles have actually been successfully identified.

Originally posted on Mastodon