3D Pose Estimation of Football Players

Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Authors
Osterman, Joakim
Sjögren, Olof
Abstract
In the context of football analytics, video recordings of matches play a crucial role in post-game analysis. However, videos are inherently limited because they only allow viewers to follow the match from the camera's perspective. This thesis is part of a larger project aimed at creating 3D representations of football matches from video, thus enabling users to view the game from anywhere inside the virtual 3D environment. The larger project consists of three parts. This thesis focuses on estimating the camera parameters, as well as the 3D poses and locations of the players in the video. The other two parts focus on player tracking and player texture generation. A pipeline consisting of camera calibration and pose estimation is proposed, taking video recordings and bounding box annotations as input and predicting camera parameters as well as the players' 3D poses and locations. For camera calibration, a model specifically tailored for cameras viewing football fields is used. The results indicate accurately predicted positions and viewing angles for the estimated camera. Pose estimation is performed using a pre-trained model and results in visually accurate projections, although perspective ambiguities are present when the 3D poses are viewed from different angles. The main approach for positioning players was to detect when players touched the ground and interpolate the positions for ambiguous frames. The results are promising, but noise in the depth estimations occurs due to perspective ambiguities. An optional optimization of poses and positions using multi-view triangulation is also presented, showing possibilities for further refinement to ensure realistic and consistent human poses. Future work on pose and location optimization could yield a pseudo-truth dataset for improving poses and positions estimated from strictly monocular video.
Subject/Keywords
3D Human Pose Estimation, Pose estimation, visual transformers, deep machine learning, camera calibration, depth estimation, multi-view optimization