Abstract
: With the increasing use of online meetings, there is a growing need for efficient tools that can automatically generate meeting minutes from recorded sessions. Current solutions often rely on proprietary systems, limiting adaptability and flexibility. This paper investigates whether various open-source models and methods such as audio-to-text conversion, summarization, keyword extraction, and optical character recognition (OCR) can be integrated to create a meeting minutes generation tool for recorded video presentations. For this purpose, a series of evaluations are conducted to identify suitable models. Then, the models are integrated into a system that is modular yet accurate. The utilization of an open-source approach ensures that the tool remains accessible and adaptable to the latest innovations, thereby ensuring continuous improvement over time. Furthermore, this approach also benefits organizations and individuals by providing a cost-effective and flexible alternative. This work contributes to creating a modular and easily extensible open-source framework that integrates several advanced technologies and future new models into a cohesive system. The system was evaluated on ten videos created under controlled conditions, which may not fully represent typical online presentation recordings. It showed strong performance in audio-to-text conversion with a low word-error rate. Summarization and keyword extraction were functional but showed room for improvement in terms of precision and relevance, as gathered from the users’ feedback. These results confirm the system’s effectiveness and efficiency in generating usable meeting minutes from recorded presentation videos, with room for improvement in future works.