What's happening: The default Whisper models often produce incorrect output when given short input segments with no speech. Joplin includes filtering based on heuristics to avoid sending silent input to Whisper. This filtering may improve in future versions of Joplin (see this commit).
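As a rough illustration of what heuristic silence filtering can look like, here is a minimal sketch of an energy-based check. It is not Joplin's actual implementation: the function names, the threshold, the frame size, and the assumption that audio arrives as floats in the range -1.0 to 1.0 are all hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_silent(
    samples: Sequence[float],
    threshold: float = 0.01,          # hypothetical RMS level treated as "loud"
    frame_size: int = 400,            # hypothetical frame length (25 ms at 16 kHz)
    min_loud_fraction: float = 0.05,  # hypothetical share of loud frames required
) -> bool:
    """Return True when too few frames exceed the loudness threshold."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    if not frames:
        return True
    loud_frames = sum(1 for frame in frames if rms(frame) >= threshold)
    return loud_frames / len(frames) < min_loud_fraction


# One second of digital silence and one second of a 440 Hz tone at 16 kHz.
silence = [0.0] * 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
```

A segment for which `is_probably_silent` returns True would simply be skipped rather than passed to the model. A check this simple cannot tell speech from loud background noise, which is one reason such filtering stays heuristic.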
Possible workarounds:
Try a different model: The whisper-small-q8_0.fr.bin model was fine-tuned on French-language data plus a small amount of auto-generated noise. This noise may make the model less likely to produce output when given silent input. Based on testing done here, whisper-small-q8_0.fr.bin should still support English. It may perform better with mostly silent or empty input, though this is not guaranteed.
Modify the whisper-small-q8_0 config: It's possible to add text replacements to Joplin voice typing models. Modifying the configuration file to remove "It's a note-taking application." is a possible workaround.
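To show the idea behind a text replacement, here is a minimal sketch that strips the unwanted phrase from a transcription after the fact. It does not use Joplin's real config schema: the `REPLACEMENTS` mapping and the `apply_replacements` function are hypothetical, and the actual key names in the model's configuration file may differ.

```python
import re

# Hypothetical replacement table: each key is removed or replaced by its value.
REPLACEMENTS = {
    "It's a note-taking application.": "",
}


def apply_replacements(text: str, replacements: dict) -> str:
    """Apply literal replacements, then tidy up leftover spacing."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    # Collapse runs of spaces left behind by removed phrases (newlines are kept).
    text = re.sub(r" {2,}", " ", text)
    return text.strip()


raw = "Buy milk. It's a note-taking application. Call Bob."
cleaned = apply_replacements(raw, REPLACEMENTS)
```

Literal matching like this only catches the exact phrase. If the model outputs a variant, such as different punctuation or a partial sentence, it will pass through unchanged.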
The small fine-tuned model worked better, but I soon switched to the smallest Whisper model to save time. Whichever Whisper model is used, prompts (or parts of them) still appear in the output often enough that I question why they are included at all.
What should I expect if I remove or blank all prompts in the config.json file? If the resulting output is better than what the current defaults produce, should the file be updated?