GSoC 2026: Opportunities for the AI projects

This discussion aligns very closely with what I’ve been thinking about while working on my proposal.

  • For the past 4 days I've been planning a proposal that combines ideas 1, 3 and 4. After researching them thoroughly, I realised they fall under a single umbrella: they all share the same retrieval foundation.

  • I kept coming back to the fact that all three require building the embedding pipeline first and then letting the three features consume it.

  • This approach also removes the duplication problem. If each plugin builds its own pipeline, the same work is repeated several times. A single standalone pipeline could instead serve all three features, and more in future, which cuts the wasted cost in memory and architectural complexity.

  • I’m currently thinking of framing the project around this shared infrastructure as the core deliverable, with a few minimal consumer features (search, chat, auto-tagging) implemented mainly to validate and demonstrate the system end-to-end rather than as fully independent products.

  • Would love to hear if this direction aligns with how you’d expect these ideas to be approached, or if there are constraints in Joplin’s current architecture that would push this in a different direction.

  • I’ll be creating a separate discussion post to explore this approach in more detail and clarify a few open questions. I would really appreciate any guidance or feedback there.

My Discussion Post: Design Discussion: Shared Embedding & Retrieval Infrastructure for Joplin AI Features
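
To make the shared-pipeline idea above a bit more concrete, here is a rough TypeScript sketch of what I have in mind. It is only an illustration under my own assumptions: `Embedder`, `ToyEmbedder`, `SharedIndex`, `search`, `chatContext` and `suggestTags` are hypothetical names, not existing Joplin APIs, and the hashed bag-of-words embedder is a toy stand-in for a real embedding model so that the example runs without dependencies.

```typescript
// Hypothetical sketch: none of these names are Joplin APIs.
type Vector = number[];

interface Embedder {
  embed(text: string): Vector;
}

// Toy stand-in for a real embedding model: hashes each word into one slot
// of a fixed-size bag-of-words vector.
class ToyEmbedder implements Embedder {
  constructor(private readonly dims: number = 64) {}

  embed(text: string): Vector {
    const vec: Vector = [];
    for (let i = 0; i < this.dims; i++) vec.push(0);
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      let h = 0;
      for (let i = 0; i < word.length; i++) {
        h = (h * 31 + word.charCodeAt(i)) % this.dims;
      }
      vec[h] += 1;
    }
    return vec;
  }
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface IndexedNote { id: string; text: string; tags: string[]; vector: Vector; }
interface Hit { id: string; score: number; }

// The shared infrastructure: one index, built once, queried by every feature.
class SharedIndex {
  private notes: IndexedNote[] = [];
  constructor(private readonly embedder: Embedder) {}

  // Each note is embedded once, here, no matter how many features use it.
  upsert(id: string, text: string, tags: string[] = []): void {
    this.notes = this.notes.filter(n => n.id !== id);
    this.notes.push({ id, text, tags, vector: this.embedder.embed(text) });
  }

  get(id: string): IndexedNote | undefined {
    return this.notes.filter(n => n.id === id)[0];
  }

  // Brute-force top-k by cosine similarity.
  query(text: string, k: number): Hit[] {
    const q = this.embedder.embed(text);
    return this.notes
      .map(n => ({ id: n.id, score: cosine(q, n.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// Consumer 1: search returns the ids of the most similar notes.
function search(index: SharedIndex, queryText: string, k = 3): string[] {
  return index.query(queryText, k).map(h => h.id);
}

// Consumer 2: chat collects the retrieved note text to pass to a model as context.
function chatContext(index: SharedIndex, question: string, k = 2): string {
  return index.query(question, k).map(h => index.get(h.id)!.text).join("\n---\n");
}

// Consumer 3: auto-tagging proposes the tags of the nearest notes.
function suggestTags(index: SharedIndex, text: string, k = 2): string[] {
  const tags: string[] = [];
  for (const hit of index.query(text, k)) {
    for (const tag of index.get(hit.id)!.tags) {
      if (tags.indexOf(tag) === -1) tags.push(tag);
    }
  }
  return tags;
}

const demoIndex = new SharedIndex(new ToyEmbedder());
demoIndex.upsert("n1", "Sourdough bread recipe with flour and water", ["cooking"]);
demoIndex.upsert("n2", "Quarterly budget spreadsheet and tax notes", ["finance"]);
demoIndex.upsert("n3", "Pizza dough recipe using flour and yeast", ["cooking", "pizza"]);
console.log(search(demoIndex, "flour recipe", 2));
```

The point of the sketch is that `search`, `chatContext` and `suggestTags` are each only a few lines, because all the embedding and similarity work lives in `SharedIndex`. In a real implementation the toy embedder and the brute-force scan would presumably be swapped for a proper model and a persistent vector store, without touching the three consumers.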