The search feature would be greatly enhanced if attached files like PDFs were also searched.
This seems like a critical feature.
This is a feature I'm missing a lot as well. Not talking about OCR, but for Joplin to search whatever text is already present in an attached PDF document.
Just searching the PDF index alone would already be a great improvement. This is probably simpler than a full PDF search?
As far as I can tell, Joplin does not even search inside an attached plain text file. (I tried simple search and "goto anything" on desktop v1.5.11, no luck.)
That would be the first step, don't you think?
For such an action, the file content (extracted text) would have to be included in the database / metadata. Only then would such a search be possible.
Laurent would have to comment on something like this, whether such a function would be implemented or not.
It is my understanding that Joplin (by default, with simple drag-n-drop) adds attached files to its DB, not just a link to an external file. In that case it's a matter of search scope, not of changes to the DB.
PS: you can hold Opt/Alt while dragging a file into your note to add it as a link only; those linked files alone wouldn't be searchable.
No, only the metadata. The file itself is in the resource folder. But for a performant search, the text must be pre-extracted.
There exist many open source document search engines, for instance http://docfetcher.sourceforge.net/en/index.html.
Maybe part of this code could be used to build an index for all attachments, updated whenever an attachment changes. The Joplin search engine could then check the index entry for a specific GUID against the search expression.
The indexer could be an independent piece of code, fully standalone and triggered by a Joplin API AttachmentChangeEvent. It could be written by an independent developer and extended whenever a new doc type is required, without affecting Joplin.
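To make the idea a bit more concrete, here is a rough Python sketch of such a standalone indexer. Everything in it is an assumption for illustration: `AttachmentIndex` and `on_attachment_changed` are made-up names, Joplin does not expose an AttachmentChangeEvent as far as this thread shows, and the text is assumed to have been extracted from the attachment already by some other tool.

```python
import re
from collections import defaultdict


class AttachmentIndex:
    """Toy inverted index mapping each word to the resource IDs that contain it."""

    def __init__(self):
        self._words_by_resource = {}                 # resource ID -> set of words
        self._resources_by_word = defaultdict(set)   # word -> set of resource IDs

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a word character.
        return set(re.findall(r"\w+", text.lower()))

    def on_attachment_changed(self, resource_id, extracted_text):
        """Hypothetical event handler: re-index one attachment after it changes."""
        self.remove(resource_id)
        words = self._tokenize(extracted_text)
        self._words_by_resource[resource_id] = words
        for word in words:
            self._resources_by_word[word].add(resource_id)

    def remove(self, resource_id):
        """Drop every index entry that belongs to this attachment."""
        for word in self._words_by_resource.pop(resource_id, set()):
            self._resources_by_word[word].discard(resource_id)

    def search(self, query):
        """Return the IDs of attachments that contain every word of the query."""
        words = self._tokenize(query)
        if not words:
            return set()
        hits = [self._resources_by_word.get(word, set()) for word in words]
        return set.intersection(*hits)
```

The search side would then only have to map the returned resource IDs back to the notes that reference them. Supporting a new doc type would only mean adding another text extractor in front of `on_attachment_changed`; the index itself would stay the same.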
I would like to vote for this feature as well. I create and capture a large number of notes, but I have many others that are lightly noted and tagged but which are used to store important documents as PDFs. If I could search in those, as I do in Evernote, it would be very helpful.
Still getting used to Joplin, but so far I'm finding it to be amazing. Congrats to everyone contributing to this FOSS project.
It's been a long time now, but is there any news?
There's this option:
This would be huge, and would allow me to fully replace other tools with Joplin.
For me, only PDF text search is needed. OCR or image search is nice to have, but I can do that ahead of time if needed.
One workaround that could be implemented is a tool or human process that extracts the text from a PDF and dumps it into a note with the PDF attached. I have seen this option work well for paperless orgs pre-Evernote. This could even potentially be automated via the API.
This doesn't help people who already have a ton of PDFs in Joplin, of course.
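For anyone who wants to try automating that, here is a minimal Python sketch, not a finished tool. It assumes the Joplin Data API is enabled (the Web Clipper service, default port 41184) and that you have its token. It also assumes the PDF has already been added as a resource and that its text was extracted beforehand by some other tool. `build_note` and `create_note` are made-up helper names.

```python
import json
import urllib.request

JOPLIN_API = "http://localhost:41184"  # assumed: default Web Clipper / Data API port


def build_note(title, extracted_text, resource_id, filename):
    """Build a note that links the attached PDF and carries its text in the body."""
    body = f"[{filename}](:/{resource_id})\n\n---\n\n{extracted_text.strip()}\n"
    return {"title": title, "body": body}


def create_note(note, token):
    """POST the note to a locally running Joplin instance and return its JSON reply."""
    request = urllib.request.Request(
        f"{JOPLIN_API}/notes?token={token}",
        data=json.dumps(note).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be something like `create_note(build_note("Invoice 2020", text, resource_id, "invoice.pdf"), token)`. Since the extracted text ends up in the note body, the normal note search should find it.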