Searching attached files

The search feature would be greatly enhanced if attached files like PDFs were also searched.

This seems like a critical feature.

This is a feature I'm missing a lot as well. I'm not talking about OCR, just about Joplin searching whatever text is already present in an attached PDF document.

Just searching the PDF index would already be a great improvement. This is probably simpler than a full PDF search?

As far as I can tell, Joplin does not even search inside an attached plain text file (I tried simple search and "Goto Anything" on desktop v1.5.11, with no luck).
That would be the first step, don't you think?

For that to work, the file content (the extracted text) would have to be included in the database/metadata. Only then would such a search be possible.
Laurent would have to comment on whether such a function would be implemented or not.
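
A minimal sketch of what "including the extracted text in the database" could look like, using Python's standard `sqlite3` module. The `resource_text` table and both helper functions are hypothetical and are not part of Joplin's actual schema; the point is only that the text is extracted once and stored next to the resource ID, so a search becomes an SQL query instead of re-reading every file. A real implementation would more likely use SQLite full-text search (FTS5) rather than `LIKE`.

```python
import sqlite3

# Hypothetical table (not Joplin's schema): one row per attached resource,
# holding text that was extracted once, e.g. when the file was attached.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resource_text (
    resource_id TEXT PRIMARY KEY,
    content     TEXT NOT NULL
)
"""


def store_extracted_text(conn: sqlite3.Connection, resource_id: str, text: str) -> None:
    """Insert or update the extracted text for one resource."""
    conn.execute(
        "INSERT OR REPLACE INTO resource_text (resource_id, content) VALUES (?, ?)",
        (resource_id, text),
    )
    conn.commit()


def search_resources(conn: sqlite3.Connection, term: str) -> list:
    """Return IDs of resources whose stored text contains `term`.

    SQLite's LIKE is case-insensitive for ASCII letters only. Any % or _
    characters in the term are escaped so they are matched literally.
    """
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT resource_id FROM resource_text "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY resource_id",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_extracted_text(conn, "res1", "Invoice for March, total 120 EUR")
    store_extracted_text(conn, "res2", "Meeting minutes")
    print(search_resources(conn, "invoice"))  # ['res1']
```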

It is my understanding that Joplin (by default, with a simple drag-and-drop) adds attached files to its DB, not just a link to an external file. In that case it's a matter of search scope, not of changes to the DB.
PS: you can hold Opt/Alt while dragging a file into your note to add only a link to it; such linked files alone wouldn't be searchable.

No, only the metadata is in the DB. The file itself is in the resource folder. But for performant search, the text must be pre-extracted.

There are many open source document search engines, for instance http://docfetcher.sourceforge.net/en/index.html.
Maybe part of this code could be used to build an index of all attachments, updated whenever an attachment changes. The Joplin search engine could then check the index for a specific GUID against the search expression.
The indexer could be an independent piece of code, fully standalone and triggered by a Joplin API AttachmentChangeEvent. It could be written by an independent developer and extended whenever a new document type is required, without affecting Joplin.
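
The standalone indexer idea can be sketched as an inverted index keyed by resource GUID. This is a hypothetical Python illustration: `AttachmentIndex` and `on_attachment_changed` are made-up names standing in for the proposed `AttachmentChangeEvent`, which is not an existing Joplin API event. Text extraction is assumed to happen elsewhere (per document type); the index only stores words and answers "does this GUID match this search expression", treating the expression as a list of words that must all appear.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens. A real indexer would also stem, handle CJK, etc."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


class AttachmentIndex:
    """Hypothetical standalone index: word -> set of resource GUIDs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)
        self._tokens_by_resource: dict[str, set[str]] = {}

    def remove(self, resource_id: str) -> None:
        """Drop every posting for a resource (e.g. when the attachment is deleted)."""
        for token in self._tokens_by_resource.pop(resource_id, set()):
            ids = self._postings[token]
            ids.discard(resource_id)
            if not ids:
                del self._postings[token]

    def on_attachment_changed(self, resource_id: str, extracted_text: str) -> None:
        """Hypothetical event handler: re-index one attachment from scratch."""
        self.remove(resource_id)
        tokens = tokenize(extracted_text)
        self._tokens_by_resource[resource_id] = tokens
        for token in tokens:
            self._postings[token].add(resource_id)

    def matches(self, resource_id: str, query: str) -> bool:
        """Check one GUID: every word of the query must occur in the attachment."""
        terms = tokenize(query)
        return bool(terms) and terms <= self._tokens_by_resource.get(resource_id, set())

    def search(self, query: str) -> set[str]:
        """Return all GUIDs whose attachment contains every word of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in terms))
```

Re-indexing on every change keeps the search side cheap: a query only intersects small posting sets and never opens the attachment files themselves.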

I would like to vote for this feature as well. I create and capture a large number of notes, but many others are only lightly annotated and tagged because they are used to store important documents as PDFs. If I could search in those, as I do in Evernote, it would be very helpful.

Still getting used to Joplin, but so far I'm finding it to be amazing. Congrats to everyone contributing to this FOSS project.

It's been a long time now; is there any news?
Hartmut

There's this option:

This would be huge, and would allow me to fully replace other tools with Joplin.

For me, only PDF text search is needed. OCR or image search is nice to have, but I can do that ahead of time if needed.

One workaround that could be implemented is a tool or manual process that extracts the text from a PDF and dumps it into a note with the PDF attached. I have seen this approach work well for paperless organizations before Evernote. This could potentially even be automated via the API.

This doesn't help people who already have a ton of PDFs in Joplin, of course.
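
The automated variant of this workaround could look roughly like this Python sketch. It assumes the PDF is already attached as a resource whose ID you know, the Joplin Web Clipper service (Data API) is enabled on its default port 41184, and poppler's `pdftotext` command-line tool is installed. It only extracts text that is already embedded in the PDF (no OCR) and creates a new note containing a link to the resource plus the extracted text. The helper function names are illustrative, not part of any Joplin library.

```python
import json
import subprocess
import urllib.parse
import urllib.request

# Assumption: Data API on the default Web Clipper port; adjust if you changed it.
JOPLIN_API = "http://localhost:41184"


def extract_pdf_text(pdf_path: str) -> str:
    """Extract embedded text with poppler's `pdftotext` ("-" writes to stdout). No OCR."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_note_body(file_name: str, resource_id: str, text: str) -> str:
    """Markdown body: a Joplin resource link (:/<id>) followed by the extracted text."""
    return f"[{file_name}](:/{resource_id})\n\n{text.strip()}\n"


def create_note(token: str, title: str, body: str) -> dict:
    """POST a new note to the Data API and return the created note as parsed JSON."""
    url = f"{JOPLIN_API}/notes?{urllib.parse.urlencode({'token': token})}"
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `create_note(token, "scan.pdf (text)", build_note_body("scan.pdf", resource_id, extract_pdf_text("scan.pdf")))`, where `token` is the authorization token shown in Joplin's Web Clipper settings. Because the extracted text lands in an ordinary note body, Joplin's normal search then finds it.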

Searching PDFs would be a must-have requirement for me to be able to replace Evernote completely.

@wexsoft, have a look at the Resource Search Plugin. When installed, it is available under the Tools > Resource Search menu.

It can search through all text-based PDFs in your notes (it does not OCR images). It is very nicely implemented by @roman_r_m (edited: I had originally credited the wrong developer)

Thanks very much for your reply @johano. I had seen a link to that plugin earlier in the conversation, but wasn't sure if it was functional. I've just installed it now and given it a try and it does seem to work very well.

What do you mean? I haven't implemented anything like that. Maybe there will be some attempts later, but not now.

Apologies, @rxliuli, the author of the plugin is in fact @roman_r_m. My mistake ¯\_(ツ)_/¯