Hello! I'm trying to create documents via the API, and it appears that I have to send the document content as text. I'd like to upload PDFs directly and have Dust parse them the same way it parses uploads via the UI. Is that possible?
Hi Helen. There are 3 steps to do this:
1. Get a URL for the file
2. Upload the file to that URL
3. Include it as a fragment when you create a conversation
Here is some sample code in Python: https://github.com/davidebbo/llm-dust/blob/78ca508522dedc7e9afa0bc938804fb0e972f5e2/llm_dust.py#L227-L279
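The three steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the code from the linked llm_dust.py. The base URL `https://dust.tt/api/v1`, the endpoint paths, the JSON field names (`contentType`, `fileName`, `fileSize`, `useCase`, `uploadUrl`, `sId`, `contentFragment`, `mentions`, `context`) and all function names are assumptions modeled on the public Dust API, so check them against the sample and the Dust API reference before relying on them. Each step is built as a `urllib.request.Request` so it can be inspected without network access; the hypothetical `upload_and_ask` sends them in order.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# ASSUMPTION: base URL, paths and field names are modeled on the Dust API, not verified here.
DUST_API = "https://dust.tt/api/v1"


def _auth(api_key: str) -> dict:
    return {"Authorization": f"Bearer {api_key}"}


def _content_type(path: str) -> str:
    # Falls back to a generic type when the extension is unknown.
    return mimetypes.guess_type(path)[0] or "application/octet-stream"


def build_file_request(workspace_id: str, api_key: str, path: str) -> urllib.request.Request:
    """Step 1: ask Dust for an upload URL for a local file."""
    body = {
        "contentType": _content_type(path),
        "fileName": os.path.basename(path),
        "fileSize": os.path.getsize(path),
        "useCase": "conversation",  # assumed use case for conversation attachments
    }
    return urllib.request.Request(
        f"{DUST_API}/w/{workspace_id}/files",
        data=json.dumps(body).encode(),
        headers={**_auth(api_key), "Content-Type": "application/json"},
        method="POST",
    )


def build_upload_request(upload_url: str, api_key: str, path: str) -> urllib.request.Request:
    """Step 2: POST the raw file bytes to the upload URL as multipart/form-data."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        payload = f.read()
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{os.path.basename(path)}"\r\n'
        f"Content-Type: {_content_type(path)}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        upload_url,
        data=head + payload + tail,
        headers={**_auth(api_key), "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_conversation_request(workspace_id, api_key, file_id, title, question, agent_id):
    """Step 3: create a conversation with the uploaded file attached as a content fragment."""
    body = {
        "visibility": "unlisted",
        "message": {
            "content": question,
            "mentions": [{"configurationId": agent_id}],  # the agent that should answer
            "context": {"username": "api-user", "timezone": "UTC"},  # assumed minimal context
        },
        "contentFragment": {"title": title, "fileId": file_id},  # assumed field name
    }
    return urllib.request.Request(
        f"{DUST_API}/w/{workspace_id}/assistant/conversations",
        data=json.dumps(body).encode(),
        headers={**_auth(api_key), "Content-Type": "application/json"},
        method="POST",
    )


def upload_and_ask(workspace_id, api_key, path, question, agent_id):
    """Send the three requests in order (needs network access and a real API key)."""
    def send_json(req):
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    file_info = send_json(build_file_request(workspace_id, api_key, path))["file"]  # assumed shape
    with urllib.request.urlopen(build_upload_request(file_info["uploadUrl"], api_key, path)):
        pass  # the upload response body is not needed here
    return send_json(build_conversation_request(
        workspace_id, api_key, file_info["sId"], os.path.basename(path), question, agent_id))
```

Building requests separately from sending them keeps each step easy to inspect and adjust if the real field names differ from the assumed ones.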
Thank you! A couple of follow-ups:
1. Is it possible to use these files in a conversation created in the Dust UI?
2. Can I put these files in a user-visible Folder?
3. Is there any limit to the number of files I can send this way?
1. If you create the conversation in the Dust UI, you should be able to reply to it from the API using the conversation ID, and include files.
2. Do you mean publicly available to anyone, or just to Dust users?
3. I'm not sure, but I would assume the API limit is the same as the limit in the Dust UI.
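For the first answer (replying from the API to a conversation created in the Dust UI), a similar minimal sketch attaches an already-uploaded file to the existing conversation and then posts a message. It assumes you already have a file ID from the upload steps and the conversation ID. The `content_fragments` and `messages` endpoint paths, the body fields and the function names are hypothetical stand-ins modeled on the Dust API, not confirmed in this thread.

```python
import json
import urllib.request

# ASSUMPTION: base URL and endpoint paths are modeled on the Dust API, not verified here.
DUST_API = "https://dust.tt/api/v1"


def _post_json(url: str, api_key: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def build_reply_requests(workspace_id, api_key, conversation_id, file_id, title, question, agent_id):
    """Return [attach-file request, reply-message request] for an existing conversation."""
    base = f"{DUST_API}/w/{workspace_id}/assistant/conversations/{conversation_id}"
    attach = _post_json(f"{base}/content_fragments", api_key, {"title": title, "fileId": file_id})
    reply = _post_json(f"{base}/messages", api_key, {
        "content": question,
        "mentions": [{"configurationId": agent_id}],  # the agent that should answer
        "context": {"username": "api-user", "timezone": "UTC"},  # assumed minimal context
    })
    return [attach, reply]


def reply_with_file(workspace_id, api_key, conversation_id, file_id, title, question, agent_id):
    """Send the requests in order (needs network access and a real API key)."""
    results = []
    for req in build_reply_requests(workspace_id, api_key, conversation_id,
                                    file_id, title, question, agent_id):
        with urllib.request.urlopen(req) as resp:
            results.append(json.load(resp))
    return results
```

The fragment is posted before the message on the assumption that the agent should see the file when it answers.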
2. Just our internal Dust users. More context: I've started manually uploading PDF files into Dust "Folders". I can then add these folders as a data source for Dust agents and refer to the files in conversations. I noticed the PDFs are parsed into text, but Dust also seems to store a copy of the original PDF file, which is helpful for reading special characters, etc. I'd like to do this via the API so I can avoid the manual step. Even more context: I could probably just push these files to a Google Drive folder instead, but I'm not sure what the differences in Dust capabilities are. Would I be missing out on any Dust AI functionality (RAG, indexing, or otherwise) if I only used Google Drive to store these PDFs?
Yes, you can upload files to Dust folders via the API. Remi can probably answer the second part better than I can.
Thanks, Helen Deng, for the question, and David for the ping. You can push the files to a Google Drive folder. They will be processed exactly as if they had been pushed to a Dust folder (i.e., OCR to extract the text, split into chunks, and used for RAG).