Uploading PDFs via API for document creation in Dust

Hello! I'm trying to create documents via the API, and it appears that I have to send the document content as text. I'd like to upload PDFs directly and have Dust parse them the same way it parses uploads made through the UI. Is that possible?

  • David Ebbo

    Hi Helen. There are 3 steps to do this:

    1. Get a URL for the file
    2. Upload the file to that URL
    3. Include it as a fragment when you create a conversation

    Here is some sample code in Python: https://github.com/davidebbo/llm-dust/blob/78ca508522dedc7e9afa0bc938804fb0e972f5e2/llm_dust.py#L227-L279
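
The three steps above can be sketched in Python as follows. This is a minimal standard-library sketch, not the code from the linked repository. The base URL, the endpoint paths, and the JSON field names (`fileName`, `uploadUrl`, `sId`, `contentFragment`, and so on) are assumptions that should be checked against the Dust API reference, and `upload_pdf_to_conversation`, `post_json`, `post_file`, and `encode_multipart` are hypothetical helper names.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# ASSUMPTION: the base URL, endpoint paths, and JSON field names below are
# illustrative placeholders that this thread does not confirm. Verify them
# against the Dust API reference before use.
API_BASE = "https://dust.tt/api/v1/w/{workspace_id}"


def post_json(url, payload, api_key):
    """POST a JSON body with a bearer token and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def encode_multipart(field, filename, content, content_type):
    """Encode one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def post_file(url, path, api_key):
    """Upload the file at `path` to `url` as a form field named "file" (assumed)."""
    body, content_type = encode_multipart(
        "file", path.name, path.read_bytes(), "application/pdf"
    )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def upload_pdf_to_conversation(path, workspace_id, api_key, message,
                               send_json=post_json, send_file=post_file):
    """Run the three steps; the two transports are injectable for testing."""
    path = Path(path)
    base = API_BASE.format(workspace_id=workspace_id)

    # Step 1: register the file and get a URL to upload it to.
    created = send_json(
        f"{base}/files",
        {
            "contentType": "application/pdf",
            "fileName": path.name,
            "fileSize": path.stat().st_size,
            "useCase": "conversation",
        },
        api_key,
    )
    file_info = created["file"]

    # Step 2: upload the PDF bytes to that URL.
    send_file(file_info["uploadUrl"], path, api_key)

    # Step 3: create a conversation that references the file as a fragment.
    return send_json(
        f"{base}/assistant/conversations",
        {
            "message": {"content": message, "mentions": []},
            "contentFragment": {"title": path.name, "fileId": file_info["sId"]},
        },
        api_key,
    )
```

Calling `upload_pdf_to_conversation("report.pdf", "<workspace id>", "<API key>", "Please summarize this file")` would perform the three requests in order. The transports are passed in as parameters so the flow can be exercised with stand-ins before pointing it at a real workspace.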

  • Helen Deng

    Thank you! A couple follow-ups:

    1. Is it possible to use these files as part of a conversation created in the Dust UI?
    2. Can I put these files in a user-visible Folder?
    3. Is there a limit on the number of files I can send this way?

  • David Ebbo

    1. If you create the conversation in the Dust UI, you should be able to reply to it from the API using the conversation ID, and include files.
    2. You mean publicly available to anyone? Or just Dust users?
    3. Not sure, but I would assume the API limit is the same as the limit in the Dust UI.
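
For the first point, a reply to an existing conversation might be prepared as in the sketch below. It only builds the two requests and sends nothing. The endpoint paths (`content_fragments`, `messages`) and the field names are assumptions to check against the Dust API reference, and `build_reply_requests` is a hypothetical helper. `file_id` stands for the ID returned when the file was uploaded.

```python
# ASSUMPTION: the base URL, endpoint paths, and field names are illustrative
# placeholders, not confirmed by this thread.
API_BASE = "https://dust.tt/api/v1/w/{workspace_id}"


def build_reply_requests(workspace_id, conversation_id, file_id, title, message):
    """Return (url, json_payload) pairs: attach the file, then post the reply."""
    conversation_url = (
        API_BASE.format(workspace_id=workspace_id)
        + f"/assistant/conversations/{conversation_id}"
    )
    # First request: attach the already-uploaded file to the conversation.
    attach = (
        f"{conversation_url}/content_fragments",
        {"title": title, "fileId": file_id},
    )
    # Second request: post the reply message itself.
    reply = (
        f"{conversation_url}/messages",
        {"content": message, "mentions": []},
    )
    return [attach, reply]
```

The conversation ID would be the one of the conversation started in the Dust UI.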

  • Helen Deng

    2. Just our internal Dust users.

    More context: I've started manually uploading PDF files into Dust "Folders". I can then include these folders as a Datasource for Dust agents and refer to the files in conversations. I noticed the PDFs are parsed into text, but it also seems that Dust stores a copy of the original PDF file, which is helpful for reading special characters, etc. I'd like to do this via the API so I can avoid the manual step.

    Even more context: I think I can probably just push these files to a Google Drive folder instead, but I'm not sure what the Dust capability differences are. Would I be missing out on any Dust AI functionality (RAG, indexing, or otherwise) if I only used Google Drive to store these PDFs?

  • David Ebbo

    Yes, you can upload files to Dust folders via API. Remi can probably answer the second part better than me.

  • Remi

    Thanks Helen Deng for the question, and David for the ping. You can push the files to a Google Drive folder. They will be processed exactly as if they had been pushed to a Dust folder (i.e., OCR to extract text, split into chunks, and used for RAG).
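
To illustrate what "split into chunks" means for RAG, here is a generic sketch of fixed-size chunking with overlap. It is not Dust's actual pipeline: the chunk size, the overlap, and the word-based splitting are all assumptions chosen for illustration, and `split_into_chunks` is a hypothetical function.

```python
def split_into_chunks(text, chunk_size=200, overlap=20):
    """Split text into word-based chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

In a retrieval setup, each chunk would then typically be embedded and indexed so that the passages most relevant to a question can be handed to the model.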

  • Helen Deng

    Thank you! Super helpful.