
Get Filenames from Knowledge Base Folder Without Re-uploading Files

Is there a way to get a list of filenames in a Knowledge base folder? I’ve uploaded ~40k files but had some interruptions during the upload, so there might be ~100 files missing. I don’t want to re-push the full 40k one more time just to be sure 😅

  • David Ebbo

    I had to do something similar. Here is the code I had:

    import requests

    # wld, space_id, dsId and dust_token are not defined here; set them first
    # (judging by the URL: workspace ID, space ID, data source ID, API token).
    def get_existing_dust_files():
        """Return the set of document IDs currently in the data source."""
        all_documents = set()
        offset = 0
        limit = 100  # documents requested per page

        while True:
            url = f"https://dust.tt/api/v1/w/{wld}/spaces/{space_id}/data_sources/{dsId}/documents?limit={limit}&offset={offset}"
            response = requests.get(url, headers={"Authorization": f"Bearer {dust_token}"})

            # On an HTTP error the loop stops, so the returned set may be incomplete
            if not response.ok:
                print(f"Error fetching documents: {response.status_code} - {response.text}")
                break

            data = response.json()
            documents = data.get("documents", [])

            # If no more documents were returned, break the loop
            if not documents:
                break

            # Add document IDs to the set
            for doc in documents:
                all_documents.add(doc["document_id"])

            # Update offset for next batch
            offset += limit

        return all_documents
  • Gregor

    oh my. but that basically pulls everything back over. Crazy. Thank you for the code, I guess I need to use it 😄

  • David Ebbo

    To be clear, it just pulls the metadata, not the doc content. It should be significantly faster than re-uploading everything.

  • Gregor

    oh! That is good news! 😅

  • David Ebbo

    Let me know how it goes! I suspect that it will take less than a minute to list your 40k documents, and then you can easily find what's missing.
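
    For the "find what's missing" step, a minimal sketch of the comparison could look like the following. It assumes the document ID returned by the API is the same string as the local file name used at upload time, which may not hold for every setup. The helper names `find_missing` and `local_file_names` are made up for illustration.

```python
from pathlib import Path


def local_file_names(folder):
    """Collect the names of all files under `folder`, including subfolders."""
    return [p.name for p in Path(folder).rglob("*") if p.is_file()]


def find_missing(local_names, existing_ids):
    """Return the local names that have no matching document ID, sorted.

    Assumption: each document ID equals the local file name.
    """
    return sorted(set(local_names) - set(existing_ids))
```

    Something like `find_missing(local_file_names("./docs"), get_existing_dust_files())` would then give the list of files to re-upload, where `./docs` stands in for wherever the originals live.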

  • Gregor

    testing now

  • Gregor

    Pulled 40967 documents in 282 seconds

  • Gregor

    it took 4 mins to get all 41k 😄

  • David Ebbo

    Ok, I was a bit optimistic, but that's probably a lot less than it took you to upload them! 😅 Might be able to make it a bit faster by getting more than 100 at a time, though there is some limit to the batch size (not sure what it is).
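
    To get a rough feel for how much a bigger batch could help, here is a small sketch that counts the requests the pagination loop makes for a given page size. The helper name `requests_needed` is made up, and 500 is only an example value: whether the API accepts a limit that large is not known here.

```python
import math


def requests_needed(total_docs, limit):
    """Number of GET requests made by a loop that pages until it gets an empty page.

    That is one request per non-empty page, plus the final empty page
    that tells the loop to stop.
    """
    return math.ceil(total_docs / limit) + 1


# 40967 documents at 100 per page
calls_at_100 = requests_needed(40967, 100)

# Same documents at a hypothetical 500 per page (may exceed the API's limit)
calls_at_500 = requests_needed(40967, 500)
```

    With 282 seconds spread over the calls at 100 per page, each request took very roughly 0.7 seconds, so fewer, larger pages should help as long as the time per request does not grow as fast as the page size.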

  • Gregor

    Well, I’ll take 4 mins over 7h of upload

  • Gregor

    Thank you for the info!