Is there a way to get a list of filenames in a Knowledge base folder? I’ve uploaded ~40k files but had some interruptions during the upload, so there might be ~100 files missing. I don’t want to repush the full 40k one more time just to be sure 😅
I had to do something similar. Here is the code I had:
import requests

def get_existing_dust_files():
    """Return the set of document IDs already present in the data source."""
    all_documents = set()
    offset = 0
    limit = 100
    while True:
        url = f"https://dust.tt/api/v1/w/{wld}/spaces/{space_id}/data_sources/{dsId}/documents?limit={limit}&offset={offset}"
        response = requests.get(url, headers={"Authorization": f"Bearer {dust_token}"})
        if not response.ok:
            print(f"Error fetching documents: {response.status_code} - {response.text}")
            break
        data = response.json()
        documents = data.get("documents", [])
        # If no more documents were returned, stop paginating
        if not documents:
            break
        # Add document IDs to the set
        for doc in documents:
            all_documents.add(doc["document_id"])
        # Update offset for the next batch
        offset += limit
    return all_documents
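FWIW, here's roughly how I then worked out which files were missing and re-pushed only those. It's just a sketch: the folder path is a placeholder, and it assumes each document ID is simply the local filename (adjust if you named the documents differently on upload):

import os

# Rough sketch: compare local filenames against the document IDs in Dust.
existing = get_existing_dust_files()
local_files = set(os.listdir("/path/to/your/folder"))  # hypothetical folder path
missing = local_files - existing
print(f"{len(missing)} files still need to be uploaded")
for name in sorted(missing):
    print(name)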
oh my. but that basically pulls everything back over. Crazy. Thank you for the code, I guess I need to use it 😄
To be clear, it just pulls the metadata, not the doc content. It should be significantly faster than re-uploading everything.
oh! That is good news! 😅
Let me know how it goes! I suspect that it will take less than a minute to list your 40k documents, and then you can easily find what's missing.
testing now
Pulled 40967 documents in 282 seconds
it took 4 mins to get all 41k 😄
Ok, I was a bit optimistic, but that's probably a lot less than it took you to upload them! 😅 Might be able to make it a bit faster by getting more than 100 at a time, though there is some limit to the batch size (not sure what it is).
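Something like this, though I haven't checked what page size the API actually allows, so treat the number as a guess. Advancing the offset by the number of documents actually returned keeps the loop correct even if the server silently caps the page size:

import requests

def list_document_ids(limit=1000):
    """Same pagination loop, just with a larger (hypothetical) page size."""
    all_documents = set()
    offset = 0
    while True:
        url = f"https://dust.tt/api/v1/w/{wld}/spaces/{space_id}/data_sources/{dsId}/documents?limit={limit}&offset={offset}"
        response = requests.get(url, headers={"Authorization": f"Bearer {dust_token}"})
        response.raise_for_status()
        documents = response.json().get("documents", [])
        if not documents:
            break
        all_documents.update(doc["document_id"] for doc in documents)
        # Advance by the real batch size, not the requested limit
        offset += len(documents)
    return all_documents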
Well, I’ll take 4 mins over 7h of upload