
Building a Zapier Workflow to Parse PDF Reports for Dust Integration


Hey! I need some help with a workflow I’m trying to build with Zapier and Dust. Here are the steps: each month, download a PDF report (attached, which you’ll see contains multiple tables), upload it to our GDrive (to store it for later use), and then call a Dust agent to query the data contained in this PDF (the Dust agent is already created 😉). My problem is:

  • for the “Upload a document” step, Dust asks for the document content (which, if I understand correctly, is the parsed content of my PDF, not a link to the document)

What step should I add beforehand to parse the document correctly and make it available to the Dust agent? I’ve seen some paid tools such as PDF.co, but I’m looking for a “free” solution!
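A possible free option for the parsing step, sketched below, assumes you can run a small Python script outside Zapier with the open-source pdfplumber library installed (it is probably not available inside a Code by Zapier step). It uses pdfplumber’s `page.extract_tables()` to pull each detected table and renders it as markdown text, the kind of plain “document content” the Dust upload step asks for. Table detection depends heavily on the PDF layout, so treat this as a starting point; the function names are hypothetical.

```python
"""Sketch: turn the tables in a PDF into markdown text that could be used as
the "document content" for Dust. Assumes pdfplumber is installed
(pip install pdfplumber); function names are illustrative only."""

from typing import List, Optional

Row = List[Optional[str]]


def rows_to_markdown(rows: List[Row]) -> str:
    """Render a table (first row treated as the header) as a markdown table."""
    if not rows:
        return ""

    def clean(cell: Optional[str]) -> str:
        # pdfplumber returns None for empty cells; also flatten line breaks
        # and escape pipes so they don't break the markdown column layout.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    grid = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = grid[0], grid[1:]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def pdf_tables_to_markdown(path: str) -> str:
    """Extract every table pdfplumber detects and join them as markdown."""
    import pdfplumber  # imported here so rows_to_markdown works without it

    sections = []
    with pdfplumber.open(path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table_number, table in enumerate(page.extract_tables(), start=1):
                sections.append(f"## Page {page_number}, table {table_number}\n\n"
                                + rows_to_markdown(table))
    return "\n\n".join(sections)
```

Usage, with a hypothetical file name: `print(pdf_tables_to_markdown("monthly_report.pdf"))`. The resulting text could then be passed to the Dust upload step instead of a file link.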

  • David Ebbo

    Instead of trying to upload the PDF from Google Drive to a Dust Folder (which indeed seems hard to do via Zapier), would it be simpler to have Dust connect directly to the file on Google Drive? One caveat: by default, Dust doesn't sync PDF files, but if you ask the team, they can enable it on your workspace.

  • Inès Delbecq

    Hi David Ebbo, thanks for your help. I tried this, and indeed it seems Dust only reads the docx format, not PDF, when I use Google Drive as a search source! I’ll ask them to enable it, thanks!

  • David Ebbo

    Yes, that's definitely the reason. Remi, can you help get PDF syncing enabled on Inès's workspace?

  • Remi

    Inès Delbecq, can you send an email to the team at support@dust.tt? 🙏

  • Inès Delbecq

    Thanks Remi Abboud, PDF syncing is now activated in my workspace, though it seems it won’t be able to parse structured data such as “tables”, so it won’t entirely solve my issue 😞

  • Remi

    Inès Delbecq, sorry, I read too quickly. What kind of questions do you want to ask? Do they require manipulating the tables? If so, I recommend storing the tables in a Gsheet instead and plugging it into an agent with the query table tool. You can have one Gsheet per table (best), all in one folder for example, or test with multiple tabs in one doc. To get the content of a table (ready to copy/paste), you can take a screenshot and ask Claude to give you the table in markdown. Hope this helps! Let me know if you have any other questions!
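If you go the Google Sheets route, one way to avoid copy/pasting each table by hand is to save every extracted table as its own CSV file in one folder, then import each file into its own Gsheet (File > Import in Google Sheets). The sketch below assumes the tables were already extracted as lists of rows (for example with pdfplumber’s `page.extract_tables()`); the function name, folder, and `table_N.csv` naming are hypothetical choices.

```python
"""Sketch: write each extracted table to its own CSV file in one folder, so
each can be imported into a separate Google Sheet. Input tables are assumed
to be lists of rows; names and file layout are illustrative only."""

import csv
from pathlib import Path
from typing import List, Optional, Sequence

Table = Sequence[Sequence[Optional[str]]]


def write_tables_as_csv(tables: Sequence[Table], folder: str,
                        prefix: str = "table") -> List[Path]:
    """Write tables[i] to <folder>/<prefix>_<i+1>.csv and return the paths."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, table in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{index}.csv"
        # newline="" is what the csv module expects, avoiding extra blank lines.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            for row in table:
                # Make empty (None) cells explicit empty strings.
                writer.writerow(["" if cell is None else cell for cell in row])
        paths.append(path)
    return paths
```

Each resulting file maps to one Gsheet, matching the “one Gsheet per table, all in one folder” layout, and values containing commas (such as “1,200”) are quoted automatically by the `csv` module.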