Optimizing Market Research Agent for Accurate Data Scraping and Exporting

I created a market research agent for my company. It does basic research from the company's own pages (website, case studies) to create an ICP. Once that is done, I assign it to research by vertical, or from a pasted website, to find lookalike companies. At this point there is no integration with anything except web research.

Once the agent has analysed the companies, it is supposed to scrape certain companies for signals such as whether the company has a tech stack or might need hires, and then verify them to give a confidence score of sorts to determine reliability. After the agent has completed this task, it is to prepare a list or chart based on the criteria and then export it to CSV to be imported into another system (Clay).

My questions, based on the tasks explained above:

1. When the agent is sent out to scrape a website (a general scrape of free websites or a Google search), does it actually scrape them, or is this information not as valid as it appears at first glance?

2. When exporting to CSV, the agent claims to have provided 50 results based on the criteria requested, but consistently returns only 15-20 companies at most. When instructed to continue the task it was given a few prompts earlier in the conversation, it keeps creating individual CSVs with completely duplicate results.

Any help in solving these two points would be greatly appreciated. I need to present this to a client who is looking at Dust as a solution for this task, as a POC for their use case. Thanks!

  • Remi

    Thanks for your question, Jacek Gabanowicz! And thank you for suggesting Dust to your client.

    1. when the agent is sent out to scrape a website (general scrape to free websites or a google search) does it actually scrape them or is this information not as valid as at first glance.

    It depends on the anti-scraping tools used by the website. For company profile pages it usually works well, although it can be limited.

    2. When exporting to CSV, the agent claims to have provided 50 results based on the criteria requested, but consistently only returns 15-20 companies at most.

    Here, unfortunately, I don't have a perfect solution for you, as assistants are limited in what they can extract at each step. One thing you can try is to ask the assistant (in its instructions) to give the list of companies it looked into in its answer, and to add an instruction not to reuse them in the next request. But it won't be perfect 😕 One thing that could help too is to increase the max number of steps in the assistant's settings (see screenshot). I hope it helps!
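
As a complement to the workaround above, the partial CSVs can be merged and deduplicated outside Dust before importing into Clay. This is only a minimal sketch, not a Dust feature: the column names `company` and `website` and the helper names `company_key`, `merge_batches`, and `exclusion_note` are assumptions made for illustration, so adjust them to whatever columns the agent actually exports.

```python
import csv
import io


def company_key(row):
    """Build a comparison key: the website domain if present, else the name.

    Assumes (hypothetically) that each row has 'company' and 'website' columns.
    """
    site = (row.get("website") or "").strip().lower()
    for prefix in ("https://", "http://"):
        if site.startswith(prefix):
            site = site[len(prefix):]
    if site.startswith("www."):
        site = site[4:]
    site = site.split("/")[0]  # drop any path after the domain
    return site or (row.get("company") or "").strip().lower()


def merge_batches(csv_texts):
    """Merge several CSV exports, keeping the first occurrence of each company."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = company_key(row)
            if key and key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def exclusion_note(rows):
    """Build a 'do not reuse' line to paste into the next request to the agent."""
    names = ", ".join(row["company"] for row in rows)
    return f"Do not return any of these companies again: {names}"


# Two made-up batches; the second one repeats Acme under a different spelling.
batch_1 = "company,website\nAcme,https://www.acme.com\nGlobex,globex.com\n"
batch_2 = "company,website\nAcme Inc,http://acme.com/about\nInitech,initech.com\n"

merged = merge_batches([batch_1, batch_2])
print([row["company"] for row in merged])
print(exclusion_note(merged))
```

Keying on the domain rather than the name catches duplicates that differ only in spelling ("Acme" vs "Acme Inc"), and the exclusion line can be pasted into the follow-up request so the agent is told explicitly which companies it has already returned.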

  • Jacek Gabanowicz

    Thanks Remi, I'll try adding this to the agent and see what happens. Are there any tools you would recommend as a data source organizer that might help get better output?

  • Remi

    Mhh, I don't think so 🤔 Web search seems to be the best option in this case.

  • Jacek Gabanowicz

    OK, it may be a case of giving the agent clear hierarchical instructions on how and where to search.