APP data_source module 'top k' answers with specific words

Hello,

I am trying to create an app that uses the Data_source module to extract specific chunks from my sources (call transcripts) that contain a particular word. I would like to know if the app analyzes all the sources in the file before providing the results and what exactly the ‘top k’ function does.

When I increase the ‘top k’ value, I get more results, but I’m not sure if this happens because the app is instructed to return more results regardless of what it finds, or if it sets the number of sources to analyze.

Currently, only a few of the returned chunks contain the requested word. Most do not, regardless of the top k value. However, I am convinced that the word appears in more chunks than just the ones that are returned correctly.

My Filters:

_fun = (env) => {
  return {
    parents: { in: null, not: null },
    timestamp: { gt: null, lt: null } // Past 3 months range
  };
}

My Query:

Only return text mentioning the word ‘Company’

Hi, unfortunately we don’t currently provide keyword search (we are working on it), so we can’t specifically retrieve chunks that contain a given word.

The top k is the number of chunks retrieved by a semantic search. Because the search is semantic, results depend on meaning rather than exact wording, so chunks without the specific word can still be retrieved.
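To illustrate the point, here is a minimal sketch of how a top k semantic search behaves, and how a keyword check could be applied afterwards in your own code. It is not the platform's implementation: the chunks, the 2-D vectors standing in for embeddings, and the helper names `cosine`, `topK` and `containsWord` are all hypothetical.

```javascript
// Hypothetical chunks with made-up 2-D "embeddings".
// A real system would compute high-dimensional vectors with an embedding model.
const chunks = [
  { text: "Our Company signed the contract.", vec: [0.9, 0.1] },
  { text: "The firm closed the deal last week.", vec: [0.85, 0.2] },
  { text: "The weather was nice today.", vec: [0.1, 0.9] },
];

// Cosine similarity between two vectors of equal length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every chunk against the query, then keep the k best.
// k only caps how many results are returned; it does not change what is searched.
function topK(queryVec, items, k) {
  return items
    .map((c) => ({ ...c, score: cosine(queryVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Case-insensitive whole-word match, applied after retrieval.
function containsWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

const queryVec = [1, 0]; // stand-in for the embedded query
const retrieved = topK(queryVec, chunks, 2);
const withKeyword = retrieved.filter((c) => containsWord(c.text, "Company"));
```

In this toy data, the second chunk ranks highly because its meaning is close to the query, even though it never contains the word. A post-filter like `containsWord` can only keep or drop chunks that were already retrieved, so it does not replace a real keyword search.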

Does that make sense?

Totally clear, thanks for your answer, Alban!
It looks like my bot crashes when I use the app if top k is too big. What limit shouldn’t I exceed?

We default to 32. I’m not sure what the actual limit is, but you will fill up the context window if you retrieve too many chunks!
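As a rough way to reason about it, the sketch below estimates how many chunks fit in a context window. Every number in it is an illustrative assumption (context size, tokens per chunk, prompt overhead), not an actual platform limit, and the function names are hypothetical.

```javascript
// Estimated prompt size: retrieved chunks plus everything else
// (instructions, question, space reserved for the answer).
function estimatePromptTokens(topK, tokensPerChunk, overheadTokens) {
  return topK * tokensPerChunk + overheadTokens;
}

// Largest top k whose estimated prompt still fits in the context window.
function maxTopK(contextWindow, tokensPerChunk, overheadTokens) {
  const available = contextWindow - overheadTokens;
  return Math.max(0, Math.floor(available / tokensPerChunk));
}

// Assumed values for illustration only.
const contextWindow = 32000;
const tokensPerChunk = 512;
const overheadTokens = 2000;

const fitsAt32 = estimatePromptTokens(32, tokensPerChunk, overheadTokens) <= contextWindow;
const fitsAt64 = estimatePromptTokens(64, tokensPerChunk, overheadTokens) <= contextWindow;
const limit = maxTopK(contextWindow, tokensPerChunk, overheadTokens);
```

With these assumed values, a top k of 32 fits while 64 does not. The real threshold depends on the model and on how large your chunks actually are.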
