Hey Raphaël Gardies
I'm working on a similar solution to analyze our Intercom tickets.
I've been cleaning the data upstream to keep only the message exchanges, the creation date, and the contacts.
In the end, several months of tickets weigh less than 1 MB.
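For what it's worth, here is a minimal sketch of that kind of cleaning step. The field names (`created_at`, `contacts`, `messages`, `author`, `body`) are assumptions for illustration, not the actual Intercom export schema, so adapt them to whatever your export contains.

```python
import json


def clean_ticket(raw):
    """Keep only the creation date, the contacts, and the message exchanges."""
    return {
        "created_at": raw["created_at"],
        "contacts": [c["email"] for c in raw.get("contacts", [])],
        "messages": [
            {"author": m["author"], "body": m["body"].strip()}
            for m in raw.get("messages", [])
            if (m.get("body") or "").strip()  # drop empty messages
        ],
    }


# Hypothetical raw ticket with extra fields we do not need.
raw_ticket = {
    "id": "123",
    "created_at": "2024-01-15T10:30:00",
    "tags": ["billing"],
    "contacts": [{"email": "user@example.com", "name": "User"}],
    "messages": [
        {"author": "customer", "body": "  My invoice is wrong.  ", "html": "<p>...</p>"},
        {"author": "agent", "body": "Thanks, looking into it."},
        {"author": "system", "body": ""},
    ],
}

cleaned = clean_ticket(raw_ticket)
# Size of the cleaned ticket once serialized, in bytes.
size_bytes = len(json.dumps(cleaned).encode("utf-8"))
```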
But I'm hitting another limit: the number of tokens.
The short-term solution is to run the analysis on limited time frames (i.e., up to one month at a time).
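A rough sketch of that time-frame split, assuming each cleaned ticket carries an ISO 8601 `created_at` string without a timezone suffix (an assumption about the data, not something Intercom guarantees):

```python
from collections import defaultdict
from datetime import datetime


def group_by_month(tickets):
    """Bucket tickets by creation month so each bucket can be analyzed on its own."""
    buckets = defaultdict(list)
    for ticket in tickets:
        created = datetime.fromisoformat(ticket["created_at"])
        buckets[created.strftime("%Y-%m")].append(ticket)
    return dict(buckets)


# Hypothetical cleaned tickets.
tickets = [
    {"created_at": "2024-01-15T10:30:00", "messages": []},
    {"created_at": "2024-01-28T08:00:00", "messages": []},
    {"created_at": "2024-02-02T14:45:00", "messages": []},
]

monthly = group_by_month(tickets)
```

Each value in `monthly` can then be sent to the model as a separate request, which keeps every request within one month of data.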
The mid- to long-term solution I'm exploring is to add a step to the cleaning process that compiles the messages into an AI-generated summary.
That would divide the context size by ten, enabling analysis over a full year.
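To make the idea concrete, here is a sketch of where such a summary step could sit. Everything in it is an assumption: `placeholder_summarize` only truncates text so the example runs and would be replaced by a real model call, the ticket fields are hypothetical, and `estimate_tokens` uses the rough "about 4 characters per token" rule of thumb for English rather than a real tokenizer.

```python
def estimate_tokens(text):
    """Very rough token estimate; use the model's own tokenizer for real numbers."""
    return max(1, len(text) // 4)


def build_transcript(ticket):
    """Flatten a ticket's messages into one block of text."""
    return "\n".join(f'{m["author"]}: {m["body"]}' for m in ticket["messages"])


def placeholder_summarize(transcript, ratio=10):
    """Stand-in for an AI summary: keeps the first 1/ratio of the text."""
    return transcript[: max(1, len(transcript) // ratio)]


def compact_ticket(ticket, summarize):
    """Replace the full message exchange with a summary, keeping the metadata."""
    return {
        "created_at": ticket["created_at"],
        "contacts": ticket["contacts"],
        "summary": summarize(build_transcript(ticket)),
    }


# Hypothetical cleaned ticket.
ticket = {
    "created_at": "2024-01-15T10:30:00",
    "contacts": ["user@example.com"],
    "messages": [
        {"author": "customer", "body": "My invoice shows two charges for the same month."},
        {"author": "agent", "body": "Thanks for flagging this, I have refunded the duplicate charge."},
    ],
}

compact = compact_ticket(ticket, placeholder_summarize)
tokens_before = estimate_tokens(build_transcript(ticket))
tokens_after = estimate_tokens(compact["summary"])
```

Comparing `tokens_before` and `tokens_after` on a sample of real tickets is a cheap way to check whether the factor of ten actually holds before committing to the extra step.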