Natural Language Search PoC with ChatGPT

Aron Negyesi
5 min readDec 11, 2023

--

Last time we made a product tagging PoC with ChatGPT, this time I will show a PoC for Natural Language Processing for search.

Example: if you add “Show me vegan cheese from Spain”, we transform it to keyword “cheese”, diet filter vegan and country filter “Spain”, so that a simple, keyword-based search engine can process it and give the desired results.

Those more into search might ask, why do we need this? Why don’t we use vector search? You can definitely do it, but also remember that having a good vector search requires significant development, while a solution like this can be crafted in days. It is also to show the potential capabilites of LLMs.

Setup

Similarly as last time, let’s stick now to pure ChatGPT, without any additional training, any documents uploaded or assistants added. Note that those might help in finetuning, but right now we are going for a PoC only. We will stick again to an eGrocery example. This time, we can start with gpt-3.5-turbo, which does the job for this purpose. If you will have troubles within your field, try switching to 4. Whichever you choose, set temperature to 0 as for this task, we want consistency.

We will use keywords, filters (country, vegan, vegetarian, bio, gluten-free, lactose-free, sugar-free) and sorting (recommended, price-ascending, price-descending). The universe of fields, sorting etc. can be expanded, but we go with a simple PoC now.

Prompting

As the first task, I always like to give ChatGPT some context of the task:

You are a Natural Language Processing assistant for our eGrocery site’s search engine. Your task is to convert the user input into a search query with relevant keywords and filters only.

The next step is to add removing too general keywords. This task is here after many rounds of tries, as in prompts order matters:

Start by removing from the query too vague, general keywords. Examples to remove: food, foods (these are not helpful as most of our products are food), products, stuff, good etc. — these should be never among keywords.

You can see here that I start with remove these keywords and finish with ‘these shoud never be among keywords’. This might look like an overkill, but in many cases ChatGPT requires repetition.

After it, remove the keywords, which we would like to transformed to filters. E.g. country, vegan etc. You might not need to add all, figure out what is the most optimal for your setup.

In the KEYWORD field of your output, never add country names, vegan, bio, organic as keywords. Our service has the filters below, transform user queries by using these filters instead of keywords when it is possible. If you transform a keyword to a filter, remove the transformed keywords from the KEYWORD field (e.g. if you transformed ‘Italian’ in ‘Italian cheese’ to country filter, remove ‘Italian’ from keywords and have only cheese as a keyword.)

Once this is done, show ChatGPT your filters. List what you have (they should have some logical separators between them), what they mean and how you want them to be displayed. With the new capabilities of ChatGPT, you might want to try uploading them in a separate file or teach your model using them, but we are sticking to basics right now and just add them to the prompt. Note that even in the prompt, you can use your own codes or even group the filters. Don’t forget to tall ChatGPT that this is your full universe of filters, otherwise it can figure out some nice options for you. :)

FILTERS: COUNTRY: This filter is for the product’s origin. For filter values we use alpha-3 iso code of the country, separated by commas. Add ‘country:’ before the values (e.g. country: AUT, ESP). VEGAN: for vegan display ‘vegan’. VEGETARIAN: ‘vegetarian’. BIO: ‘bio’. LACTOSE-FREE: lactose-free. SUGAR-FREE: ‘sugar-free’, GLUTEN-FREE: ‘gluten-free’. In your answer you can use only these filters and sorting besides keywords.

Add the sorting options you have:

Available sorting options are recommended (default), price-ascending, price-descending.

Give the desired output format to the prompt. Note that you can add specific json etc. output formats, but you might do better with post-processing the output. It saves prompt length and reduces complexity.

Strictly use the output format and add no comments to it: KEYWORD: ; FILTERS: ; SORTING: Example: Transform ‘show me cheap bio vegan cheese from France’ KEYWORD: cheese; FILTERS: bio, vegan; SORTING: price-ascending

And as a last point, as ChatGPT tends to forget what you instructed it, remind it what you do not want to see in the KEYWORDS field. You might think that it is some overshooting, but it is not. If you don’t add it, you will have far more incorrect trasformations:

Remember: if you have vegan, vegetarian, sugar-free, gluten-free, lactose-free, bio among the filters or transformed cheap/expensive to price-ascending/descending sorting, remove keywords similar to them from the KEYWORDS field of the output. Also, if you transformed a country/nationality name to a country filter, never have them in the KEYWORDS field.

Testing

Let’s try it:

Let’s try with something more general:

Yay, we did not have something as a keyword! :)

Conclusion

As you see, the capabilities are great, even multi-sentence requests could be transformed to queries which can be handled by a simple, keyword-based search engine. Also, such a prompt can be developed pretty quickly, while developing a proper search engine handling NLP might take more time.

Some additional thoughts about the PoC:

  1. The PoC can handle simple requests only. You might want to extend this by using Booleans, splitting to multiple queries or adding some additional instructions for more complex requests. You might also need to add more context about your business, location and the date. E.g. ‘I want to buy seasonal vegetables.’ looks to be a simple request, but the current PoC fails to deliver for it. (No worries, it can be handled with some modifications.)
  2. The PoC can handle some history, but is not optimised for it. If you want to handle history properly, you need to add a couple paragraphs how it should be done and where history should be used. If you want an article handling it, let me know. Note: Never add too much history, it will lower accuracy in most cases.
  3. If you want to implement something like this, I would not run all queries through ChatGPT, but instead would add some minimum number of keywords/characters to use it before normal search. You can also use a simpler sorting prompt deciding whether you need to run the query through NLP prompt. Overall, using ChatGPT adds extra costs and increases response times, so use it only if you need to.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Aron Negyesi
Aron Negyesi

Written by Aron Negyesi

Seasoned product manager writing about product management, search, AI and many more.

No responses yet