Grammarly suggests Personalized Sentences without storing 🤔
Good Morning! This is Pranjal from Engineering Diary.
Here's what I got for you today:
How Grammarly suggests Personalized Sentences without Storing them
Glossary:
Personalized Sentences: Sentences that a user types often
LSH: Locality-Sensitive Hashing (similar texts produce similar hashes)
Grammarly, the popular writing assistant, wanted to recommend new snippets based on a user's writing habits. The idea was to suggest that the user create a snippet when they had frequently typed similar sentences in the past.
There were two problems:
How to suggest snippets when users write the same thing in different ways. For example, they write:
Please let me know how it goes
Let me know how it went
If the user then types 'Let me know how it goes' the next time, Grammarly wants to suggest creating a snippet.
How to suggest snippets without storing the sentences against the user.
Solution:
To accomplish this, Grammarly used a hashing technique called LSH (Locality-Sensitive Hashing). With LSH, similar sentences produce hashes that are close to each other, which allows for accurate suggestions even when the sentences are worded differently.
For the LSH implementation, they chose SimHash because it's faster, and as the similarity metric between two hashes they used Hamming distance (the number of bit positions in which the two hashes differ).
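To make the idea concrete, here is a minimal sketch of a word-level SimHash and a Hamming distance function in Python. This is not Grammarly's code: the tokenizer, the 64-bit fingerprint size, the use of MD5 as the per-word hash, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import re

HASH_BITS = 64  # assumed fingerprint size, not taken from the original post


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def token_hash(token):
    """Map one token to a 64-bit integer using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(sentence):
    """Combine the token hashes into one fingerprint by a per-bit majority vote."""
    votes = [0] * HASH_BITS
    for token in tokenize(sentence):
        h = token_hash(token)
        for bit in range(HASH_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("Please let me know how it goes")
    b = simhash("Let me know how it went")
    print(hamming_distance(a, b))
```

Because every word votes on every bit, sentences that share most of their words tend to end up with fingerprints that differ in only a few bits. Only the fingerprint would need to be stored, not the sentence itself.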
Implementation:
While a user is writing, Grammarly computes the SimHash of every sentence, builds a tuple like (userId, [hash1, hash2, hash3]), and sends it to Kafka, which stores it in their long-term storage.
A Spark job (a Map-Reduce job) runs daily. It takes all the sentence hashes produced in the last week and compares them with each other by Hamming distance. If the Hamming distance between two hashes is less than a predefined threshold, the sentences are considered similar.
The Spark job finds the top n sentence hashes that each have more than m similar hashes.
Those n hashes are then saved in Redis in a format like {Key: userId, Value: [hash1, hash2, hash3]}.
Now, while the user is actively writing, each sentence is SimHashed. If a similar hash is present in Redis for that particular user, Grammarly recommends creating a snippet.
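The batch step and the lookup step above can be sketched in plain Python. This is only an illustration under stated assumptions: the real pipeline uses Spark and Redis, whereas here a simple nested loop stands in for the Spark job and a dict stands in for Redis. The names frequent_hashes and should_suggest, and the parameters threshold, min_similar, and top_n, are hypothetical.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def frequent_hashes(hashes, threshold, min_similar, top_n):
    """Return up to top_n hashes that have more than min_similar similar hashes.

    Two hashes count as similar when their Hamming distance is below threshold.
    This is a quadratic pairwise comparison, standing in for the daily Spark job.
    """
    scored = []
    for i, h in enumerate(hashes):
        similar = sum(
            1
            for j, other in enumerate(hashes)
            if i != j and hamming_distance(h, other) < threshold
        )
        if similar > min_similar:
            scored.append((similar, h))
    # Most frequently repeated sentences first (the sort is stable for ties).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    result = []
    for _, h in scored:
        if h not in result:  # keep each hash only once
            result.append(h)
    return result[:top_n]


def should_suggest(store, user_id, sentence_hash, threshold):
    """Check whether a freshly typed sentence is close to a stored frequent hash.

    store is a dict of {user_id: [hash, ...]}, standing in for Redis.
    """
    return any(
        hamming_distance(sentence_hash, h) < threshold
        for h in store.get(user_id, [])
    )


if __name__ == "__main__":
    week_of_hashes = [0b0000, 0b0001, 0b0010, 0b11111111]
    store = {"user-1": frequent_hashes(week_of_hashes, threshold=2, min_similar=1, top_n=1)}
    print(should_suggest(store, "user-1", 0b0100, threshold=2))
```

Note that neither function ever sees the original sentences. Everything works on the hashes alone, which is what lets the suggestion be made without storing the text.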
Outcome:
The approach was successful: users who received the recommendation created more snippets than users who never saw it. However, they had to develop a workaround for long texts that are edited over the course of a week, since the same text would otherwise be processed again every day. Therefore, they excluded tools like Pages and Google Docs from their list of document types.
That's a wrap for today. Stay thirsty & see ya soon!
If you have any suggestions or questions, I would love to hear from you.
Please share with your friends and colleagues.