🚸(backend) improve users similarity search and sort results

In some edge cases, the domain part the email addresse is
longer than the name part. Users searches by email similarity
then return a lot of unsorted results.

We can improve this by being more demanding on similarity when
the query looks like an email. Sorting results by the similarity
score is also an obvious improvement.

At the moment, we still think it is good to propose results with
a weak similarity on the name part because we want to avoid
as much as possible creating duplicate users by inviting one of
is many emails, a user who is already in our database.

Fixes 399
This commit is contained in:
Samuel Paccoud - DINUM
2024-11-06 08:09:05 +01:00
committed by Samuel Paccoud
parent 50891afd05
commit 4f4951cdcd
3 changed files with 60 additions and 0 deletions

View File

@@ -69,6 +69,48 @@ def test_api_users_list_query_email():
assert user_ids == [str(nicole.id), str(frank.id)]
def test_api_users_list_query_email_matching():
"""While filtering by email, results should be filtered and sorted by similarity"""
user = factories.UserFactory()
client = APIClient()
client.force_login(user)
alice = factories.UserFactory(email="alice.johnson@example.gouv.fr")
factories.UserFactory(email="jane.smith@example.gouv.fr")
michael_wilson = factories.UserFactory(email="michael.wilson@example.gouv.fr")
factories.UserFactory(email="david.jones@example.gouv.fr")
michael_brown = factories.UserFactory(email="michael.brown@example.gouv.fr")
factories.UserFactory(email="sophia.taylor@example.gouv.fr")
response = client.get(
"/api/v1.0/users/?q=michael.johnson@example.gouv.f",
)
assert response.status_code == 200
user_ids = [user["id"] for user in response.json()["results"]]
assert user_ids == [str(michael_wilson.id)]
response = client.get("/api/v1.0/users/?q=michael.johnson@example.gouv.fr")
assert response.status_code == 200
user_ids = [user["id"] for user in response.json()["results"]]
assert user_ids == [str(michael_wilson.id), str(alice.id), str(michael_brown.id)]
response = client.get(
"/api/v1.0/users/?q=ajohnson@example.gouv.f",
)
assert response.status_code == 200
user_ids = [user["id"] for user in response.json()["results"]]
assert user_ids == [str(alice.id)]
response = client.get(
"/api/v1.0/users/?q=michael.wilson@example.gouv.f",
)
assert response.status_code == 200
user_ids = [user["id"] for user in response.json()["results"]]
assert user_ids == [str(michael_wilson.id)]
def test_api_users_list_query_email_exclude_doc_user():
"""
Authenticated users should be able to list users