Added comprehensive tests covering DocSpec converter service,
converter orchestration, and document creation with file uploads.
Tests validate DOCX and Markdown conversion workflows, error
handling, service availability, and edge cases including empty
files and Unicode filenames.
Signed-off-by: Stephan Meijer <me@stephanmeijer.com>
We can now import documents in formats .docx and .md.
To do so we added a new container "docspec", which
uses the docspec service to convert
these formats to Blocknote format.
More here: #1567#1569.
The template feature is removed.
Migration created to drop related tables.
Files modified:
- viewsets
- serializers
- models
- admin
- factories
- urls
- tests
- demo data
The cors-proxy endpoint was allowing redirect when fetching the target
url. This can be usefull if an image url has changed but also dangerous
if an attacker wants to hide a SSRF behind a redirect.
When the content-type return by the targeted url is not an image, the
endpoint was returning a 415 status code. We don't want to provide this
info anymore avoid disclosing information an attacker can use.
The cors-proxy endpoint allow to download images host externally without
being blocked by cors headers. The response is filter on the return
content-type to avoid disclosure and the usage of this endpoint as the
proxy used by attacker. We want to restrict the usage of this endpoint
by filtering on non legit ips used. This filter avoid exploitation of
Server Side Request Forgery (SSRF).
External dashboards need to find the latest updated documents across
the entire hierarchy. Currently this requires many API calls to
/documents/ and /documents/{id}/children for each level.
This endpoint allows retrieving all accessible documents in a single
request, enabling dashboards to efficiently display recently changed
documents regardless of their position in the hierarchy.
Signed-off-by: ChristopherSpelt <christopherspelt@icloud.com>
Keep ordering by score from Find API on search/ results and
fallback search still uses "-update_at" ordering as default
Refactor pagination to work with a list instead of a queryset
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Rename FindDocumentIndexer as SearchIndexer
Rename FindDocumentSerializer as SearchDocumentSerializer
Rename package core.tasks.find as core.task.search
Remove logs on http errors in SearchIndexer
Factorise some code in search API view.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Filter deleted documents from visited ones.
Set default ordering to the Find API search call (-updated_at)
BaseDocumentIndexer.search now returns a list of document ids instead of models.
Do not call the indexer in signals when SEARCH_INDEXER_CLASS is not defined
or properly configured.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
New SEARCH_INDEXER_CLASS setting to define the indexer service class.
Raise ImpoperlyConfigured errors instead of RuntimeError in index service.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
On document content or permission changes, start a celery job that will call the
indexation API of the app "Find".
Signed-off-by: Fabre Florian <ffabre@hybird.org>
We observe some throttling pick here and there.
We observed that when the collaboration has a
problem, it is retrying to connect, leading to more
requests to the django backend. At one point, the
throttling is reached and the user would not
be able to use the application anymore.
Now when the request comes from a collaboration
server, we do not throttle it anymore.
We have the user full name through OIDC in the database, but the search only
used the email field.
This change allows to search for a user by their first and/or
last name (fix#929).
Given that user names are more likely than emails to include diacritics, it
unaccents both the query and the database entry for search (fix#1091).
It also unaccents for email so that internationalized domain names are
managed whether or not the accent is included in the search.
An unaccented gin index is added on users full_name an email fields.
Using a manual migration because a wrapper around unaccent is necessary
to make it IMMUTABLE (cf.
https://stackoverflow.com/questions/9063402/ )
This commit add the CRUD part to manage comment lifeycle. Permissions
are relying on the Document and Comment abilities. Comment viewset
depends on the Document route and is added to the
document_related_router. Dedicated serializer and permission are
created.
A complete API was able to manage templates lifecycle, from the creation
to the deletion and managing accesses on them. This API is not used by
the frontend application, is not finished. A connected user can interact
with this API and lead to unwanted behavior in the interface. Refering
ot issue #1222 templates can maybe totaly remove in the future. While
it's here and used, we only keep list and retrive endpoints. The
template management can still be done in the admin interface.
Like in other abilities, we compute a set_role_to property on the
abilities. This set_role_to contains all the roles lower or equal than
the current user role. We rely on this propoerty to validate the accept
endpoint and it will be used by the front allpication to built the role
select list.
We check that the role set in a ask_for_access is not higher than the
user's role accepting the request. We prevent case where ad min will
grant a user owner in order to take control of the document. Only owner
can accept an owner role.
The regex used on the version_detail endpoint path is not fully
compatible with the S3 spec. In the S3 specs, Version IDs are Unicode,
UTF-8 encoded, URL-ready, opaque strings that are no more than 1,024
bytes long. We don't accept all unicode characters but enough to be
compliant.
The trashbin endpoint is slow. To filter documents the user has owner
access, we use a subquery to compute the roles and then filter on this
subquery. This is very slow. To improve it, we use the same way to
filter children used in the tree endpoint. First we look for all highest
ancestors the user has access on with the owner role. Then we create one
queryset filtering on all the docs starting by the given path and are
deleted.
The tree endpoint will now return a result only for owners. For other
users the endpoint still returns a 403. Also, the endpoint does look for
ancestors anymore, it only stay on the current document.
Only the input data min length was checked. We also have to check the
mex length because the levenshtein dos not accept more than 254
characters and the email field has a max length of 254
This allows API users to process document content, enabling the
use of Docs as a headless CMS for instance, or any kind of document
processing. Fixes#1206.
We want to configure the throttle on all doc's viewsets. In order to
monitor them, we use the MonitoredScopedRateThrottle class and a custom
callback caputing the message in sentry at the warning level.
An editor who created a subpages should be allowed to delete it.
We change the abilities to be coherent between the creation and the
deletion.
Fixes#1193
On the cors_proxy endpoint, if the fetched url fails we were returning
an error 500. Instead, we log the exception and return a 400 to not
give back information to the frontend application.
The url used by the cors_proxy was not validated, other value than a
http url can be used. We use the built in URLValidator to validate it is
a valid url.
Once users have visited a document to which they have access,
they can't remove it from their list view anymore. Several
users reported that this is annoying because a document that
gets a lot of updates keeps popping up at the top of their list
view.
They want to be able to mask the document in a click. We propose
to add a "masked documents" section in the left side bar where the
masked documents can still be found.
Children does not have accesses created for now, they inherit from their
parent for now. We have to ignore access creation while owrk on the
children accesses has not been made.
In a first version we want to restrict the ask for access feature only
to root document. We will work on opening to all documents when iherited
permissions will be implemented.
During the multipage dev, the code base has changed a lot and rebase
after rebase it has come difficult to manage fixup commits. This commits
fix modification made that can be fixup in previous commits. The
persmission AccessPermission has been renamed in
ResourceWithAccessPermission and should be used in the
DocumentAskForAccessViewSet. A migration with the same dependency
exists, the last one is fixed. And a test didn't have removed an
abilitites.
The frontend requires this information about the ancestor document
to which each access is related. We make sure it does not generate
more db queries and does not fetch useless and heavy fields from
the document like "excerpt".
If root documents are guaranteed to have a owner, non-root documents
will automatically have them as owner by inheritance. We should not
require non-root documents to have their own direct owner because
this will make it difficult to manage access rights when we move
documents around or when we want to remove access rights for someone
on a document subtree... There should be as few overrides as possible.
This field is set only on the list view when all accesses for a given
document and all its ancestors are listed. It gives the highest role
among all accesses related to each document.
The latest refactoring in a445278 kept some factorizations that are
not legit anymore after the refactoring.
It is also cleaner to not make serializer choice in the list view if
the reason for this choice is related to something else b/c other
views would then use the wrong serializer and that would be a
security leak.
This commit also fixes a bug in the access rights inheritance: if a
user is allowed to see accesses on a document, he should see all
acesses related to ancestors, even the ancestors that he can not
read. This is because the access that was granted on all ancestors
also apply on the current document... so it must be displayed.
Lastly, we optimize database queries because the number of accesses
we fetch is going up with multi-pages and we were generating a lot
of useless queries.
We were returning the list of roles a user has on a document (direct
and inherited). Now that we introduced priority on roles, we are able
to determine what is the max role and return only this one.
This commit also changes the role that is returned for the restricted
reach: we now return None because the role is not relevant in this
case.
We are going to need to compare choices to materialize the fact that
choices are ordered. For example an admin role is higer than an
editor role but lower than an owner role.
We will need this to compute the reach and role resulting from all
the document accesses (resp. link accesses) assigned on a document's
ancestors.