The frontend application is using Vercel AI SDK and it's data stream
protocol. We decided to use the pydantic AI library to use it's vercel
ai adapter. It will make the payload validation, use AsyncIterator and
deal with vercel specification.
The frontend can fetch the retrieve endpoint just for having the title.
Everytime, the content is added and to get the content, a request is
made to the s3 bucket. A query string `without_content` can be used, if
the value is `true` then the content is not added (and not also not
fetch).
Adds a new with_descendants parameter to the doc duplication API.
The logic of the duplicate() method has been moved to a new
internal _duplicate_document() method to allow for recursion.
Adds unit tests for the new feature.
Added comprehensive tests covering DocSpec converter service,
converter orchestration, and document creation with file uploads.
Tests validate DOCX and Markdown conversion workflows, error
handling, service availability, and edge cases including empty
files and Unicode filenames.
Signed-off-by: Stephan Meijer <me@stephanmeijer.com>
We can now import documents in formats .docx and .md.
To do so we added a new container "docspec", which
uses the docspec service to convert
these formats to Blocknote format.
More here: #1567#1569.
The template feature is removed.
Migration created to drop related tables.
Files modified:
- viewsets
- serializers
- models
- admin
- factories
- urls
- tests
- demo data
Rename FindDocumentIndexer as SearchIndexer
Rename FindDocumentSerializer as SearchDocumentSerializer
Rename package core.tasks.find as core.task.search
Remove logs on http errors in SearchIndexer
Factorise some code in search API view.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Filter deleted documents from visited ones.
Set default ordering to the Find API search call (-updated_at)
BaseDocumentIndexer.search now returns a list of document ids instead of models.
Do not call the indexer in signals when SEARCH_INDEXER_CLASS is not defined
or properly configured.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
New SEARCH_INDEXER_CLASS setting to define the indexer service class.
Raise ImpoperlyConfigured errors instead of RuntimeError in index service.
Signed-off-by: Fabre Florian <ffabre@hybird.org>
In the UserLightSerializer we were fallbacking on a strategy to never
have a full_name or short_name empty. We use the part of the email
befire the @. We are doing the same thing now in the main
UserSerializer.
This commit add the CRUD part to manage comment lifeycle. Permissions
are relying on the Document and Comment abilities. Comment viewset
depends on the Document route and is added to the
document_related_router. Dedicated serializer and permission are
created.
An invitation can be updated to change its role. The front use a PATCH
sending only the changed role, so the email is missing in the
InivtationSerializer.validate method. We have to check first if an email
is present before working on it.
When a document was restricted, the link role could
be updated from "link-configuration" and gives a
200 response, but the change did not
have any effect because of a restriction in
LinkReachChoices.
We added a validation step to ensure that the
link role can only be updated if the document
is not restricted.
This allows API users to process document content, enabling the
use of Docs as a headless CMS for instance, or any kind of document
processing. Fixes#1206.
In the UserlightSerializer, if the user has no short_name or full_name,
we have no info about the user. We decided to use the email identifier
and slugify it to have a little bit information.
The frontend needs to know what to display on an access. The maximum
role between the access role and the role equivalent to all accesses
on the document's ancestors should be computed on the backend.
The frontend requires this information about the ancestor document
to which each access is related. We make sure it does not generate
more db queries and does not fetch useless and heavy fields from
the document like "excerpt".
This field is set only on the list view when all accesses for a given
document and all its ancestors are listed. It gives the highest role
among all accesses related to each document.
The latest refactoring in a445278 kept some factorizations that are
not legit anymore after the refactoring.
It is also cleaner to not make serializer choice in the list view if
the reason for this choice is related to something else b/c other
views would then use the wrong serializer and that would be a
security leak.
This commit also fixes a bug in the access rights inheritance: if a
user is allowed to see accesses on a document, he should see all
acesses related to ancestors, even the ancestors that he can not
read. This is because the access that was granted on all ancestors
also apply on the current document... so it must be displayed.
Lastly, we optimize database queries because the number of accesses
we fetch is going up with multi-pages and we were generating a lot
of useless queries.
On a document, we need to display the status of the link (reach and
role) taking into account the ancestors link reach/role as well as
the current document.
We were returning the list of roles a user has on a document (direct
and inherited). Now that we introduced priority on roles, we are able
to determine what is the max role and return only this one.
This commit also changes the role that is returned for the restricted
reach: we now return None because the role is not relevant in this
case.
We are going to need to compare choices to materialize the fact that
choices are ordered. For example an admin role is higer than an
editor role but lower than an owner role.
We will need this to compute the reach and role resulting from all
the document accesses (resp. link accesses) assigned on a document's
ancestors.
The document accesses a user have on a document's ancestors also apply
to this document. The frontend needs to list them as "inherited" so we
need to add them to the list.
Adding a "document_id" field on the output will allow the frontend to
differentiate between inherited and direct accesses on a document.
When a document is updated, users not connected to the collaboration
server can override work made by other people connected to the
collaboration server. To avoid this, the priority is given to user
connected to the collaboration server. If the websocket property in the
request payload is missing or set to False, the backend fetch the
collaboration server to now if the user can save or not. If users are
already connected, the user can't save. Also, only one user without
websocket can save a connect, the first user saving acquire a lock and
all other users can't save.
To implement this behavior, we need to track all users, connected and
not, so a session is created for every user in the
ForceSessionMiddleware.
Renamed the `convert_markdown` method to `convert` to prepare for an
all-purpose conversion endpoint, enabling support for multiple formats
and simplifying future extension.
Signed-off-by: Stephan Meijer <me@stephanmeijer.com>
Add the action accepting a request to access a document. It is possible
to override the role from the request and also update an existing
DocumentAccess
We introduce a new model for user wanted to access a document or upgrade
their role if they already have access.
The viewsets does not implement PUT and PATCH, we don't need it for now.
We added the possibility to scan all uploaded files with an anti malware
solution. Depending the backend used, we want to give the possibility to
check the file mimtype to determine if this one is tagged as unsafe or
not. To this you can set the environment variable
DOCUMENT_ATTACHMENT_CHECK_UNSAFE_MIME_TYPES_ENABLED to False. The
default value is True.
We recently extract images url in the content. For this, we assume that
the document content is always in base64. We enforce this assumption by
checking if it's a valide base64 in the serializer.
Every user having an access to a document, no matter its role have
access to the entire accesses list with all the user details. Only
owner or admin should be able to have the entire list, for the other
roles, they have access to the list containing only owner and
administrator with less information on the username. The email and its
id is removed
We can't prevent document editors from copy/pasting content to from one
document to another. The problem is that copying content, will copy the
urls pointing to attachments but if we don't do anything, the reader of
the document to which the content is being pasted, may not be allowed to
access the attachment files from the original document.
Using the work from the previous commit, we can grant access to the readers
of the target document by extracting the attachment keys from the content and
adding themto the target document's "attachments" field. Before doing this,
we check that the current user can indeed access the attachment files extracted
from the content and that they are allowed to edit the current document.
We took this opportunity to refactor the way access is controlled on
media attachments. We now add the media key to a list on the document
instance each time a media is uploaded to a document. This list is
passed along when a document is duplicated, allowing us to grant
access to readers on the new document, even if they don't have or
lost access to the original document.
We also propose an option to reproduce the same access rights on the
duplicate document as what was in place on the original document.
This can be requested by passing the "with_accesses=true" option in
the query string.
The tricky point is that we need to extract attachment keys from the
existing documents and set them on the new "attachments" field that is
now used to track access rights on media files.
The "nb_accesses" field was displaying the number of access instances
related to a document or any of its ancestors. Some features on the
frontend require to know how many of these access instances are related
to the document directly.
We want to be able to make a search query inside a hierchical document.
It's elegant to do it as a document detail action so that we benefit
from access control.
the "filter_queryset" method is called in the middle of the
"get_object" method. We use the "get_object" in actions like
"children", "tree", etc. which start by calling "get_object"
but return lists of documents.
We would like to apply filters to these views but the it didn't
work because the "get_object" method was also impacted by the
filters...
In a future PR, we should take control of the "get_object" method
and decouple all this. We need a quick solution to allow releasing
the hierchical documents feature in the frontend.
We want to display the tree structure to which a document belongs
on the left side panel of its detail view. For this, we need an
endpoint to retrieve the list view of the document's ancestors
opened.
By opened, we mean that when display the document, we also need to
display its siblings. When displaying the parent of the current
document, we also need to display the siblings of the parent...
- allow the language attribute on the user to be updated via API
- add frontend function to update the user language via API
- extend defaults on the test users, to have fixed language in E2E tests
- extend types and variables using the types with the new field
On the media upload endpoint, we want to set the content-disposition
header. Its value is based on the uploaded file mime-type and if flagged
as unsafe. If the file is not an image or is unsafe then the
contentDisposition is set to attachment to force its download.
Otherwise, we set it to inline.