Paperless vs Paperless-ng
Detailed comparison of features, pros, cons, and usage
Paperless-ng (jonaswinkler/paperless-ng) is a more actively maintained and feature-rich fork of the original Paperless project (the-paperless-project/paperless). It offers improved document management capabilities and a modernized user interface, though it may have a steeper learning curve for new users.
Paperless Pros and Cons
Pros
- Document Organization: Efficiently digitizes and organizes paper documents, making them easily searchable and accessible.
- Open Source: Free to use and customize, with an active community contributing to its development.
- Automation: Utilizes OCR and tagging systems to automatically categorize and index documents.
- Self-Hosted: Offers complete control over data and privacy by allowing users to host the system on their own servers.
Cons
- Setup Complexity: Initial setup and configuration can be challenging for users without technical expertise.
- Hardware Requirements: Requires dedicated hardware or an always-on computer to run effectively.
- Learning Curve: May take time to fully understand and utilize all features and capabilities.
- Maintenance: Requires regular updates and maintenance to ensure optimal performance and security.
Paperless Code Examples
Document Consumption
This snippet shows how Paperless processes and consumes documents:
def consume_file(self, path, override_filename=None, override_title=None,
                 override_correspondent_id=None, override_document_type_id=None,
                 override_tag_ids=None, override_created=None, override_asn=None):
    document = self.try_consume_file(
        path, override_filename, override_title,
        override_correspondent_id, override_document_type_id,
        override_tag_ids, override_created, override_asn)
    if document:
        return document
OCR Processing
This snippet demonstrates how Paperless performs OCR on documents:
def ocr(self, input_file, output_file, language, safe_fallback=True):
    import ocrmypdf
    args = [
        "--language", language,
        "--output-type", "pdf",
        "--sidecar", output_file + ".txt",
        "--skip-text",
        "--deskew",
        "--clean",
        "--rotate-pages",
        input_file,
        output_file
    ]
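The snippet above stops after building the argument list. As a minimal sketch of how such a list might be handed to the ocrmypdf command-line tool, the helper names below (build_ocr_args, run_ocr) are hypothetical and not part of Paperless. The flags are the ones shown above; invoking the tool through subprocess is an assumption made for illustration.

```python
import shutil
import subprocess


def build_ocr_args(input_file, output_file, language="eng"):
    """Return an ocrmypdf command line using the flags from the snippet above."""
    return [
        "ocrmypdf",
        "--language", language,
        "--output-type", "pdf",
        "--sidecar", output_file + ".txt",  # plain-text copy of the recognized text
        "--skip-text",                      # leave pages that already contain text alone
        "--deskew",
        "--clean",
        "--rotate-pages",
        input_file,
        output_file,
    ]


def run_ocr(input_file, output_file, language="eng"):
    """Run ocrmypdf if it is on PATH and return the sidecar text path."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_ocr_args(input_file, output_file, language), check=True)
    return output_file + ".txt"
```

Keeping the argument construction separate from the call makes the flag list easy to inspect or adjust (for example, dropping --clean on systems without the unpaper dependency) before anything is executed.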
Document Searching
This snippet shows how Paperless implements document searching:
class DocumentIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    title = EdgeNgramField(model_attr="title", boost=1.5)
    content = CharField(model_attr="content")
    created = DateTimeField(model_attr="created")
    modified = DateTimeField(model_attr="modified")
    tags = MultiValueField()
    correspondent = CharField(model_attr="correspondent__name", null=True)
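The title field above is declared as an edge n-gram field, which is what allows matching on the beginning of a word as the user types. The following is a plain-Python illustration of that idea only, not the search backend's actual implementation. The function names and the minimum and maximum gram sizes are assumptions chosen for the example.

```python
def edge_ngrams(word, min_size=2, max_size=15):
    """Return the prefixes of `word` between min_size and max_size characters."""
    word = word.lower()
    upper = min(len(word), max_size)
    return [word[:n] for n in range(min_size, upper + 1)]


def build_prefix_index(titles):
    """Map every edge n-gram of every title word to the ids of matching documents."""
    index = {}
    for doc_id, title in titles.items():
        for word in title.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents with a title word starting with `query`."""
    return sorted(index.get(query.lower(), set()))
```

With titles {1: "Invoice March", 2: "Insurance letter"}, a query of "in" matches both documents while "inv" matches only the first, which is the prefix behaviour an edge n-gram field is meant to provide.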
Paperless Quick Start
Installation
- Clone the repository:
  git clone https://github.com/the-paperless-project/paperless.git
  cd paperless
- Install dependencies:
  pip install -r requirements.txt
- Set up the database:
  python3 manage.py migrate
- Create a superuser:
  python3 manage.py createsuperuser
Basic Usage
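- Start the Paperless server:
  python3 manage.py runserver
- Open your web browser and navigate to http://localhost:8000
- Log in with the superuser credentials you created earlier.
- To add a document, use the following command:
  python3 manage.py document_consumer /path/to/your/document.pdf
  Note: the upstream Paperless documentation describes document_consumer as a long-running process that watches a consumption directory. If passing a single file path does not work in your version, copy the file into that directory instead.
- Refresh the web interface to see your newly added document.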
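If the consumer in your setup watches a consumption directory rather than accepting a single file, one way to hand it a document is simply to copy the file there. The sketch below assumes that setup. The function name and the directory path in the usage note are hypothetical, so substitute the directory your installation is configured to watch.

```python
import shutil
from pathlib import Path


def queue_for_consumption(document, consumption_dir):
    """Copy `document` into `consumption_dir` and return the path of the copy."""
    source = Path(document)
    target_dir = Path(consumption_dir)
    if not source.is_file():
        raise FileNotFoundError(f"No such document: {source}")
    # Create the directory if needed; an existing directory is left untouched.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # copy2 also preserves file timestamps where the platform allows it.
    shutil.copy2(source, target)
    return target
```

For example, queue_for_consumption("/path/to/your/document.pdf", "/path/to/consume") would place a copy in the (hypothetical) /path/to/consume directory for a running consumer to pick up.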
Example: Searching for Documents
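Once you've added some documents, you can search for them using the web interface or, where your installation provides it, the command line:
python3 manage.py document_search "search term"
This returns a list of documents matching the search term. Running python3 manage.py help lists the management commands that are actually available in your version.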
Top Related Projects
Paperless-ng: A supercharged version of paperless: scan, index and archive all your physical documents
Pros of Paperless-ng
- Modern web-based user interface with improved document management features
- Enhanced search capabilities, including full-text search and custom fields
- Automated document classification and tagging using machine learning
Cons of Paperless-ng
- Potentially higher system requirements due to additional features
- Steeper learning curve for users familiar with the original Paperless
Code Comparison
Paperless:
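# views.py
from django.views.generic import TemplateView

from .models import Document  # assumed module layout


class IndexView(TemplateView):
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # Every document is exposed, regardless of who is viewing the page.
        context['documents'] = Document.objects.all()
        return context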
Paperless-ng:
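# views.py
from django.contrib.auth.mixins import LoginRequiredMixin
from django.views.generic import TemplateView

from .models import Document  # assumed module layout


class IndexView(LoginRequiredMixin, TemplateView):
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # Only documents owned by the logged-in user are listed.
        context['documents'] = Document.objects.filter(owner=self.request.user)
        return context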
In this comparison, the Paperless-ng view requires a logged-in user (via LoginRequiredMixin) and filters documents by owner, which improves security and multi-user support.
paperless-ngx: A community-supported supercharged document management system: scan, index and archive all your documents
Pros of paperless-ngx
- More active development and frequent updates
- Enhanced user interface with modern design
- Improved document processing and management features
Cons of paperless-ngx
- Potential compatibility issues with older paperless setups
- Steeper learning curve for users familiar with the original paperless
Code Comparison
paperless:
# Example from paperless
class Document(models.Model):
    correspondent = models.ForeignKey(
        Correspondent, blank=True, null=True, on_delete=models.SET_NULL
    )
    title = models.CharField(max_length=128, blank=True, db_index=True)
paperless-ngx:
# Example from paperless-ngx
class Document(models.Model):
    correspondent = models.ForeignKey(
        Correspondent,
        related_name="documents",
        on_delete=models.SET_NULL,
        null=True,
        blank=True,
    )
    title = models.CharField(max_length=128, blank=True, db_index=True)
The code comparison shows minor differences in model definitions, with paperless-ngx including a related_name parameter in the ForeignKey field.
Both projects aim to provide a paperless document management system, but paperless-ngx is the more modern and actively maintained of the two. While it may require some adjustment for users of the original paperless, its improved features and ongoing development make it an attractive option for those seeking a robust document management system.