paperless

Scan, index, and archive all of your paper documents

Top Related Projects

paperless

7,879

Scan, index, and archive all of your paper documents

paperless-ng

5,389

A supercharged version of paperless: scan, index and archive all your physical documents

paperless-ngx

28,602

A community-supported supercharged document management system: scan, index and archive all your documents

Quick Overview

Paperless is an open-source document management system that helps you scan, index, and archive all your physical documents. It transforms your paper documents into searchable digital files, making it easier to organize and retrieve information. The project aims to reduce paper clutter and improve document accessibility.

Pros

Automates document scanning, OCR, and indexing processes
Provides a web-based interface for easy document management and searching
Supports tagging and custom metadata for better organization
Integrates with various scanners and mobile apps for document capture

Cons

Initial setup can be complex for non-technical users
Requires ongoing maintenance and backups to ensure data safety
May have a learning curve for users unfamiliar with document management systems
Limited built-in integrations with other productivity tools

Getting Started

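To get started, follow these steps. The original Paperless repository is archived, so the compose file here uses the image of paperless-ngx, the community-supported fork; its settings are described in the paperless-ngx documentation.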

Install Docker and Docker Compose on your system.
Create a docker-compose.yml file with the following content:

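version: "3.4"
services:
  # paperless-ngx needs a Redis broker for its background task queue
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - broker
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_URL: http://localhost:8000
      # Replace the placeholder with a long random string
      PAPERLESS_SECRET_KEY: your_secret_key_here
      PAPERLESS_TIME_ZONE: America/New_York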

Run docker-compose up -d to start Paperless.
Access the web interface at http://localhost:8000 and create an admin account.
Configure your scanner or mobile app to send documents to the consume directory.

For more detailed instructions and configuration options, refer to the official Paperless documentation.
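If a scanner cannot write to the consume directory directly, a small script can do the hand-off. The following is a minimal sketch, assuming scans first land somewhere else on disk; the helper name `drop_into_consume` and the numbering scheme for name clashes are hypothetical and not part of Paperless.

```python
import shutil
from pathlib import Path


def drop_into_consume(scan: Path, consume_dir: Path) -> Path:
    """Copy one scanned file into the consume directory and return its new path."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / scan.name
    counter = 1
    # Avoid overwriting a file that is still waiting to be consumed.
    while target.exists():
        target = consume_dir / f"{scan.stem}-{counter}{scan.suffix}"
        counter += 1
    shutil.copy2(scan, target)
    return target
```

Calling `drop_into_consume(Path("scan.pdf"), Path("./consume"))` would place a copy in the directory mounted as `./consume` in the compose file.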

Competitor Comparisons

paperless

7,879

Scan, index, and archive all of your paper documents

Pros of paperless

Established project with the longest history of the three (in use since 2015)
More comprehensive documentation and user guides
Wider range of features and integrations

Cons of paperless

Potentially more complex setup and configuration
May have more legacy code and technical debt
No new releases: the repository has been archived as read-only

Code comparison

paperless:

from django.db import models
from django.utils import timezone


class Document(models.Model):
    title = models.CharField(max_length=128, blank=True, db_index=True)
    content = models.TextField(blank=True)
    created = models.DateTimeField(default=timezone.now, db_index=True)
    modified = models.DateTimeField(auto_now=True)

paperless-ng:

from django.db import models
from django.utils import timezone


class Document(models.Model):
    title = models.CharField(max_length=128, blank=True, db_index=True)
    content = models.TextField(blank=True)
    created = models.DateTimeField(default=timezone.now, db_index=True)
    modified = models.DateTimeField(auto_now=True)
    # Records the detected file type of the stored document
    mime_type = models.CharField(max_length=256, null=True, blank=True)

The code comparison shows that both projects have similar core document models, with paperless-ng adding a mime_type field for improved file type handling.
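To illustrate what a value for such a `mime_type` field looks like, here is a minimal sketch that guesses the type from the file extension with Python's standard `mimetypes` module. This is a stand-in chosen for the example, not necessarily how paperless-ng fills the field; the `SUPPORTED` allow-list and both function names are hypothetical, and inspecting the file contents would be more robust than trusting the extension.

```python
import mimetypes
from typing import Optional

# Hypothetical allow-list of types the archive accepts.
SUPPORTED = {"application/pdf", "image/png", "image/jpeg"}


def guess_mime_type(filename: str) -> Optional[str]:
    """Return a MIME type guessed from the file extension, or None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def is_supported(filename: str) -> bool:
    """Tell whether the guessed type is on the allow-list."""
    return guess_mime_type(filename) in SUPPORTED
```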

paperless-ng

5,389

A supercharged version of paperless: scan, index and archive all your physical documents

Pros of paperless-ng

Improved user interface with a modern, responsive design
Enhanced document processing with better OCR and tagging capabilities
More frequent updates and active development

Cons of paperless-ng

Potentially less stable due to more frequent changes
May require more system resources for advanced features

Code Comparison

paperless:

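from django.shortcuts import redirect, render

# UploadForm is assumed to be defined in the app's own forms module


def index(request):
    if request.method == "POST":
        # Bind the submitted data and uploaded files to the form
        form = UploadForm(request.POST, request.FILES)
        if form.is_valid():
            form.save()
            return redirect("documents:index")
    else:
        form = UploadForm()
    return render(request, "documents/index.html", {"form": form})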

paperless-ng:

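from django.contrib.auth.mixins import LoginRequiredMixin
from django.views.generic import TemplateView

# Document is assumed to be imported from the app's models module


class IndexView(LoginRequiredMixin, TemplateView):
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['documents'] = Document.objects.all()
        return context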

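The code comparison shows that paperless-ng uses class-based views and more modern Django practices, while paperless uses function-based views. The two views do different jobs (handling an upload versus listing documents behind a login check), so the comparison illustrates coding style rather than equivalent behaviour. This reflects the overall modernization efforts in paperless-ng.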

paperless-ngx

28,602

A community-supported supercharged document management system: scan, index and archive all your documents

Pros of paperless-ngx

More active development and frequent updates
Enhanced user interface with modern design
Improved search functionality and document management features

Cons of paperless-ngx

Potentially less stable due to rapid development
May require more system resources for advanced features

Code Comparison

paperless:

from django.shortcuts import render


def index(request):
    documents = Document.objects.all()
    return render(request, 'documents/index.html', {'documents': documents})

paperless-ngx:

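from django.db.models import Count, F
from django.shortcuts import render


def index(request):
    # Compute the tag count and correspondent name in the same query
    documents = Document.objects.annotate(
        tag_count=Count('tags'),
        correspondent_name=F('correspondent__name'),
    )
    return render(request, 'documents/index.html', {'documents': documents})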

The paperless-ngx snippet annotates each document with its tag count and correspondent name as part of the same query. This can avoid issuing extra queries per document when the template renders the list.
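The effect of annotating in the query can be shown without Django. The sketch below uses the standard `sqlite3` module with a made-up two-table schema (`document`, `document_tag`) and hypothetical function names; it compares one query per document against a single joined and grouped query, which is roughly the shape of SQL that a `Count` annotation produces.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a few sample documents and tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE document_tag (document_id INTEGER, tag TEXT);
        INSERT INTO document VALUES (1, 'Water bill'), (2, 'Visa form'), (3, 'Receipt');
        INSERT INTO document_tag VALUES (1, 'utilities'), (1, '2015'), (2, 'visa');
    """)
    return conn


def tag_counts_naive(conn: sqlite3.Connection) -> list:
    """One query for the documents, then one more query per document."""
    result = []
    docs = conn.execute("SELECT id, title FROM document ORDER BY id").fetchall()
    for doc_id, title in docs:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM document_tag WHERE document_id = ?", (doc_id,)
        ).fetchone()
        result.append((title, count))
    return result


def tag_counts_single_query(conn: sqlite3.Connection) -> list:
    """The same result from a single joined and grouped query."""
    return conn.execute("""
        SELECT d.title, COUNT(t.tag)
        FROM document AS d
        LEFT JOIN document_tag AS t ON t.document_id = d.id
        GROUP BY d.id, d.title
        ORDER BY d.id
    """).fetchall()
```

Both functions return the same rows; the first issues one query per document plus one, while the second always issues one.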

README

Important news about the future of this project

It's been more than 5 years since I started this project on a whim as an effort to try to get a handle on the massive amount of paper I was dealing with in relation to various visa applications (expat life is complicated!). Since then, the project has exploded in popularity, so much so that it overwhelmed me and working on it stopped being "fun" and started becoming a serious source of stress.

In an effort to fix this, I created the Paperless GitHub organisation, and brought on a few people to manage the issue and pull request load. Unfortunately, that model has proven to be unworkable too. With 23 pull requests waiting and 157 issues slowly filling up with confused/annoyed people wanting to get their contributions in, my whole "appoint a few strangers and hope they've got time" idea is showing my lack of foresight and organisational skill.

In the shadow of these difficulties, a fork called Paperless-ng written by Jonas Winkler has cropped up. It's really good, and unlike this project, it's actively maintained (at the time of this writing anyway). With 564 forks currently tracked by GitHub, I suspect there are a few more forks worth looking into out there as well.

So, with all of the above in mind, I've decided to archive this project as read-only and suggest that those interested in new updates or submitting patches have a look at Paperless-ng. If you really like "Old Paperless", that's ok too! The project is GPL licensed, so you can fork it and run it on whatever you like so long as you respect the terms of said license.

In time, I may transfer ownership of this organisation to Jonas if he's interested in taking that on, but for the moment, he's happy to run Paperless-ng out of its current repo. Regardless, if we do decide to make the transfer, I'll post a notification here a few months in advance so that people won't be surprised by new code at this location.

For my part, I'm really happy & proud to have been part of this project, and I'm sorry I've been unable to commit more time to it for everyone. I hope you all understand, and I'm really pleased that this work has been able to continue to live and be useful in a new project. Thank you to everyone who contributed, and for making Free software awesome.

Sincerely, Daniel Quinn

Index and archive all of your scanned paper documents

I hate paper. Environmental issues aside, it's a tech person's nightmare:

There's no search feature
It takes up physical space
Backups mean more paper

In the past few months I've been bitten more than a few times by the problem of not having the right document around. Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper. I wrote this to make my life easier.

How it Works

Paperless does not control your scanner; it only helps you deal with what your scanner produces.

Buy a document scanner that can write to a place on your network. If you need some inspiration, have a look at the scanner recommendations page.
Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything. Of course if your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually. Paperless doesn't care how the documents get into its local consumption directory.
Have the target server run the Paperless consumption script to OCR the file and index it into a local database.
Use the web frontend to sift through the database and find what you want.
Download the PDF you need/want via the web interface and do whatever you like with it. You can even print it and send it as if it's the original. In most cases, no one will care or notice.
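The workflow above can be sketched as a small consume-and-search loop. This is an illustrative sketch, not Paperless' actual consumer: the function names `consume` and `search`, the `archive_dir` parameter, the two-column `document` table and the injectable `ocr` callable are all assumptions made for the example.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List


def consume(
    consume_dir: Path,
    archive_dir: Path,
    db: sqlite3.Connection,
    ocr: Callable[[Path], str],
) -> int:
    """Index every file waiting in consume_dir and move it to archive_dir."""
    db.execute("CREATE TABLE IF NOT EXISTS document (title TEXT, content TEXT)")
    archive_dir.mkdir(parents=True, exist_ok=True)
    indexed = 0
    for path in sorted(consume_dir.iterdir()):
        if not path.is_file():
            continue
        # Store the recognised text so it can be searched later.
        db.execute("INSERT INTO document VALUES (?, ?)", (path.stem, ocr(path)))
        # Move the original out of the consumption directory once indexed.
        path.rename(archive_dir / path.name)
        indexed += 1
    db.commit()
    return indexed


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Return the titles of documents whose text contains term."""
    rows = db.execute(
        "SELECT title FROM document WHERE content LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]
```

For a dry run, `ocr` can be as simple as `lambda path: path.read_text()` on plain-text files; in a real setup it would wrap an OCR engine such as Tesseract.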

Here's what you get:

The before and after

Documentation

It's all available on ReadTheDocs.

Requirements

This is all really a quite simple, shiny, user-friendly wrapper around some very powerful tools.

ImageMagick converts the images between colour and greyscale.
Tesseract does the character recognition.
Unpaper despeckles and deskews the scanned image.
GNU Privacy Guard is used as the encryption backend.
Python 3 is the language of the project.
- Pillow loads the image data as a python object to be used with PyOCR.
- PyOCR is a slick programmatic wrapper around tesseract.
- Django is the framework this project is written against.
- Python-GNUPG decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.
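To show how these tools chain together, here is a rough sketch that builds the command lines for one scan: greyscale conversion, clean-up, then character recognition. It is an assumption-laden illustration rather than Paperless' real invocation (the project drives Tesseract through PyOCR, not the command line); the function names, intermediate file names and the minimal flags are chosen for the example, and the encryption step is left out.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pipeline(scan: Path, workdir: Path, language: str = "eng") -> List[List[str]]:
    """Return the commands that turn one scanned image into a text file."""
    grey = workdir / f"{scan.stem}-grey.pnm"
    clean = workdir / f"{scan.stem}-clean.pnm"
    text_base = workdir / scan.stem  # tesseract appends .txt itself
    return [
        # ImageMagick: convert to greyscale (the command is `magick` on version 7)
        ["convert", str(scan), "-colorspace", "Gray", str(grey)],
        # Unpaper: despeckle and deskew with its default settings
        ["unpaper", str(grey), str(clean)],
        # Tesseract: recognise the text in the cleaned image
        ["tesseract", str(clean), str(text_base), "-l", language],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

`build_pipeline` only assembles the commands, so it can be inspected or logged before `run_pipeline` executes anything.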

Project Status

This project has been around since 2015, and there's lots of people using it. For some reason, it's really popular in Germany -- maybe someone over there can clue me in as to why?

I am no longer doing new development on Paperless, as it does exactly what I need it to, and I have since turned my attention to my latest project, Aletheia. However, I'm not abandoning this project. I am happy to field pull requests and answer questions in the issue queue. If you're a developer yourself and want a new feature, float it in the issue queue and/or send me a pull request! I'm happy to add new stuff, but I just don't have the time to do that work myself.

Affiliated Projects

Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:

Paperless App: An Android/iOS app for Paperless.
Paperless Desktop: A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
ansible-role-paperless: An easy way to get Paperless running via Ansible.
paperless-cli: A golang command line binary to interact with a Paperless instance.

Similar Projects

There's another project out there called Mayan EDMS that has a surprising amount of technical overlap with Paperless. Also based on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is much more featureful and comes with a slick UI as well, but it is still in Python 2. It may be that Paperless consumes fewer resources, but to be honest, this is just a guess as I haven't tested this myself. One thing's for certain though, Paperless is a way better name.

Important Note

Document scanners are typically used to scan sensitive documents. Things like your social insurance number, tax records, invoices, etc. While Paperless encrypts the original files via the consumption script, the OCR'd text is not encrypted and is therefore stored in the clear (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

Donations

As with all Free software, the power is less in the finances and more in the collective efforts. I really appreciate every pull request and bug report offered up by Paperless' users, so please keep that stuff coming. If however, you're not one for coding/design/documentation, and would like to contribute financially, I won't say no ;-)

The thing is, I'm doing ok for money, so I would instead ask you to donate to the United Nations High Commissioner for Refugees. They're doing important work and they need the money a lot more than I do.
