Top Related Projects
Read-only mirror of file CVS repository, updated every half hour. NOTE: do not make pull requests here, nor comment any commits, submit them usual way to bug tracker or to the mailing list. Maintainer(s) are not tracking this git mirror.
Fast, dependency-free Go package to infer binary file types based on the magic numbers header signature
Python Imaging Library (Fork)
Quick Overview
python-magic is a Python interface to the libmagic file type identification library. It allows users to determine the type of a file based on its contents rather than its extension, providing a more reliable method of file identification.
Pros
- Easy to use and integrate into Python projects
- Provides accurate file type identification based on content
- Supports a wide range of file types
- Cross-platform compatibility (works on Unix, macOS, and Windows)
Cons
- Requires libmagic to be installed on the system
- May have performance overhead for large files or when processing many files
- Limited customization options for file type detection rules
- Occasional false positives or ambiguous results for certain file types
Code Examples
- Basic file type detection:
import magic
file_path = "example.pdf"
file_type = magic.from_file(file_path)
print(f"File type: {file_type}")
- Detecting MIME type:
import magic
file_path = "image.jpg"
mime = magic.Magic(mime=True)
mime_type = mime.from_file(file_path)
print(f"MIME type: {mime_type}")
- Detecting file type from buffer:
import magic
with open("document.docx", "rb") as f:
    buffer = f.read()
m = magic.Magic()
file_type = m.from_buffer(buffer)
print(f"File type: {file_type}")
Getting Started
To use python-magic, first install it using pip:
pip install python-magic
For Windows users, additional steps may be required:
pip install python-magic-bin
Then, you can start using it in your Python code:
import magic
# Basic usage
file_type = magic.from_file("path/to/your/file")
print(f"File type: {file_type}")
# MIME type detection
mime = magic.Magic(mime=True)
mime_type = mime.from_file("path/to/your/file")
print(f"MIME type: {mime_type}")
Make sure libmagic is installed on your system. On most Unix-like systems, it's usually pre-installed. For Windows, the python-magic-bin package includes the necessary DLLs.
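If libmagic might not be available on every machine your code runs on, one option is to fall back to extension-based guessing from the standard library. The following is only a sketch: guess_mime is a hypothetical helper name, and it assumes that a missing python-magic package or a missing libmagic both surface as ImportError at import time.

```python
import mimetypes

try:
    # Assumption: ImportError covers both a missing package and a libmagic
    # that cannot be loaded.
    import magic
except ImportError:
    magic = None


def guess_mime(path):
    """Content-based guess when python-magic is usable, else extension-based."""
    if magic is not None:
        return magic.from_file(path, mime=True)
    # Fallback: the standard library only looks at the file name.
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"
```

The fallback is weaker than content-based detection, since a renamed file will be misreported, so it is best treated as a last resort.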
Competitor Comparisons
Read-only mirror of file CVS repository, updated every half hour. NOTE: do not make pull requests here, nor comment any commits, submit them usual way to bug tracker or to the mailing list. Maintainer(s) are not tracking this git mirror.
Pros of file
- Native C implementation, potentially offering better performance
- The upstream project: it maintains libmagic and the magic database that python-magic relies on
- Ships the file command-line tool alongside the library
Cons of file
- Requires compilation and system-level installation
- More complex to integrate into Python projects
- Steeper learning curve for Python developers
Code Comparison
file/file (C implementation):
magic_t magic_open(int flags);
const char *magic_file(magic_t cookie, const char *filename);
int magic_load(magic_t cookie, const char *filename);
python-magic (Python wrapper):
import magic
m = magic.Magic()
file_type = m.from_file("path/to/file")
Summary
file is a powerful, native C library for file type detection, offering extensive support and performance benefits. However, it requires system-level installation and can be more challenging to integrate into Python projects.
python-magic provides a convenient Python wrapper around the libmagic library, making it easier for Python developers to use. Because it calls into the same library, detection results come from the same magic database; the wrapper adds some call overhead in exchange for a more accessible and Pythonic interface.
The choice between the two depends on specific project requirements, performance needs, and the developer's familiarity with C versus Python.
Fast, dependency-free Go package to infer binary file types based on the magic numbers header signature
Pros of filetype
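- No external dependencies required (the project described above is a Go package; the code comparison below uses the Python package of the same name)
- Covers common formats, including images, audio, and video
- Potentially faster for simple detection, since it only inspects the file header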
Cons of filetype
- Less comprehensive file type information compared to python-magic
- May not be as accurate for certain file types, especially less common ones
- Limited support for custom magic databases
Code Comparison
python-magic:
import magic
file_type = magic.from_file("example.pdf", mime=True)
print(file_type) # Output: application/pdf
filetype:
import filetype
kind = filetype.guess("example.pdf")
if kind is not None:
    print(kind.mime) # Output: application/pdf
Both libraries provide similar functionality for basic file type detection. python-magic offers more detailed information about file contents, while filetype focuses on quick and simple file type identification. The choice between the two depends on specific project requirements, such as the need for external dependencies, execution speed, and the level of detail required in file type analysis.
Python Imaging Library (Fork)
Pros of Pillow
- Comprehensive image processing library with support for various formats and operations
- Active development with frequent updates and a large community
- Extensive documentation and examples available
Cons of Pillow
- Larger library size and potentially slower for simple file type detection
- May be overkill for projects only needing basic file type identification
Code Comparison
Pillow:
from PIL import Image
with Image.open("image.jpg") as img:
    print(f"Format: {img.format}")
    print(f"Size: {img.size}")
    print(f"Mode: {img.mode}")
python-magic:
import magic
file_type = magic.from_file("image.jpg", mime=True)
print(f"MIME type: {file_type}")
Summary
Pillow is a powerful image processing library offering extensive functionality for working with various image formats. It's ideal for projects requiring image manipulation, resizing, or format conversion. However, for simple file type detection, python-magic provides a more lightweight solution. Pillow's code is more verbose but offers detailed image information, while python-magic provides a concise method for determining file types based on content rather than file extensions.
README
python-magic
python-magic is a Python interface to the libmagic file type
identification library. libmagic identifies file types by checking
their headers according to a predefined list of file types. This
functionality is exposed to the command line by the Unix command
file.
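To make the idea of header checking concrete, here is a deliberately simplified sketch in pure Python. It is not how libmagic is implemented: SIGNATURES and sniff are hypothetical names, and the real magic database contains far more rules, including tests at offsets and nested checks.

```python
# Simplified illustration of header ("magic number") matching.
# SIGNATURES and sniff() are hypothetical; libmagic's database is far richer.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff(data: bytes) -> str:
    """Return a MIME type guess based only on the leading bytes."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"


print(sniff(b"%PDF-1.2\n"))   # application/pdf
print(sniff(b"plain text"))   # application/octet-stream
```

A table like this only covers formats with a fixed prefix, which is one reason to prefer libmagic for anything beyond a handful of known types.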
Usage
>>> import magic
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
# recommend using at least the first 2048 bytes, as less can produce incorrect identification
>>> magic.from_buffer(open("testdata/test.pdf", "rb").read(2048))
'PDF document, version 1.2'
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'
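The from_buffer call above leaves the file object to be closed by the garbage collector. In longer-running code you may prefer to read the leading bytes inside a with block. A small sketch, where read_head is a hypothetical helper and 2048 is the size recommended above:

```python
def read_head(path, size=2048):
    """Read up to `size` bytes from the start of a file, then close it."""
    with open(path, "rb") as f:
        return f.read(size)


# With python-magic installed, the result can be passed on like this:
#   import magic
#   magic.from_buffer(read_head("testdata/test.pdf"))
```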
There is also a Magic class that provides more direct control,
including overriding the magic database file and turning on character
encoding detection. This is not recommended for general use. In
particular, it's not safe for sharing across multiple threads and
will throw if this is attempted.
>>> f = magic.Magic(uncompress=True)
>>> f.from_file('testdata/test.gz')
'ASCII text (gzip compressed data, was "test", last modified: Sat Jun 28
21:32:52 2008, from Unix)'
You can also combine the flag options:
>>> f = magic.Magic(mime=True, uncompress=True)
>>> f.from_file('testdata/test.gz')
'text/plain'
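Since a Magic instance must not be shared across threads, one way to use the class in threaded code is to keep a separate instance per thread. This is only a sketch: thread_detector is a hypothetical helper, and it takes a factory so that the pattern does not depend on any particular constructor arguments.

```python
import threading

_local = threading.local()


def thread_detector(factory):
    """Return the calling thread's detector, creating it on first use."""
    detector = getattr(_local, "detector", None)
    if detector is None:
        detector = factory()
        _local.detector = detector
    return detector


# Intended use with python-magic (not executed here):
#   import magic
#   mime = thread_detector(lambda: magic.Magic(mime=True)).from_file(path)
```

Each thread pays the cost of constructing its own instance once, and no instance is ever visible to a second thread.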
Installation
The current stable version of python-magic is available on PyPI and
can be installed by running pip install python-magic.
Other sources:
This module is a simple wrapper around the libmagic C library, and that must be installed as well:
Debian/Ubuntu
sudo apt-get install libmagic1
OSX
- When using Homebrew:
brew install libmagic
- When using macports:
port install file
If python-magic fails to load the library it may be in a non-standard location, in which case you can set the environment variable DYLD_LIBRARY_PATH to point to it.
SmartOS:
- Install libmagic from source: https://github.com/threatstack/libmagic/
- Depending on your ./configure --prefix settings, set your LD_LIBRARY_PATH to <prefix>/lib
Troubleshooting
- 'MagicException: could not find any magic files!': some installations of libmagic do not correctly point to their magic database file. Try specifying the path to the file explicitly in the constructor: magic.Magic(magic_file="path_to_magic_file").
- 'WindowsError: [Error 193] %1 is not a valid Win32 application': attempting to run the 32-bit libmagic DLL in a 64-bit build of Python will fail with this error. Here are 64-bit builds of libmagic for Windows: https://github.com/pidydx/libmagicwin64. A newer version can be found here: https://github.com/nscaife/file-windows.
- 'WindowsError: exception: access violation writing 0x00000000': this may indicate you are mixing Windows Python and Cygwin Python. Make sure your libmagic and Python builds are consistent.
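When chasing the errors above, it can help to print a few facts about the environment first. The sketch below reports whether the running Python is a 32-bit or 64-bit build and looks for a magic database file. The helper names are hypothetical, and the entries in CANDIDATES are assumptions about where a database might live, not locations guaranteed by libmagic.

```python
import os
import struct
import sys


def interpreter_bits():
    """Pointer size of the running Python, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def first_existing(paths):
    """Return the first path that exists on disk, or None."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


# Hypothetical candidate locations; adjust for your installation.
CANDIDATES = ["/usr/share/misc/magic.mgc", "/usr/share/file/magic.mgc"]

print(f"Python is {interpreter_bits()}-bit on {sys.platform}")
print(f"magic database candidate: {first_existing(CANDIDATES)}")
```

If a candidate is found, its path is what you would pass as magic_file in the constructor.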
Bug Reports
python-magic is a thin layer over the libmagic C library. Historically, most bugs that have been reported against python-magic are actually bugs in libmagic; libmagic bugs can be reported on their tracker here: https://bugs.astron.com/my_view_page.php. If you're not sure where the bug lies feel free to file an issue on GitHub and I can triage it.
Running the tests
We use the tox test runner, which can be installed with python -m pip install tox.
To run tests locally across all available python versions:
python -m tox
Or to run just against a single version:
python -m tox -e py
To run the tests across a variety of Linux distributions (depends on Docker):
./test/run_all_docker_test.sh
libmagic python API compatibility
The python bindings shipped with libmagic use a module name that conflicts with this package. To work around this, python-magic includes a compatibility layer for the libmagic API. See COMPAT.md for a guide to libmagic / python-magic compatibility.
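If you are unsure which magic module Python would pick up on a given machine, the standard library can report where it would be imported from. A minimal sketch, with module_origin as a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Prints a path if some "magic" module is installed, otherwise None.
print(module_origin("magic"))
```

The path alone does not say which project the module belongs to, but its location (for example, a site-packages directory versus a system package directory) is usually a good hint.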
Versioning
Minor version bumps should be backwards compatible. Major bumps are not.
Author
Written by Adam Hupp in 2001 for a project that never got off the ground. It originally used SWIG for the C library bindings, but switched to ctypes once that was part of the python standard library.
You can contact me via my website or GitHub.
License
python-magic is distributed under the MIT license. See the included LICENSE file for details.
I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).