
imWildCat/scylla

Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era


Top Related Projects

  • shadowsocks-rust: A Rust port of shadowsocks
  • Trojan: An unidentifiable mechanism that helps you bypass GFW.
  • v2ray-core: A platform for building proxies to bypass network restrictions.
  • Xray-core: Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses.
  • naiveproxy: Make a fortune quietly
  • Lantern: Official Lantern downloads. Lantern (蓝灯), firewall circumvention, proxy, unrestricted internet access, foreign internet, accelerator, VPN, router. Fast, reliable and secure access to the open internet. lantern proxy vpn censorship-circumvention censorship gfw accelerator. Lantern proxy: anti-censorship, secure, reliable and fast.

Quick Overview

Scylla is an open-source HTTP proxy pool for web scraping and data collection. It provides a robust and scalable solution for managing and rotating IP addresses, helping users bypass rate limits and access geo-restricted content. Scylla is designed to be easy to deploy and integrate into existing web scraping workflows.
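Rotating IP addresses means spreading successive requests across different proxies. Below is a minimal client-side sketch of round-robin rotation, assuming you already hold a list of `ip:port` strings (for example, taken from the `proxies` array returned by the JSON API). `ProxyRotator` is a hypothetical helper, not part of Scylla, and round-robin is only one possible policy.

```python
import itertools
from typing import Dict, Iterable


class ProxyRotator:
    """Hypothetical round-robin rotator over 'ip:port' strings (not a Scylla API)."""

    def __init__(self, addresses: Iterable[str]) -> None:
        addresses = list(addresses)
        if not addresses:
            raise ValueError("at least one proxy address is required")
        # itertools.cycle yields the addresses in order, restarting at the end
        self._cycle = itertools.cycle(addresses)

    def next_proxies(self) -> Dict[str, str]:
        """Return a requests-style proxies mapping for the next address."""
        address = next(self._cycle)
        return {"http": f"http://{address}", "https": f"http://{address}"}


rotator = ProxyRotator(["91.229.222.163:53281", "75.151.213.85:8080"])
for _ in range(3):
    print(rotator.next_proxies()["http"])
```

Each call to `next_proxies()` can be passed directly as the `proxies=` argument of `requests.get`.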

Pros

  • Easy to deploy using Docker, with support for both x86 and ARM architectures
  • Provides a RESTful API for easy integration with various programming languages and tools
  • Supports multiple proxy providers and automatic proxy rotation
  • Includes a web-based dashboard for browsing proxies and their geographical distribution

Cons

  • Limited documentation, which may make it challenging for new users to get started
  • Requires some technical knowledge to set up and configure properly
  • May require additional configuration for optimal performance in large-scale scraping operations
  • Lacks built-in support for advanced features like session management or browser fingerprinting

Code Examples

  1. Fetching a proxy from Scylla:
import requests

# The endpoint returns a page of proxies under the "proxies" key
data = requests.get('http://localhost:8899/api/v1/proxies').json()
proxy = data['proxies'][0]
print(f"Proxy: {proxy['ip']}:{proxy['port']}")
  2. Using a Scylla proxy with requests:
import requests

# Ask only for HTTPS-capable proxies, since the target URL below uses https
proxy_url = 'http://localhost:8899/api/v1/proxies?https=true'
proxy = requests.get(proxy_url).json()['proxies'][0]

proxies = {
    'http': f"http://{proxy['ip']}:{proxy['port']}",
    'https': f"http://{proxy['ip']}:{proxy['port']}"
}

response = requests.get('https://example.com', proxies=proxies, timeout=10)
print(response.text)
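  3. Configuring Scrapy to use Scylla:
# In your Scrapy spider: route requests through Scylla's HTTP forward proxy (port 8081).
# Scrapy's built-in HttpProxyMiddleware honours the 'proxy' key in request.meta.
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'

    def start_requests(self):
        # The forward proxy supports plain HTTP only
        yield scrapy.Request('http://api.ipify.org', meta={'proxy': 'http://127.0.0.1:8081'})

    def parse(self, response):
        self.logger.info(response.text)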
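Another way to route Scrapy traffic through Scylla is its HTTP forward proxy on port 8081, described in the README further down. The sketch below is a downloader middleware that sets the proxy for every plain-HTTP request. `ScyllaForwardProxyMiddleware` and the `myproject.middlewares` path are hypothetical names. The sketch relies only on Scrapy's convention that downloader middlewares implement `process_request(request, spider)` and that the built-in `HttpProxyMiddleware` (priority 750) reads `request.meta['proxy']`. Because the forward proxy does not support HTTPS, it leaves `https://` URLs untouched.

```python
# myproject/middlewares.py (hypothetical module path)
SCYLLA_PROXY_URL = "http://127.0.0.1:8081"  # Scylla's default forward proxy port


class ScyllaForwardProxyMiddleware:
    """Hypothetical downloader middleware that routes plain-HTTP requests via Scylla."""

    def __init__(self, proxy_url: str = SCYLLA_PROXY_URL) -> None:
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scylla's forward proxy handles HTTP only, so HTTPS requests are left alone
        if request.url.startswith("http://"):
            # setdefault keeps any proxy a spider has already chosen explicitly
            request.meta.setdefault("proxy", self.proxy_url)
        # Returning None lets Scrapy continue processing the request normally
        return None


# settings.py: a lower number runs before the built-in HttpProxyMiddleware (750)
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ScyllaForwardProxyMiddleware": 740,
}
```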

Getting Started

To get started with Scylla, follow these steps:

  1. Install Docker on your system.
  2. Run Scylla using Docker:
docker run -d -p 8899:8899 -p 8081:8081 -v /var/www/scylla:/var/www/scylla --name scylla wildcat/scylla:latest
  3. Access the web dashboard at http://localhost:8899.
  4. Use the API endpoint http://localhost:8899/api/v1/proxies to fetch proxies in your application, or send plain HTTP requests through the forward proxy at http://localhost:8081.

For more detailed configuration and usage instructions, refer to the project's GitHub repository.

Competitor Comparisons

shadowsocks-rust: A Rust port of shadowsocks

Pros of shadowsocks-rust

  • Written in Rust, offering better performance and memory safety
  • More mature project with a larger community and longer development history
  • Supports multiple ciphers and protocols, providing flexibility for users

Cons of shadowsocks-rust

  • Focused solely on the Shadowsocks protocol, limiting its use cases
  • May have a steeper learning curve for users unfamiliar with Rust

Code Comparison

Scylla (Python):

async def handle_client(client_reader, client_writer):
    try:
        data = await client_reader.read(BUFFER_SIZE)
        remote_reader, remote_writer = await asyncio.open_connection(
            target_host, target_port)
        remote_writer.write(data)
        await remote_writer.drain()
        # ... (rest of the function)

shadowsocks-rust (Rust):

async fn handle_client(socket: TcpStream, method: &CipherKind) -> io::Result<()> {
    let (mut reader, mut writer) = socket.split();
    let mut cipher = method.cipher();
    let mut buf = vec![0u8; 0x3fff];
    loop {
        let n = reader.read(&mut buf).await?;
        // ... (rest of the function)

The code snippets show the different approaches and languages used in handling client connections. Scylla uses Python's asyncio for asynchronous operations, while shadowsocks-rust leverages Rust's async/await syntax and low-level networking primitives.
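Both snippets stop before the part that actually moves bytes between the two connections. For readers new to asyncio streams, the usual shape of that loop is sketched below. `pipe` and `relay` are hypothetical helpers that illustrate the general pattern; they are not taken from either project's source.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer, buffer_size: int = 4096) -> int:
    """Copy bytes from reader to writer until EOF; return the number of bytes copied.

    `writer` is anything with write() and an awaitable drain(), such as asyncio.StreamWriter.
    """
    total = 0
    while True:
        chunk = await reader.read(buffer_size)
        if not chunk:  # b"" signals end of stream
            break
        writer.write(chunk)
        await writer.drain()  # apply back-pressure so buffers do not grow unbounded
        total += len(chunk)
    return total


async def relay(client_reader, client_writer, remote_reader, remote_writer) -> None:
    """Pump data in both directions concurrently until both sides reach EOF."""
    await asyncio.gather(
        pipe(client_reader, remote_writer),
        pipe(remote_reader, client_writer),
    )
```

In a real proxy the writers would also be closed once their direction finishes; that cleanup is omitted here to keep the pattern visible.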


Trojan: An unidentifiable mechanism that helps you bypass GFW.

Pros of Trojan

  • Designed specifically for bypassing GFW, offering better performance in restricted networks
  • Simpler setup and configuration process
  • Lighter resource usage, suitable for low-end devices

Cons of Trojan

  • Limited protocol support compared to Scylla's multi-protocol approach
  • Less flexibility in customization and advanced features
  • Smaller community and fewer third-party clients

Code Comparison

Trojan (server configuration):

{
    "run_type": "server",
    "local_addr": "0.0.0.0",
    "local_port": 443,
    "remote_addr": "127.0.0.1",
    "remote_port": 80,
    "password": ["password1"],
    "ssl": {
        "cert": "/path/to/certificate.crt",
        "key": "/path/to/private.key",
        "key_password": "",
        "cipher": "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384",
        "cipher_tls13": "TLS_AES_128_GCM_SHA256:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384",
        "prefer_server_cipher": true,
        "alpn": [
            "http/1.1"
        ],
        "reuse_session": true,
        "session_ticket": false,
        "session_timeout": 600,
        "plain_http_response": "",
        "curves": "",
        "dhparam": ""
    }
}

Scylla (configuration example):

proxy:
  - name: http
    type: http
    port: 8080
    listen: 0.0.0.0
  - name: socks5
    type: socks5
    port: 1080
    listen: 0.0.0.0
  - name: shadowsocks
    type: ss
    port: 8388
    listen: 0.0.0.0
    password: your_password
    method: aes-256-gcm

v2ray-core: A platform for building proxies to bypass network restrictions.

Pros of v2ray-core

  • More comprehensive and feature-rich proxy solution
  • Supports multiple protocols and transport layers
  • Larger community and more active development

Cons of v2ray-core

  • More complex configuration and setup
  • Higher resource usage due to its extensive features
  • Steeper learning curve for new users

Code Comparison

v2ray-core (Go):

type User struct {
    Level uint32
    Email string
}

type Account struct {
    Id      string
    AlterId uint32
}

Scylla (Python):

class Proxy:
    def __init__(self, host, port):
        self.host = host
        self.port = port

    def __str__(self):
        return f"{self.host}:{self.port}"

v2ray-core is a more comprehensive proxy solution written in Go, offering support for multiple protocols and transport layers. It has a larger community and more active development compared to Scylla. However, v2ray-core is more complex to configure and set up, and it may have higher resource usage due to its extensive features.

Scylla, on the other hand, is a simpler proxy crawler and pool written in Python. It focuses on providing a straightforward solution for proxy management and is easier to set up and use. However, it may lack some of the advanced features and protocol support that v2ray-core offers.

The code comparison shows the difference in complexity and focus between the two projects, with v2ray-core handling more complex user and account structures, while Scylla provides a simpler proxy representation.


Xray-core: Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses.

Pros of Xray-core

  • More comprehensive protocol support, including VLESS, Trojan, and Shadowsocks
  • Advanced traffic routing capabilities with flexible rule-based configurations
  • Active development with frequent updates and improvements

Cons of Xray-core

  • Steeper learning curve due to more complex configuration options
  • Potentially higher resource usage for advanced features

Code Comparison

Xray-core configuration example:

{
  "inbounds": [{"port": 1080, "protocol": "socks"}],
  "outbounds": [{"protocol": "freedom"}]
}

Scylla usage example (command line, as shown in the installation instructions):

pip install scylla
scylla

Summary

Xray-core is a feature-rich proxy tool with extensive protocol support and advanced routing capabilities, making it suitable for complex networking scenarios. However, it may require more setup time and resources.

Scylla, on the other hand, is a simpler proxy scraper and manager focused on ease of use and quick deployment. It's more suitable for basic proxy needs and automated scraping tasks.

The choice between the two depends on the specific requirements of your project, with Xray-core being more powerful but complex, and Scylla offering simplicity and ease of use for proxy management.

naiveproxy: Make a fortune quietly

Pros of naiveproxy

  • Built on Chromium's network stack, providing robust and up-to-date protocol support
  • Designed for better censorship resistance and traffic obfuscation
  • Supports multiple protocols including HTTP, HTTPS, and QUIC

Cons of naiveproxy

  • More complex setup and configuration compared to Scylla
  • Larger codebase and resource footprint due to Chromium dependencies
  • Limited to client-side proxy functionality, whereas Scylla offers both client and server components

Code Comparison

naiveproxy (C++):

int main(int argc, char* argv[]) {
  base::CommandLine::Init(argc, argv);
  logging::LoggingSettings settings;
  settings.logging_dest = logging::LOG_TO_SYSTEM_DEBUG_LOG;
  logging::InitLogging(settings);
  return naive::naive_main(argc, argv);
}

Scylla (Python):

def start(self):
    self.loop.run_until_complete(self._start())
    try:
        self.loop.run_forever()
    except KeyboardInterrupt:
        pass

The code snippets show the entry points for both projects. naiveproxy uses C++ and integrates with Chromium's base libraries, while Scylla is written in Python and uses asyncio for event handling.


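Lantern: Official Lantern downloads. Lantern (蓝灯), firewall circumvention, proxy, unrestricted internet access, foreign internet, accelerator, VPN, router. Fast, reliable and secure access to the open internet. lantern proxy vpn censorship-circumvention censorship gfw accelerator. Lantern proxy: anti-censorship, secure, reliable and fast.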

Pros of Lantern

  • More comprehensive solution for internet freedom, including VPN and proxy services
  • Larger user base and community support
  • Actively maintained with regular updates and releases

Cons of Lantern

  • Closed-source components, limiting transparency and community contributions
  • More complex setup and configuration compared to Scylla
  • Potential privacy concerns due to centralized infrastructure

Code Comparison

Lantern (Go):

func (c *Client) Dial(network, addr string) (net.Conn, error) {
    return c.DialWithDialer(&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
    }, network, addr)
}

Scylla (Python):

async def create_proxy_server(host: str, port: int, **kwargs) -> ProxyServer:
    return await ProxyServer.create(host, port, **kwargs)

While both projects aim to provide internet freedom solutions, Lantern offers a more comprehensive package with VPN and proxy services, whereas Scylla focuses on proxy functionality. Lantern has a larger user base and more active development, but Scylla's open-source nature provides greater transparency. The code snippets demonstrate the different languages and approaches used in each project.


README


Scylla

An intelligent proxy pool for humans, to extract content from the internet and build your own Large Language Models in this new AI era.

Key features:

  • Automatic proxy IP crawling and validation
  • Easy-to-use JSON API
  • Simple but beautiful web-based user interface (e.g., geographical distribution of proxies)
  • Get started with as little as one command
  • Simple HTTP forward proxy server
  • Scrapy and requests integration with as little as one line of code
  • Headless browser crawling

Get started

Installation

Install with Docker (highly recommended)

docker run -d -p 8899:8899 -p 8081:8081 -v /var/www/scylla:/var/www/scylla --name scylla wildcat/scylla:latest

Install directly via pip

pip install scylla
scylla --help
scylla # Run the crawler and web server for JSON API

Install from source

git clone https://github.com/imWildCat/scylla.git
cd scylla

pip install -r requirements.txt

cd frontend
npm install
cd ..

make assets-build

python -m scylla

Usage

The examples below assume Scylla is running locally (localhost), with the JSON API on port 8899.

Note: The first time you run Scylla, you may need to wait 1 to 2 minutes for proxy IPs to be populated in the database.
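Instead of guessing how long to wait, a script can poll the `/api/v1/stats` endpoint (documented in System Statistics) until `valid_count` is positive. A minimal sketch, assuming the default port 8899; `wait_for_proxies` and `fetch_stats` are hypothetical helpers, and the fetch function is injectable so the polling loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

STATS_URL = "http://localhost:8899/api/v1/stats"


def fetch_stats(url: str = STATS_URL) -> Dict:
    """Fetch and decode Scylla's statistics JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_proxies(
    fetch: Callable[[], Dict] = fetch_stats,
    timeout: float = 180.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until at least one valid proxy exists; return valid_count or raise TimeoutError."""
    waited = 0.0
    while True:
        try:
            valid = int(fetch().get("valid_count", 0))
        except OSError:  # URLError subclasses OSError; the server may still be starting
            valid = 0
        if valid > 0:
            return valid
        if waited >= timeout:
            raise TimeoutError(f"no valid proxies after {timeout:.0f} s")
        sleep(interval)
        waited += interval
```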

JSON API

Proxy IP List

http://localhost:8899/api/v1/proxies

Optional URL parameters:

  • page (default: 1): the page number
  • limit (default: 20): the number of proxies shown on each page
  • anonymous (default: any): whether to show anonymous proxies. Possible values: true, only anonymous proxies; false, only transparent proxies
  • https (default: any): whether to show HTTPS proxies. Possible values: true, only HTTPS proxies; false, only HTTP proxies
  • countries (default: None): filter proxies by country. Format example: US, or multiple countries: US,GB
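These parameters can be combined in a single query string. A sketch using only the standard library; `proxies_url` is a hypothetical helper, and it assumes that omitting `anonymous` or `https` gives the documented default of "any".

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8899/api/v1/proxies"


def proxies_url(
    page: int = 1,
    limit: int = 20,
    anonymous: Optional[bool] = None,
    https: Optional[bool] = None,
    countries: Optional[Iterable[str]] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a /api/v1/proxies URL; None means 'any', so the parameter is omitted."""
    params = {"page": page, "limit": limit}
    if anonymous is not None:
        params["anonymous"] = "true" if anonymous else "false"
    if https is not None:
        params["https"] = "true" if https else "false"
    if countries:
        # Multiple countries are comma-separated, e.g. "US,GB"
        params["countries"] = ",".join(countries)
    return f"{base_url}?{urlencode(params)}"


print(proxies_url(https=True, countries=["US", "GB"]))
# http://localhost:8899/api/v1/proxies?page=1&limit=20&https=true&countries=US%2CGB
```

`urlencode` percent-encodes the comma as `%2C`, which is standard URL encoding for query values.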

Sample result:

{
    "proxies": [{
        "id": 599,
        "ip": "91.229.222.163",
        "port": 53281,
        "is_valid": true,
        "created_at": 1527590947,
        "updated_at": 1527593751,
        "latency": 23.0,
        "stability": 0.1,
        "is_anonymous": true,
        "is_https": true,
        "attempts": 1,
        "https_attempts": 0,
        "location": "54.0451,-0.8053",
        "organization": "AS57099 Boundless Networks Limited",
        "region": "England",
        "country": "GB",
        "city": "Malton"
    }, {
        "id": 75,
        "ip": "75.151.213.85",
        "port": 8080,
        "is_valid": true,
        "created_at": 1527590676,
        "updated_at": 1527593702,
        "latency": 268.0,
        "stability": 0.3,
        "is_anonymous": true,
        "is_https": true,
        "attempts": 1,
        "https_attempts": 0,
        "location": "32.3706,-90.1755",
        "organization": "AS7922 Comcast Cable Communications, LLC",
        "region": "Mississippi",
        "country": "US",
        "city": "Jackson"
    },
    ...
    ],
    "count": 1025,
    "per_page": 20,
    "page": 1,
    "total_page": 52
}
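To turn a response like this into something a scraper can use, one option is to parse the `proxies` array and pick the valid entry with the lowest `latency`. A sketch; `ScyllaProxy`, `parse_proxies`, `fastest`, and `total_pages` are hypothetical names, and ranking by lowest `latency` is an assumption about how you want to choose proxies.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScyllaProxy:
    """Subset of the fields in one entry of the "proxies" array."""
    ip: str
    port: int
    latency: float
    is_valid: bool
    is_https: bool
    country: str

    @property
    def url(self) -> str:
        return f"http://{self.ip}:{self.port}"


def parse_proxies(payload: Dict) -> List[ScyllaProxy]:
    """Convert the decoded JSON response into ScyllaProxy objects."""
    return [
        ScyllaProxy(p["ip"], int(p["port"]), float(p["latency"]),
                    bool(p["is_valid"]), bool(p["is_https"]), p.get("country", ""))
        for p in payload.get("proxies", [])
    ]


def fastest(proxies: List[ScyllaProxy], need_https: bool = False) -> Optional[ScyllaProxy]:
    """Return the valid proxy with the lowest latency, or None if nothing qualifies."""
    candidates = [p for p in proxies if p.is_valid and (p.is_https or not need_https)]
    return min(candidates, key=lambda p: p.latency, default=None)


def total_pages(count: int, per_page: int) -> int:
    # e.g. 1025 proxies at 20 per page -> 52 pages, matching "total_page" above
    return math.ceil(count / per_page)
```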

System Statistics

http://localhost:8899/api/v1/stats

Sample result:

{
    "median": 181.2566407083,
    "valid_count": 1780,
    "total_count": 9528,
    "mean": 174.3290085201
}
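These figures work as a quick health check: in the sample above, 1780 of 9528 crawled proxies are valid, about 18.7%. A sketch; `validity_ratio` is a hypothetical helper, and because this README does not say what `mean` and `median` measure, it uses only the two counts.

```python
from typing import Dict


def validity_ratio(stats: Dict) -> float:
    """Fraction of crawled proxies that are currently valid (0.0 if none were crawled)."""
    total = stats.get("total_count", 0)
    return stats.get("valid_count", 0) / total if total else 0.0


sample = {"median": 181.2566407083, "valid_count": 1780, "total_count": 9528, "mean": 174.3290085201}
print(f"{validity_ratio(sample):.1%}")  # 18.7%
```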

HTTP Forward Proxy Server

By default, Scylla starts an HTTP forward proxy server on port 8081. For each incoming HTTP request, the server randomly selects one of the recently updated proxies in the database and forwards the request through it.

Note: HTTPS requests are not supported at present.
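The selection rule (a random pick among recently updated proxies) can be sketched as follows. This illustrates the idea only and is not Scylla's implementation: the `max_age` window and the `pick_upstream` name are assumptions, since the README does not define "recently". It reads the `is_valid` and `updated_at` fields as they appear in the JSON API output.

```python
import random
import time
from typing import Dict, List, Optional


def pick_upstream(
    proxies: List[Dict],
    now: Optional[float] = None,
    max_age: float = 3600.0,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a random valid proxy updated within max_age seconds (hypothetical rule)."""
    now = time.time() if now is None else now
    # Keep only validated proxies whose updated_at timestamp is recent enough
    recent = [
        p for p in proxies
        if p.get("is_valid") and now - p.get("updated_at", 0) <= max_age
    ]
    if not recent:
        return None  # caller decides whether to fail the request or fall back
    chooser = rng if rng is not None else random
    chosen = chooser.choice(recent)
    return f"http://{chosen['ip']}:{chosen['port']}"
```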

An example of using this proxy server with curl:

curl http://api.ipify.org -x http://127.0.0.1:8081

You could also use this feature with requests:

import requests

requests.get('http://api.ipify.org', proxies={'http': 'http://127.0.0.1:8081'})
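Without `requests`, the standard library can route traffic through the forward proxy with `urllib.request.ProxyHandler`. A minimal sketch; `scylla_opener` is a hypothetical helper name, and only the `http` scheme is mapped because the forward proxy does not support HTTPS.

```python
import urllib.request

SCYLLA_FORWARD_PROXY = "http://127.0.0.1:8081"


def scylla_opener(proxy_url: str = SCYLLA_FORWARD_PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain-HTTP traffic through Scylla's forward proxy."""
    # Only "http" is mapped; passing an explicit ProxyHandler also replaces the default
    # environment-based one, so HTTPS URLs opened with this opener go out directly
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)
```

With Scylla running, `scylla_opener().open('http://api.ipify.org', timeout=10).read()` returns the IP address that api.ipify.org sees, which should be that of the upstream proxy.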

Web UI

Open http://localhost:8899 in your browser to see the Web UI of this project.

Proxy IP List

http://localhost:8899/


Global Geographical Distribution Map

http://localhost:8899/#/geo


API Documentation

Please read Module Index.

Roadmap

Please see Projects.

Development and Contribution

git clone https://github.com/imWildCat/scylla.git
cd scylla

pip install -r requirements.txt

npm install
make assets-build

Testing

To run the tests locally, use the following commands:

pip install -r tests/requirements-test.txt
pytest tests/

You are welcome to add more test cases to make this project more robust.

Naming of This Project

The name Scylla comes from a set of memory chips in the American TV series Prison Break; the project is named after the series as a tribute to it.

Help

How to install Python Scylla on CentOS7

Donation

If you find this project useful, please consider making a donation.

Whatever the amount, your donation will encourage the author to keep developing new features! 🎉 Thank you!

You can donate in the following ways:

GitHub Sponsor

I would really appreciate it if you joined my sponsors here:

https://github.com/sponsors/imWildCat

PayPal

paypal_donation

License

Apache License 2.0. For more details, please read the LICENSE file.