jhao104/proxy_pool

Python ProxyPool for web spider

22,577 stars · 5,315 forks · 301 open issues

Top Related Projects

  • haipproxy: :sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis
  • ProxyPool: An Efficient ProxyPool with Getter, Tester and Server (4,012 stars)
  • Scylla: Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era (1,895 stars)
  • ipsum: Daily feed of bad IPs (with blacklist hit scores)

Quick Overview

Proxy_pool is an open-source Python project that provides a simple proxy IP pool. It automatically collects free proxy IPs from the internet, validates them, and offers an API for retrieving usable proxy IPs. The project aims to simplify the process of obtaining and managing proxy IPs for various applications.

Pros

  • Automatic proxy collection and validation
  • Easy-to-use API for retrieving proxy IPs
  • Supports multiple proxy sources and protocols (HTTP, HTTPS, SOCKS4/5)
  • Configurable and extensible architecture

Cons

  • Reliability of free proxies can be inconsistent
  • Limited documentation, especially for advanced usage
  • Potential legal and ethical concerns when using proxy IPs without permission
  • May require frequent maintenance to keep proxy sources up-to-date

Code Examples

  1. Retrieving a random proxy:

import requests

proxy = requests.get("http://127.0.0.1:5010/get/").json()
print(f"Random proxy: {proxy}")

  2. Retrieving a proxy for a specific protocol:

import requests

https_proxy = requests.get("http://127.0.0.1:5010/get/?type=https").json()
print(f"HTTPS proxy: {https_proxy}")

  3. Reporting an invalid proxy so it is removed from the pool:

import requests

requests.get("http://127.0.0.1:5010/delete/?proxy=1.1.1.1:8080")

Getting Started

  1. Clone the repository:

    git clone https://github.com/jhao104/proxy_pool.git
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Modify the setting.py file to configure proxy sources and other settings.

  4. Run the proxy pool:

    python proxyPool.py schedule
    python proxyPool.py server
    
  5. Access the API at http://127.0.0.1:5010 to retrieve proxy IPs (see the usage sketch below).
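
The API returns proxy records as JSON; to actually send traffic through one, pass it to requests via the proxies argument. A minimal sketch, assuming the default server address and that the response carries the proxy in a "proxy" field, as in the spider example later on this page:

import requests

# Fetch one proxy record from the local proxy_pool API.
record = requests.get("http://127.0.0.1:5010/get/").json()
proxy = record.get("proxy")  # e.g. "1.2.3.4:8080"

# Route a request through it; the target URL is only a placeholder.
resp = requests.get(
    "http://www.example.com",
    proxies={"http": "http://{}".format(proxy)},
    timeout=10,
)
print(resp.status_code)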

Competitor Comparisons

haipproxy: :sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis

Pros of haipproxy

  • More advanced proxy validation and scoring system
  • Supports multiple proxy sources and protocols (HTTP, HTTPS, Socks4/5)
  • Includes a web interface for easier management

Cons of haipproxy

  • More complex setup and configuration
  • Requires additional dependencies (e.g., Scrapy, Redis)
  • Less frequently updated compared to proxy_pool

Code Comparison

proxy_pool:

def check_proxy(proxy):
    url = "http://www.baidu.com/get_ip.php"
    try:
        r = requests.get(url, proxies={"http": "http://" + proxy}, timeout=10, verify=False)
        if r.status_code == 200:
            return True
    except:
        return False

haipproxy:

def validate_proxy(proxy):
    start = time.time()
    try:
        r = requests.get(self.target_url, proxies={"http": proxy, "https": proxy},
                         timeout=self.timeout, verify=False)
        if r.ok:
            speed = time.time() - start
            return True, speed
    except:
        return False, None

The code comparison shows that haipproxy includes a more sophisticated proxy validation process, measuring response time and supporting both HTTP and HTTPS protocols. proxy_pool's implementation is simpler but less comprehensive.
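
For reference, here is a self-contained sketch of a timing-aware check in the same spirit as the snippets above; the function name, target URL, and timeout are placeholders, not code taken from either project:

import time
import requests

def validate_with_timing(proxy, target_url="http://httpbin.org/ip", timeout=10):
    """Check a 'host:port' proxy and report how long the request took."""
    proxies = {"http": "http://" + proxy, "https": "http://" + proxy}
    start = time.time()
    try:
        r = requests.get(target_url, proxies=proxies, timeout=timeout)
        if r.ok:
            return True, time.time() - start
    except requests.RequestException:
        pass
    return False, None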

An Efficient ProxyPool with Getter, Tester and Server

Pros of ProxyPool

  • More comprehensive documentation, including detailed setup instructions and API usage examples
  • Supports multiple proxy sources and validation methods out of the box
  • Includes a web interface for easy management and monitoring of the proxy pool

Cons of ProxyPool

  • Slightly more complex setup process due to additional dependencies
  • May have higher resource usage due to more extensive features

Code Comparison

proxy_pool:

def check_proxy(proxy):
    url = "http://www.baidu.com/get_ip.php"
    try:
        r = requests.get(url, proxies={"http": "http://" + proxy}, timeout=10, verify=False)
        if r.status_code == 200:
            return True
    except:
        return False

ProxyPool:

def check_proxy(proxy):
    try:
        resp = requests.get(self.test_url, proxies={
            'http': 'http://' + proxy,
            'https': 'https://' + proxy
        }, timeout=self.timeout, verify=False)
        if resp.status_code == 200:
            return True
    except (ProxyError, ConnectTimeout, SSLError, ReadTimeout):
        return False

Both projects aim to provide a pool of usable proxies, but ProxyPool offers a more feature-rich solution with better documentation. However, this comes at the cost of a slightly more complex setup and potentially higher resource usage. The code comparison shows that ProxyPool's proxy checking function is more comprehensive, handling both HTTP and HTTPS proxies and catching specific exceptions.
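
As a hedged illustration of that narrower error handling, the check below catches only the requests exceptions that typically indicate a dead or misbehaving proxy; the function name, test URL, and timeout are assumptions rather than values from either project:

import requests
from requests.exceptions import ProxyError, ConnectTimeout, ReadTimeout, SSLError

def check_proxy_strict(proxy, test_url="http://httpbin.org/ip", timeout=10):
    """Return True only when the proxy answers the test URL with HTTP 200."""
    proxies = {"http": "http://" + proxy, "https": "http://" + proxy}
    try:
        return requests.get(test_url, proxies=proxies, timeout=timeout).status_code == 200
    except (ProxyError, ConnectTimeout, ReadTimeout, SSLError):
        # Failure modes expected from an unusable proxy; anything else propagates.
        return False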


Scylla: Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era

Pros of Scylla

  • Written in Rust, offering better performance and memory safety
  • Supports both IPv4 and IPv6 proxies
  • Provides a RESTful API for easier integration

Cons of Scylla

  • Less actively maintained (last update over 2 years ago)
  • Fewer built-in proxy sources compared to proxy_pool
  • Limited documentation and community support

Code Comparison

proxy_pool (Python):

def get_proxy():
    return self.db.pop()

def delete_proxy(proxy):
    self.db.delete(proxy)

Scylla (Rust):

pub fn get_proxy(&self) -> Option<Proxy> {
    self.proxies.pop_front()
}

pub fn delete_proxy(&mut self, proxy: &Proxy) {
    self.proxies.retain(|p| p != proxy);
}

Both projects aim to provide a pool of proxies, but they differ in implementation language and features. proxy_pool is written in Python and offers a wider range of proxy sources, making it more flexible for various use cases. It also has more recent updates and a larger community.

Scylla, on the other hand, leverages Rust's performance benefits and provides a RESTful API, which can be advantageous for certain applications. However, its development seems to have slowed down, potentially limiting its long-term viability.

The code comparison shows similar basic functionality for retrieving and deleting proxies, with Scylla's implementation benefiting from Rust's strong typing and memory safety features.


ipsum: Daily feed of bad IPs (with blacklist hit scores)

Pros of ipsum

  • Focuses on IP blocklists for security purposes, providing a more specialized tool
  • Regularly updated with new malicious IP addresses from various sources
  • Lightweight and easy to integrate into existing security systems

Cons of ipsum

  • Limited to IP blocklists, lacking the proxy pool functionality
  • May require additional tools or scripts for implementation in certain use cases
  • Less versatile compared to proxy_pool's broader proxy management features

Code comparison

ipsum:

#!/usr/bin/env python

import re
import socket
import struct
import sys

def addr_to_int(value):
    # Convert a dotted-quad IPv4 address to its unsigned integer form.
    return struct.unpack("!I", socket.inet_aton(value))[0]

proxy_pool:

class ProxyPool(object):
    def __init__(self):
        self.pool = set()

    def add(self, proxy):
        self.pool.add(proxy)

    def remove(self, proxy):
        self.pool.discard(proxy)

The code snippets show that ipsum focuses on IP address manipulation, while proxy_pool manages a set of proxy addresses. This reflects their different purposes: ipsum for IP blocklists and proxy_pool for proxy management.
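
The two ideas can be combined: a pooled proxy's host address can be converted to an integer and looked up against an ipsum-style blocklist before the proxy is used. A small hypothetical sketch (the blocklist contents and the proxy value are made up for illustration):

import socket
import struct

def addr_to_int(value):
    # Same conversion ipsum uses: dotted-quad IPv4 string -> unsigned int.
    return struct.unpack("!I", socket.inet_aton(value))[0]

# Hypothetical blocklist of integer-encoded bad IPs (e.g. loaded from an ipsum feed).
blocklist = {addr_to_int("203.0.113.7")}

proxy = "203.0.113.7:8080"  # a proxy taken from the pool, made up for this example
host = proxy.split(":")[0]
if addr_to_int(host) in blocklist:
    print("proxy host is on the blocklist, skipping it")
else:
    print("proxy host looks clean")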


README

ProxyPool: Web Spider Proxy IP Pool


______                        ______             _
| ___ \_                      | ___ \           | |
| |_/ / \__ __   __  _ __   _ | |_/ /___   ___  | |
|  __/|  _// _ \ \ \/ /| | | ||  __// _ \ / _ \ | |
| |   | | | (_) | >  < \ |_| || |  | (_) | (_) || |___
\_|   |_|  \___/ /_/\_\ \__  |\_|   \___/ \___/ \_____\
                       __ / /
                      /___ /

ProxyPool

A proxy IP pool for web crawlers. Its main functions are to periodically collect free proxies published online, validate them and store them, and to periodically re-validate the stored proxies to keep the pool usable. It can be used through both an API and a CLI. You can also extend the proxy sources to improve the quality and quantity of the pool's IPs.

  • Documentation: document

  • Supported versions:

  • Demo: http://demo.spiderpy.cn (please don't hammer it, thanks)

  • Paid proxy recommendation: luminati-china. BrightData (formerly Luminati) is regarded as the leader of the proxy market, covering 72 million IPs worldwide, most of them real residential IPs, with a very high success rate. Several paid plans are available; if you need high-quality proxy IPs, register, contact their Chinese-speaking support, and apply for a free trial. There is currently a 50% discount promotion. (If you are unsure how to use it, see this tutorial.)

Running the Project

Download the code:
  • git clone
git clone git@github.com:jhao104/proxy_pool.git
  • releases
Download the corresponding zip file from https://github.com/jhao104/proxy_pool/releases
Install dependencies:
pip install -r requirements.txt
Update the configuration:
# setting.py is the project configuration file

# API service settings

HOST = "0.0.0.0"               # listen address
PORT = 5010                    # listen port


# Database settings

DB_CONN = 'redis://:pwd@127.0.0.1:8888/0'


# ProxyFetcher settings

PROXY_FETCHER = [
    "freeProxy01",      # names of the enabled fetch methods; all fetch methods live in fetcher/proxyFetcher.py
    "freeProxy02",
    # ....
]

Start the project:

# Once the requirements above are met, the project can be started via proxyPool.py.
# It consists of two parts: the schedule process and the server (API) process.

# start the scheduler
python proxyPool.py schedule

# start the web API service
python proxyPool.py server

Docker Image

docker pull jhao104/proxy_pool

docker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 jhao104/proxy_pool:latest

docker-compose

Run in the project directory:

docker-compose up -d

Usage

  • API

After the web service starts, with the default configuration the API is served at http://127.0.0.1:5010 (a short example of calling it follows the table):

| api | method | description | params |
| --- | --- | --- | --- |
| / | GET | API introduction | None |
| /get | GET | get a random proxy | optional: ?type=https to filter proxies that support HTTPS |
| /pop | GET | get and delete a proxy | optional: ?type=https to filter proxies that support HTTPS |
| /all | GET | get all proxies | optional: ?type=https to filter proxies that support HTTPS |
| /count | GET | view the number of proxies | None |
| /delete | GET | delete a proxy | ?proxy=host:ip |
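
A minimal sketch of calling a few of these endpoints from Python, assuming the default address and JSON responses as in the /get example below (the /get and /delete calls themselves are shown in the spider example that follows):

import requests

BASE = "http://127.0.0.1:5010"

print(requests.get(BASE + "/count/").text)              # number of proxies in the pool
print(requests.get(BASE + "/all/?type=https").json())   # every proxy that supports HTTPS
print(requests.get(BASE + "/pop/").json())              # take one proxy and remove it from the pool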
  • Use in a spider

  If you want to use it in spider code, you can wrap this API in helper functions, for example:

import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def getHtml():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            # the request went through the proxy
            return html
        except Exception:
            retry_count -= 1
    # remove the proxy from the pool
    delete_proxy(proxy)
    return None

Extending Proxy Sources

  The project ships with several free proxy sources by default, but free proxies are of limited quality, so running it as-is may not yield ideal proxies. For that reason, an extension mechanism for fetching proxies is provided.

  To add a new proxy source:

  • 1. First, add a custom static method for fetching proxies to the ProxyFetcher class. The method must be a generator that yields proxies in host:ip format, for example:

class ProxyFetcher(object):
    # ....

    # custom proxy source fetch method
    @staticmethod
    def freeProxyCustom1():  # any name that does not clash with an existing method

        # fetch proxies from a website, an API, or a database
        # suppose you already have a list of proxies
        proxies = ["x.x.x.x:3128", "x.x.x.x:80"]
        for proxy in proxies:
            yield proxy
        # make sure every proxy is yielded in the correct host:ip format
  • 2. After adding the method, update the PROXY_FETCHER entry in setting.py:

  Add the name of your custom method under PROXY_FETCHER:

PROXY_FETCHER = [
    "freeProxy01",
    "freeProxy02",
    # ....
    "freeProxyCustom1"  # make sure this matches the name of the method you added
]

  The schedule process fetches proxies at regular intervals; on its next run it will automatically detect and call the method you defined.
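
  Before wiring a new source into the scheduler, it can help to run the generator by hand and check that it yields host:ip strings. A quick hypothetical check, assuming the fetcher/proxyFetcher.py module layout mentioned in the setting.py comment above and the freeProxyCustom1 example method:

# standalone sanity check for the custom fetcher (illustrative only)
from fetcher.proxyFetcher import ProxyFetcher

for proxy in ProxyFetcher.freeProxyCustom1():
    host, _, port = proxy.partition(":")
    assert host and port.isdigit(), "expected host:port, got %r" % proxy
    print(proxy)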

Free Proxy Sources

The free proxy sites currently collected are listed below (in no particular order; the table only describes the free proxies they publish; for paid proxy reviews see here):

| Proxy site | Status | Update speed | Availability | Link | Code |
| --- | --- | --- | --- | --- | --- |
| 站大爷 | ✔ | ★ | ** | link | freeProxy01 |
| 66代理 | ✔ | ★ | * | link | freeProxy02 |
| 开心代理 | ✔ | ★ | * | link | freeProxy03 |
| FreeProxyList | ✔ | ★ | * | link | freeProxy04 |
| 快代理 | ✔ | ★ | * | link | freeProxy05 |
| 冰凌代理 | ✔ | ★★★ | * | link | freeProxy06 |
| 云代理 | ✔ | ★ | * | link | freeProxy07 |
| 小幻代理 | ✔ | ★★ | * | link | freeProxy08 |
| 免费代理库 | ✔ | ☆ | * | link | freeProxy09 |
| 89代理 | ✔ | ☆ | * | link | freeProxy10 |
| 稻壳代理 | ✔ | ★★ | *** | link | freeProxy11 |

If you know of other good free proxy sites, please submit them in the issues; support will be considered in a future update.

Feedback

  Feel free to report any problems in Issues; you can also leave a comment on my blog.

  Your feedback makes this project better.

Contributing

  This project is intended only as a basic, general-purpose proxy pool architecture; it does not accept feature-specific additions (particularly good ideas excepted, of course).

  The project is still far from perfect. If you find a bug or want to add a feature, please describe it in Issues and I will do my best to improve it.

  Many thanks to the following contributors for their generous work:

  @kangnwh | @bobobo80 | @halleywj | @newlyedward | @wang-ye | @gladmo | @bernieyangmh | @PythonYXY | @zuijiawoniu | @netAir | @scil | @tangrela | @highroom | @luocaodan | @vc5 | @1again | @obaiyan | @zsbh | @jiannanya | @Jerry12228

Release Notes

changelog

Featured on HelloGitHub