Top Related Projects
:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis
An Efficient ProxyPool with Getter, Tester and Server
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
Daily feed of bad IPs (with blacklist hit scores)
Quick Overview
Proxy_pool is an open-source Python project that provides a simple proxy IP pool. It automatically collects free proxy IPs from the internet, validates them, and offers an API for retrieving usable proxy IPs. The project aims to simplify the process of obtaining and managing proxy IPs for various applications.
Pros
- Automatic proxy collection and validation
- Easy-to-use API for retrieving proxy IPs
- Supports multiple proxy sources and protocols (HTTP, HTTPS, SOCKS4/5)
- Configurable and extensible architecture
Cons
- Reliability of free proxies can be inconsistent
- Limited documentation, especially for advanced usage
- Potential legal and ethical concerns when using proxy IPs without permission
- May require frequent maintenance to keep proxy sources up-to-date
Code Examples
- Retrieving a random proxy:
import requests
proxy = requests.get("http://127.0.0.1:5010/get/").json()
print(f"Random proxy: {proxy}")
- Retrieving a specific protocol proxy:
import requests
https_proxy = requests.get("http://127.0.0.1:5010/get/?type=https").json()
print(f"HTTPS proxy: {https_proxy}")
- Reporting an invalid proxy:
import requests
requests.get("http://127.0.0.1:5010/delete/?proxy=1.1.1.1:8080")
Getting Started
- Clone the repository:
git clone https://github.com/jhao104/proxy_pool.git
- Install dependencies:
pip install -r requirements.txt
- Modify the setting.py file to configure proxy sources and other settings.
- Run the scheduler and the API server:
python proxyPool.py schedule
python proxyPool.py server
- Access the API at http://127.0.0.1:5010 to retrieve proxy IPs.
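Before pulling proxies, it can help to confirm the scheduler has started populating the pool. The /count endpoint (listed in the API table in the README below) reports the pool size; a quick check, assuming the default port and a JSON response (the exact response shape may vary by version):

import requests

# /count reports how many proxies are currently in the pool
print(requests.get("http://127.0.0.1:5010/count/").json())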
Competitor Comparisons
:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis
Pros of haipproxy
- More advanced proxy validation and scoring system
- Supports multiple proxy sources and protocols (HTTP, HTTPS, Socks4/5)
- Includes a web interface for easier management
Cons of haipproxy
- More complex setup and configuration
- Requires additional dependencies (e.g., Scrapy, Redis)
- Less frequently updated compared to proxy_pool
Code Comparison
proxy_pool:
def check_proxy(proxy):
    url = "http://www.baidu.com/get_ip.php"
    try:
        r = requests.get(url, proxies={"http": "http://" + proxy}, timeout=10, verify=False)
        if r.status_code == 200:
            return True
    except:
        return False
haipproxy:
def validate_proxy(proxy):
    start = time.time()
    try:
        r = requests.get(self.target_url, proxies={"http": proxy, "https": proxy},
                         timeout=self.timeout, verify=False)
        if r.ok:
            speed = time.time() - start
            return True, speed
    except:
        return False, None
The code comparison shows that haipproxy includes a more sophisticated proxy validation process, measuring response time and supporting both HTTP and HTTPS protocols. proxy_pool's implementation is simpler but less comprehensive.
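For readers who want the timed-validation idea without adopting haipproxy wholesale, here is a self-contained sketch; the httpbin test URL and the (ok, seconds) return shape are illustrative choices, not either project's actual API:

import time
import requests

def validate_proxy(proxy, test_url="http://httpbin.org/ip", timeout=10):
    # Time a round trip through a host:port proxy; return (ok, elapsed_seconds).
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    start = time.time()
    try:
        r = requests.get(test_url, proxies=proxies, timeout=timeout, verify=False)
        if r.ok:
            return True, time.time() - start
    except requests.RequestException:
        pass
    return False, None

# A placeholder address: expect (False, None) after the timeout expires.
print(validate_proxy("1.2.3.4:8080"))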
An Efficient ProxyPool with Getter, Tester and Server
Pros of ProxyPool
- More comprehensive documentation, including detailed setup instructions and API usage examples
- Supports multiple proxy sources and validation methods out of the box
- Includes a web interface for easy management and monitoring of the proxy pool
Cons of ProxyPool
- Slightly more complex setup process due to additional dependencies
- May have higher resource usage due to more extensive features
Code Comparison
proxy_pool:
def check_proxy(proxy):
    url = "http://www.baidu.com/get_ip.php"
    try:
        r = requests.get(url, proxies={"http": "http://" + proxy}, timeout=10, verify=False)
        if r.status_code == 200:
            return True
    except:
        return False
ProxyPool:
def check_proxy(proxy):
    try:
        resp = requests.get(self.test_url, proxies={
            'http': 'http://' + proxy,
            'https': 'https://' + proxy
        }, timeout=self.timeout, verify=False)
        if resp.status_code == 200:
            return True
    except (ProxyError, ConnectTimeout, SSLError, ReadTimeout):
        return False
Both projects aim to provide a pool of usable proxies, but ProxyPool offers a more feature-rich solution with better documentation. However, this comes at the cost of a slightly more complex setup and potentially higher resource usage. The code comparison shows that ProxyPool's proxy checking function is more comprehensive, handling both HTTP and HTTPS proxies and catching specific exceptions.
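The exception names in ProxyPool's snippet come from requests.exceptions. A standalone version of that narrower error handling might look like the following; the httpbin test URL and default timeout are stand-ins, not ProxyPool's actual configuration:

import requests
from requests.exceptions import ProxyError, ConnectTimeout, SSLError, ReadTimeout

def check_proxy(proxy, test_url="http://httpbin.org/ip", timeout=10):
    # Only network-level proxy failures mark the proxy as bad;
    # anything unexpected still surfaces as an exception.
    proxies = {"http": "http://" + proxy, "https": "https://" + proxy}
    try:
        resp = requests.get(test_url, proxies=proxies, timeout=timeout, verify=False)
        return resp.status_code == 200
    except (ProxyError, ConnectTimeout, SSLError, ReadTimeout):
        return False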
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
Pros of Scylla
- Written in Rust, offering better performance and memory safety
- Supports both IPv4 and IPv6 proxies
- Provides a RESTful API for easier integration
Cons of Scylla
- Less actively maintained (last update over 2 years ago)
- Fewer built-in proxy sources compared to proxy_pool
- Limited documentation and community support
Code Comparison
proxy_pool (Python):
def get_proxy():
    return self.db.pop()

def delete_proxy(proxy):
    self.db.delete(proxy)
Scylla (Rust):
pub fn get_proxy(&self) -> Option<Proxy> {
    self.proxies.pop_front()
}

pub fn delete_proxy(&mut self, proxy: &Proxy) {
    self.proxies.retain(|p| p != proxy);
}
Both projects aim to provide a pool of proxies, but they differ in implementation language and features. proxy_pool is written in Python and offers a wider range of proxy sources, making it more flexible for various use cases. It also has more recent updates and a larger community.
Scylla, on the other hand, leverages Rust's performance benefits and provides a RESTful API, which can be advantageous for certain applications. However, its development seems to have slowed down, potentially limiting its long-term viability.
The code comparison shows similar basic functionality for retrieving and deleting proxies, with Scylla's implementation benefiting from Rust's strong typing and memory safety features.
Daily feed of bad IPs (with blacklist hit scores)
Pros of ipsum
- Focuses on IP blocklists for security purposes, providing a more specialized tool
- Regularly updated with new malicious IP addresses from various sources
- Lightweight and easy to integrate into existing security systems
Cons of ipsum
- Limited to IP blocklists, lacking the proxy pool functionality
- May require additional tools or scripts for implementation in certain use cases
- Less versatile compared to proxy_pool's broader proxy management features
Code comparison
ipsum:
#!/usr/bin/env python
import re
import socket
import struct
import sys

def addr_to_int(value):
    return struct.unpack("!I", socket.inet_aton(value))[0]
proxy_pool:
class ProxyPool(object):
    def __init__(self):
        self.pool = set()

    def add(self, proxy):
        self.pool.add(proxy)

    def remove(self, proxy):
        self.pool.discard(proxy)
The code snippets show that ipsum focuses on IP address manipulation, while proxy_pool manages a set of proxy addresses. This reflects their different purposes: ipsum for IP blocklists and proxy_pool for proxy management.
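The addr_to_int helper quoted above is the core of ipsum's approach: once an IPv4 address is a 32-bit integer, blocklist and range checks reduce to integer comparisons. A small self-contained illustration:

import socket
import struct

def addr_to_int(value):
    # Pack a dotted-quad IPv4 string into an unsigned 32-bit integer.
    return struct.unpack("!I", socket.inet_aton(value))[0]

# Range membership becomes a plain comparison:
start, end = addr_to_int("10.0.0.0"), addr_to_int("10.255.255.255")
print(start <= addr_to_int("10.1.2.3") <= end)  # True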
README
ProxyPool: Crawler Proxy IP Pool
 ______                        ______             _
| ___ \_                      | ___ \           | |
| |_/ / \__ __   __  _ __   _ | |_/ /___   ___  | |
| __/|  _// _ \ \ \/ /| | | || __// _ \ / _ \ | |
| |   | | | (_) | >  < \ |_| || |  | (_) | (_) || |___
\_|   |_|  \___/ /_/\_\ \__  |\_|   \___/ \___/ \_____\
                        __ / /
                       /___ /
ProxyPool
ProxyPool is a crawler proxy IP pool project. Its main functions are to periodically collect free proxies published online, validate them, and store them; to periodically re-validate the stored proxies to keep the pool usable; and to offer both an API and a CLI. You can also extend the proxy sources to increase the quality and quantity of IPs in the pool.
- Documentation: document
- Demo: http://demo.spiderpy.cn (please don't stress-test it, thanks)
- Paid proxy recommendation: luminati-china. BrightData (formerly Luminati) is regarded as the leader of the proxy market, covering 72 million IPs worldwide, most of them real residential IPs with a solid success rate. Multiple paid plans are available; if you need high-quality proxy IPs, you can register and contact the Chinese-language customer support to apply for a free trial. A 50% discount promotion is currently running. (PS: readers who have trouble with English can refer to this usage tutorial.)
Running the Project
Download the code:
- git clone
git clone git@github.com:jhao104/proxy_pool.git
- releases
Download the corresponding zip file from https://github.com/jhao104/proxy_pool/releases
Install dependencies:
pip install -r requirements.txt
Update the configuration:
# setting.py is the project configuration file

# Configure the API service
HOST = "0.0.0.0"  # IP to bind
PORT = 5000       # listening port

# Configure the database
DB_CONN = 'redis://:pwd@127.0.0.1:8888/0'

# Configure ProxyFetcher
PROXY_FETCHER = [
    "freeProxy01",  # names of enabled proxy fetch methods; all fetch methods live in fetcher/proxyFetcher.py
    "freeProxy02",
    # ....
]
Start the project:
# Once the environment is ready, start everything through proxyPool.py.
# The program has two parts: schedule (the scheduler) and server (the API service)

# Start the scheduler
python proxyPool.py schedule

# Start the web API service
python proxyPool.py server
Docker Image
docker pull jhao104/proxy_pool
docker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 jhao104/proxy_pool:latest
docker-compose
Run in the project directory:
docker-compose up -d
Usage
- API
After the web service starts, the default configuration serves the API at http://127.0.0.1:5010:
api | method | Description | params |
---|---|---|---|
/ | GET | API introduction | None |
/get | GET | Randomly get one proxy | Optional: ?type=https filters proxies that support HTTPS |
/pop | GET | Get and remove one proxy | Optional: ?type=https filters proxies that support HTTPS |
/all | GET | Get all proxies | Optional: ?type=https filters proxies that support HTTPS |
/count | GET | Get the number of proxies | None |
/delete | GET | Delete a proxy | ?proxy=host:ip |
- Use in a crawler
To use the pool in crawler code, you can wrap the API into helper functions and call them directly, for example:
import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code
def getHtml():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # access the page through the proxy
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            return html
        except Exception:
            retry_count -= 1
    # remove the exhausted proxy from the pool
    delete_proxy(proxy)
    return None
Extending Proxy Sources
The project ships with a few free proxy sources by default, but free proxies are, after all, of limited quality, so the proxies you get from a stock run may be unsatisfactory. For that reason, an extension hook for fetching proxies is provided.
To add a new proxy source:
- 1. First, add a custom static proxy-fetching method to the ProxyFetcher class. The method must yield proxies in host:ip format, for example:
class ProxyFetcher(object):
    # ....
    # custom proxy-source fetch method
    @staticmethod
    def freeProxyCustom1():  # the name just has to avoid clashing with existing methods
        # fetch proxies from a website, an API, or a database
        # suppose you have already obtained a list of proxies
        proxies = ["x.x.x.x:3128", "x.x.x.x:80"]
        for proxy in proxies:
            yield proxy
        # make sure every proxy is yielded in correct host:ip format
- 2. After adding the method, register its name in the PROXY_FETCHER list in setting.py:
PROXY_FETCHER = [
    "freeProxy01",
    "freeProxy02",
    # ....
    "freeProxyCustom1"  # make sure the name matches the method you added
]
The schedule process fetches proxies at fixed intervals; on the next fetch cycle it will automatically discover and call your method.
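As a more concrete (and entirely hypothetical) variant, a custom fetcher will often scrape a listing page and extract host:ip pairs. In the sketch below, the URL is a placeholder and the regex is a crude pattern rather than a robust parser:

import re
import requests

class ProxyFetcher(object):
    # ....
    @staticmethod
    def freeProxyCustom2():
        # placeholder URL; substitute a real free-proxy listing page
        resp = requests.get("http://example.com/free-proxy-list", timeout=10)
        # match "ip:port" tokens anywhere in the page body
        for proxy in re.findall(r"\d{1,3}(?:\.\d{1,3}){3}:\d{2,5}", resp.text):
            yield proxy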
Free Proxy Sources
Free proxy sites currently collected (in no particular order; the notes below cover only each site's free offering; for paid proxy reviews see the recommendation above):
Proxy name | Status | Update speed | Availability | Link | Code |
---|---|---|---|---|---|
站大爷 | ✔ | ★ | ** | link | freeProxy01 |
66代理 | ✔ | ★ | * | link | freeProxy02 |
开心代理 | ✔ | ★ | * | link | freeProxy03 |
FreeProxyList | ✔ | ★ | * | link | freeProxy04 |
快代理 | ✔ | ★ | * | link | freeProxy05 |
冰凌代理 | ✔ | ★★★ | * | link | freeProxy06 |
云代理 | ✔ | ★ | * | link | freeProxy07 |
小幻代理 | ✔ | ★★ | * | link | freeProxy08 |
免费代理库 | ✔ | ★ | * | link | freeProxy09 |
89代理 | ✔ | ★ | * | link | freeProxy10 |
稻壳代理 | ✔ | ★★ | *** | link | freeProxy11 |
If you know of other good free proxy sites, please submit them in issues; support may be added to the project in the next update.
Feedback
Any questions are welcome in Issues; you can also leave a comment on my blog.
Your feedback will make this project better.
Contributing
This project serves only as a basic, general-purpose proxy pool architecture and does not take on niche features (with the exception, of course, of particularly good ideas).
The project is still imperfect; if you find a bug or want a new feature added, please file a description of the bug (or new feature) in Issues, and I will do my best to improve it.
Many thanks to the following contributors for their selfless dedication:
@kangnwh | @bobobo80 | @halleywj | @newlyedward | @wang-ye | @gladmo | @bernieyangmh | @PythonYXY | @zuijiawoniu | @netAir | @scil | @tangrela | @highroom | @luocaodan | @vc5 | @1again | @obaiyan | @zsbh | @jiannanya | @Jerry12228