🧩 Python Script Demo
GitHub Repository
Python Script Demo repository:
https://github.com/CafeScraper/PythonScirptDemo
Required Files (Project Root Directory)
File Overview
| File Name | Description |
|---|---|
| main.py | Script entry file (execution entry point); must be named main.py |
| requirements.txt | Python dependency management file |
| input_schema.json | UI input form configuration file |
| README.md | Project documentation |
| sdk.py | Core SDK functionality |
| sdk_pd2.py | Enhanced data processing module |
| sdk_pd2_grpc.py | Network communication module |
⭐ Core Script SDK
📁 SDK File Description
Three SDK files are mandatory and must be placed in the script’s root directory: sdk.py, sdk_pd2.py, and sdk_pd2_grpc.py.
Together, these files form the script’s toolbox, providing the capabilities required for crawler execution and for interaction with the platform’s backend system.
🔧 Core Feature Usage Guide
1. Environment Parameters
Retrieve configuration passed at script startup
When the script starts, external configuration parameters can be passed in (such as target website URLs or search keywords). Use the following method to retrieve them:
If you need to collect data from different websites for different tasks, simply pass different parameters without modifying the script code.
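The retrieval call itself is not reproduced in this section. As a minimal sketch, assuming the startup parameters arrive as a plain dictionary, the code below shows one way to validate them and apply defaults. The parameter names `url`, `keyword`, and `max_items`, and the helper `read_settings`, are hypothetical and not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class TaskSettings:
    """Settings a run needs; the field names are hypothetical."""
    url: str
    keyword: str = ""
    max_items: int = 10


def read_settings(params: dict) -> TaskSettings:
    """Turn raw startup parameters into validated settings.

    `params` stands in for whatever dictionary the SDK returns.
    """
    url = str(params.get("url", "")).strip()
    if not url:
        raise ValueError("missing required parameter: url")
    return TaskSettings(
        url=url,
        keyword=str(params.get("keyword", "")).strip(),
        max_items=int(params.get("max_items", 10)),
    )


# Two tasks run the same script with different parameters.
task_a = read_settings({"url": "https://example.com/a", "keyword": "coffee"})
task_b = read_settings({"url": "https://example.com/b", "max_items": "25"})
```

Keeping all parameter handling in one function means the rest of the script never reads the raw dictionary directly, so a new task only needs new input values.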
2. Runtime Logging
Record script execution progress
During execution, you can log messages at different levels. These logs will be displayed in the platform UI, making it easier to monitor execution status and troubleshoot issues.
Log Level Explanation
- debug: Most detailed logs, recommended during development
- info: Normal execution flow, recommended at key steps
- warn: Potential issues that do not stop execution
- error: Errors that require attention
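To illustrate which level fits which event, here is a small sketch that uses Python's standard `logging` module as a stand-in for the SDK's logging calls, which this section does not show. The function `collect` and its URL check are hypothetical. Note that the standard library names the level `warning` rather than `warn`.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("script")


def collect(urls):
    """Pretend to collect pages, logging each event at a suitable level."""
    results = []
    log.info("starting run with %d URLs", len(urls))  # key step
    for url in urls:
        log.debug("visiting %s", url)  # fine-grained detail for development
        if not url.startswith("http"):
            log.warning("skipping malformed URL: %s", url)  # run continues
            continue
        results.append(url)
    if not results:
        log.error("no valid URLs; nothing was collected")  # needs attention
    return results
```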
3. Returning Results
Send collected data back to the platform
Once data is collected, it must be returned to the platform in two steps.
Step 1: Define Table Headers (Required)
Before pushing any data, define the table structure, similar to defining column headers in Excel.
Field Explanation
- label: Column name shown to users (recommended to use descriptive names)
- key: Unique identifier used in code (recommended lowercase with underscores)
- format: Data type, supported values:
  - "text": String / text
  - "integer": Integer
  - "boolean": Boolean (true / false)
  - "array": List / array
  - "object": Dictionary / object
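As a sketch of what a header definition might look like, assuming headers are expressed as a list of dictionaries with the three fields above, the following code builds one and checks it before use. The column names and the helper `validate_headers` are hypothetical; the SDK call that registers the headers is not shown here.

```python
SUPPORTED_FORMATS = {"text", "integer", "boolean", "array", "object"}


def validate_headers(headers):
    """Return the header keys, or raise ValueError on a malformed header."""
    keys = []
    for header in headers:
        missing = {"label", "key", "format"} - header.keys()
        if missing:
            raise ValueError(f"header {header!r} is missing {sorted(missing)}")
        if header["format"] not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {header['format']!r}")
        if header["key"] in keys:
            raise ValueError(f"duplicate key: {header['key']!r}")
        keys.append(header["key"])
    return keys


# Hypothetical columns for a page-scraping task.
headers = [
    {"label": "Page Title", "key": "page_title", "format": "text"},
    {"label": "Like Count", "key": "like_count", "format": "integer"},
    {"label": "Image URLs", "key": "image_urls", "format": "array"},
]
header_keys = validate_headers(headers)
```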
Step 2: Push Data Records One by One
Important Notes
- Header definition and data pushing can be done in either order
- Data keys must exactly match the header keys (case-sensitive)
- Data must be pushed one record at a time
- Logging after each push is recommended
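The key-matching and one-record-at-a-time rules can be enforced with a small guard before each push. This is a sketch under a strict reading of "exactly match" (every record carries every header key and no others); `check_record`, `push_all`, and the `push` callback are hypothetical stand-ins for the SDK's push call.

```python
def check_record(record, header_keys):
    """Raise ValueError unless the record's keys equal the header keys."""
    expected, actual = set(header_keys), set(record)
    if expected != actual:
        raise ValueError(
            f"missing keys: {sorted(expected - actual)}, "
            f"unexpected keys: {sorted(actual - expected)}"
        )


def push_all(records, header_keys, push, log=print):
    """Validate and push records one at a time, logging after each push."""
    for index, record in enumerate(records, start=1):
        check_record(record, header_keys)
        push(record)  # `push` is a stand-in for the SDK's push call
        log(f"pushed record {index}/{len(records)}")
    return len(records)


header_keys = ["page_title", "like_count"]
sent = []  # collects pushed records in place of the platform
count = push_all(
    [{"page_title": "Hello", "like_count": 3}],
    header_keys,
    push=sent.append,
)
```

Because the comparison is case-sensitive, a record keyed `Page_Title` would be rejected before it reaches the platform.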
💡 Complete Code Example
⭐ Script Entry File (main.py)
💡 Example
Automated Data Collection Script: Operation & Principles
1. Script Overview
This script is an automated tool that works like a “digital employee.” It automatically opens specified web pages (such as social media pages), extracts the required information, and organizes it into structured tables.
2. Execution Workflow
The process consists of four main stages:
- Receive Instructions: Input parameters such as target URLs and data limits
- Stealth Preparation: Automatic proxy configuration for overseas or restricted websites
- Automated Execution: Navigate pages and extract titles, content, images, etc.
- Result Reporting: Push structured data and generate table headers automatically
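The four stages can be sketched as a single function. This is an illustrative skeleton only: the parameter names (`urls`, `limit`, `proxy`) and the callables `fetch`, `set_headers`, and `push` are hypothetical stand-ins, and deriving headers from the first row's keys is one assumed way of generating them automatically.

```python
def run(params, fetch, set_headers, push):
    """Run the four stages; every callable argument is a stand-in."""
    # 1. Receive instructions: target URLs and a cap on how many to process.
    urls = list(params["urls"])[: int(params.get("limit", 10))]

    # 2. Stealth preparation: pass along a proxy address if one was supplied.
    proxy = params.get("proxy")

    # 3. Automated execution: visit each page and extract a row of fields.
    rows = [fetch(url, proxy) for url in urls]

    # 4. Result reporting: derive headers from the first row, then push rows.
    if rows:
        set_headers([
            {"label": key.replace("_", " ").title(), "key": key, "format": "text"}
            for key in rows[0]
        ])
        for row in rows:
            push(row)
    return len(rows)


# Offline demonstration with a fake fetcher instead of a real browser.
def fake_fetch(url, proxy):
    return {"title": f"Title of {url}", "content": "..."}


headers, pushed = [], []
total = run(
    {"urls": ["https://example.com/1", "https://example.com/2",
              "https://example.com/3"], "limit": 2},
    fetch=fake_fetch,
    set_headers=headers.extend,
    push=pushed.append,
)
```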
⭐ Python Dependency Management (requirements.txt)
This file lists all third-party Python packages required to run the script. The system automatically installs all dependencies specified in this file.
Example
❗ Important Notes
1. Versioning
- Packages with versions (e.g. beautifulsoup4==4.14.2) will be installed exactly as specified
- Packages without versions will be installed at the latest available version
2. Installation
- Dependencies are installed automatically
- Installation time depends on network speed and package size
- Errors will be displayed if installation fails
3. Ensuring Proper Execution
- grpcio and protobuf must be included (required by the SDK)
- All third-party libraries must be listed
- Core dependencies should use fixed versions
- Update dependencies regularly for security and stability
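These rules can be checked before uploading. The sketch below is a hypothetical helper, not part of the platform: it scans the text of a requirements.txt, reports which SDK-required packages are absent, and lists packages that lack an exact `==` pin. The sample content is illustrative, and the parsing handles only simple one-package-per-line entries.

```python
import re

REQUIRED_BY_SDK = {"grpcio", "protobuf"}


def audit_requirements(text):
    """Return (missing SDK packages, packages without an exact == pin)."""
    names, unpinned = set(), []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # The package name ends at the first version operator or extra marker.
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0].lower()
        names.add(name)
        if "==" not in line:
            unpinned.append(name)
    return sorted(REQUIRED_BY_SDK - names), unpinned


# Illustrative file content; only the beautifulsoup4 pin comes from the text.
sample = """\
beautifulsoup4==4.14.2
grpcio
requests
"""
missing, unpinned = audit_requirements(sample)
```

For this sample, `missing` reports that protobuf is absent and `unpinned` lists grpcio and requests, which would otherwise be installed at whatever version is latest at install time.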
FAQ
Q: Why specify versions?
A: To ensure consistent behavior across development, testing, and production environments.
Q: What if I don’t specify a version?
A: The latest version will be installed, which may cause compatibility issues.
Q: How do I add new dependencies?
A: Add a new line to this file and re-upload the ZIP package.
Q: What if installation fails?
A: Check network connectivity or package mirrors, or contact the system administrator.
Last Updated:
Always update this dependency list whenever a change in script functionality introduces new dependencies.