Required File List (Project Root Directory)
Core Entry File
- main.py / main.js / main.go (depending on the project language; Python, Node.js, and Go are currently supported)
- The main entry point of the crawler script
- The file name must be main (the file extension depends on the language)
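For a Python project, a minimal main.py might look like the sketch below. It only illustrates the entry-point convention. The crawl function, its return shape, and the hard-coded start URL are hypothetical placeholders, and no platform SDK calls are shown because the SDK's API is not described here.

```python
from __future__ import annotations


def crawl(start_url: str) -> list[dict]:
    # Hypothetical placeholder: a real script would fetch and parse pages here.
    return [{"url": start_url, "title": "example"}]


def main() -> int:
    # Hypothetical hard-coded input; a real script would read it from the platform.
    results = crawl("https://example.com")
    for row in results:
        print(row)
    return 0


if __name__ == "__main__":
    main()
```

Keeping the work inside main() (rather than at module level) lets the same file be imported and tested without starting a crawl.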
Dependency Management
- package.json (Node.js projects)
- requirements.txt (Python projects)
- Used to declare all dependencies required to run the project
Configuration Files
- input_schema.json
- Configuration file for the script's UI
- Defines the input form that the platform displays for the script
Documentation
- README.md
- Describes what the script does
- Includes usage instructions and important notes
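The checklist above can be verified before uploading with a small script. This is a sketch under stated assumptions: missing_files is a hypothetical helper, main.go is assumed as the Go entry-point name, and no dependency file is checked for Go because the list above does not name one.

```python
from __future__ import annotations

from pathlib import Path

# Entry-point names by language (main.go for Go is an assumption).
ENTRY_FILES = {"python": "main.py", "nodejs": "main.js", "go": "main.go"}
# Dependency files named in the list above; none is listed for Go.
DEPENDENCY_FILES = {"python": "requirements.txt", "nodejs": "package.json"}
# Files required for every project type.
COMMON_FILES = ["input_schema.json", "README.md"]


def missing_files(root: Path, language: str) -> list[str]:
    """Return the required files that are absent from the project root, sorted."""
    required = [ENTRY_FILES[language], *COMMON_FILES]
    if language in DEPENDENCY_FILES:
        required.append(DEPENDENCY_FILES[language])
    return sorted(name for name in required if not (root / name).is_file())


if __name__ == "__main__":
    print(missing_files(Path.cwd(), "python"))
```

An empty list means every required file is present; anything else names the files still to be added.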
🛠️ SDK Functional Modules
1. Environment Parameter Access
- Retrieve the runtime parameters passed in when the container starts
- Access crawler task configurations, authentication information, and more
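A minimal sketch of parameter access, assuming the platform passes the task configuration as a JSON object in an environment variable. The variable names CRAWLER_INPUT and CRAWLER_TOKEN and the load_params helper are hypothetical; the real SDK may expose parameters differently.

```python
from __future__ import annotations

import json
import os
from typing import Any, Mapping


def load_params(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Read task parameters from a hypothetical CRAWLER_INPUT variable (JSON)."""
    raw = env.get("CRAWLER_INPUT", "{}")
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("CRAWLER_INPUT must be a JSON object")
    # Hypothetical credential variable; None when it is not set.
    params["token"] = env.get("CRAWLER_TOKEN")
    return params
```

Passing the mapping as an argument keeps the function testable without modifying os.environ; credentials read this way should be kept out of log output.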
2. Data Storage
- Define data table structures (headers)
- Store scraped result data
- Supports batch saving and resumable uploads
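The storage behavior described above (a header row, batched writes, and resuming after an interruption) can be sketched locally with a CSV file and a checkpoint. BatchSaver, results.csv, and checkpoint.json are hypothetical stand-ins, not the SDK's storage API.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


class BatchSaver:
    """Local stand-in for the SDK storage module (all names hypothetical)."""

    def __init__(self, out_dir: Path, headers: list[str], batch_size: int = 100):
        self.data_path = out_dir / "results.csv"
        self.ckpt_path = out_dir / "checkpoint.json"
        self.headers = headers
        self.batch_size = batch_size
        self.buffer: list[dict] = []
        if not self.data_path.exists():
            # Define the table structure (header row) once, on the first run.
            with self.data_path.open("w", newline="", encoding="utf-8") as f:
                csv.DictWriter(f, fieldnames=headers).writeheader()

    def saved_count(self) -> int:
        # Rows already persisted; a restarted run can resume from here.
        if self.ckpt_path.exists():
            return json.loads(self.ckpt_path.read_text())["saved"]
        return 0

    def add(self, row: dict) -> None:
        # Buffer rows and write them out one batch at a time.
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        with self.data_path.open("a", newline="", encoding="utf-8") as f:
            csv.DictWriter(f, fieldnames=self.headers).writerows(self.buffer)
        # Update the checkpoint only after the batch has been written.
        saved = self.saved_count() + len(self.buffer)
        self.ckpt_path.write_text(json.dumps({"saved": saved}))
        self.buffer.clear()
```

On restart, a crawler can skip the first saved_count() items and call flush() once at the end to write any partial batch. Because the checkpoint is updated after each batch is written, a crash between the two steps can duplicate at most one batch but should not cause items to be skipped.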
3. Logging Output
- Standardized logging interfaces
- Supports multiple log levels (INFO, WARN, ERROR, etc.)
- Logs are automatically collected and displayed by the platform
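Since the platform collects logs automatically, a script can plausibly rely on Python's standard logging module writing to standard output. That the platform reads stdout is an assumption, and the SDK may provide its own logging interface instead; setup_logger is a hypothetical helper. Note that Python names the WARN level WARNING (logging.WARN is an alias).

```python
import logging
import sys


def setup_logger(name: str = "crawler", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped, level-tagged lines to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = setup_logger()
    log.info("crawl started")
    log.warning("retrying request")
    log.error("request failed")
```

Messages below the configured level are dropped, so setting level=logging.WARNING hides routine INFO output.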