📁 Required File List (Project Root Directory)

Example using a Python project root directory:
├── main.py                 # Main entry file
├── requirements.txt        # Python dependency list
├── README.md               # Project documentation
├── input_schema.json       # UI Script configuration file
├── sdk.py                  # SDK file
├── sdk_pd2.py
├── sdk_pd2_grpc.py

Core Entry File

  • main.py / main.js / main (depending on project type; Python, Go, and Node.js are currently supported)
    • The main entry point of the crawler script
    • The file name must be main (file extension depends on the language)
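As an illustration of the entry-file requirement, here is a minimal sketch of what a Python main.py could look like. The function names `run` and `main` and the `urls` parameter are hypothetical, chosen only for this example; a real script would obtain its parameters and store its results through the SDK files shipped in the project root.

```python
from typing import Dict, List


def run(params: Dict) -> List[Dict]:
    """Placeholder crawl step: return one row per requested URL."""
    # "urls" is a hypothetical parameter name used only in this sketch.
    return [{"url": url, "status": "pending"} for url in params.get("urls", [])]


def main() -> int:
    # Hard-coded parameters stand in for values a real script would
    # receive from the platform at startup.
    params = {"urls": ["https://example.com"]}
    rows = run(params)
    print(f"collected {len(rows)} row(s)")
    return 0


if __name__ == "__main__":
    main()
```

Keeping the crawl logic in a separate function from the entry point makes it possible to test the logic without starting the whole script.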

Dependency Management

  • package.json (Node.js projects)
  • requirements.txt (Python projects)
  • Used to declare all dependencies required to run the project

Configuration Files

  • input_schema.json
    • UI Script configuration file
    • Defines the input form interface displayed on the platform for the script

Documentation

  • README.md
    • Script functionality documentation
    • Includes usage instructions and important notes

🛠️ SDK Functional Modules

1. Environment Parameter Access

  • Retrieve runtime parameters passed during container startup
  • Access crawler task configurations, authentication information, and more
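The SDK's actual interface is not shown here, so the following is only a sketch of the general idea, assuming the container receives its parameters as environment variables. The variable names `TASK_CONFIG` and `AUTH_TOKEN` and the function `load_runtime_params` are hypothetical and are not part of the SDK.

```python
import json
import os
from typing import Mapping, Optional


def load_runtime_params(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect runtime parameters from environment variables.

    TASK_CONFIG and AUTH_TOKEN are hypothetical names used for illustration.
    """
    env = os.environ if env is None else env
    raw_config = env.get("TASK_CONFIG", "{}")
    try:
        config = json.loads(raw_config)
    except json.JSONDecodeError as exc:
        # Fail early with a readable message if the configuration is malformed.
        raise ValueError(f"TASK_CONFIG is not valid JSON: {exc}") from exc
    return {"config": config, "auth_token": env.get("AUTH_TOKEN")}
```

Passing the environment in as an argument keeps the function testable: a plain dictionary can stand in for the container environment.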

2. Data Storage

  • Define data table structures (headers)
  • Store scraped result data
  • Supports batch saving and resumable uploads
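To make the header and batch-saving ideas concrete, the sketch below buffers rows against a fixed list of headers and hands them to a save callback in batches. `BatchWriter` and its `sink` callback are hypothetical helpers written for this example; they do not reproduce the SDK's storage API, and resumable uploads are not modeled.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchWriter:
    """Buffer rows and pass them to `sink` in batches (hypothetical helper)."""

    def __init__(self, headers: List[str],
                 sink: Callable[[List[Row]], None],
                 batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.headers = headers
        self.sink = sink
        self.batch_size = batch_size
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        unknown = set(row) - set(self.headers)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Fill missing columns with None so every row matches the headers.
        self._buffer.append({h: row.get(h) for h in self.headers})
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send any buffered rows and start a fresh buffer.
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
```

A caller would add rows as they are scraped and call `flush()` once at the end so that a final partial batch is not lost.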

3. Logging Output

  • Standardized logging interfaces
  • Supports multiple log levels (INFO, WARN, ERROR, etc.)
  • Logs are automatically collected and displayed by the platform
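The SDK's logging interface is not documented in this section, so the sketch below uses Python's standard `logging` module instead to show what leveled output looks like. It assumes, without confirmation from the platform, that log lines written to standard output are what gets collected. `make_logger` is a hypothetical helper name. Note that the standard library names the WARN level `WARNING`.

```python
import logging
import sys


def make_logger(name: str = "crawler", stream=None) -> logging.Logger:
    """Return a logger that writes INFO and above to `stream` (default: stdout)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout if stream is None else stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("task started")
    log.warning("page took longer than expected")
    log.error("request failed")
```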