COS498 HW3: PDF Document Management System

This is Homework 3 for COS498: Server Side Programming Languages. It demonstrates a PDF document management system built around custom routing and validation modules.

Project Overview

This project implements a PDF document management system with the following features:

  • Frontend: Nginx serving static files and Handlebars templates
  • Backend: Node.js/Express server with custom routing and PDF management
  • PDF Management: PDF validation and secure serving system
  • Security: Path validation and access control for PDF serving
  • Database Integration: SQLite database for book and chapter metadata
  • Containerization: Docker containers orchestrated with Docker Compose

Features

PDF Document Management

  • PDF Validation Module: Comprehensive validation before serving any PDF
  • Custom Routing: Dedicated routing module for book and chapter navigation
  • Security Controls: Path validation and access restrictions
  • Database Integration: SQLite database with books and chapters metadata

PDF Validation System

  • File Existence Checks: Validates PDF files exist before serving
  • Path Security: Prevents access outside designated directories
  • Input Validation: Sanitizes book names and filenames
  • Extension Validation: Only allows .pdf files
  • File Size Limits: Enforces maximum file size restrictions
  • Error Responses: Appropriate HTTP status codes (400, 403, 404, 413, 500)

User Interface

  • Book Browser: Navigate through available books and chapters
  • PDF Viewer: Embedded PDF viewing with iframe integration
  • Responsive Design: Mobile-friendly layout with consistent styling
  • Navigation: Breadcrumb navigation and book/chapter selection

Prerequisites

Before running this project, ensure you have Git and Docker (with Docker Compose) installed; the setup and start commands in this README rely on both.

Project Structure

COS498-HW3/
├── docker-compose.yml          # Docker Compose configuration
├── README.md                   # This file
├── .git/                       # Git repository metadata
├── .gitmodules                 # Git submodules configuration
├── backend/
│   ├── Dockerfile             # Backend container configuration
│   ├── package.json           # Node.js dependencies and scripts
│   ├── package-lock.json      # Locked dependency versions
│   ├── server.js              # Node.js Express server with PDF management
│   ├── database/
│   │   ├── backend.db         # SQLite database with book metadata
│   │   ├── backend.schema     # Database schema definition
│   │   └── backend_initial.db # Initial database backup
│   └── modules/
│       ├── RoutingManager.js      # Custom routing module
│       └── PDFValidationManager.js # PDF validation and security
└── frontend/
    ├── Dockerfile             # Frontend container configuration
    ├── default.conf           # Nginx configuration
    ├── views/                 # Handlebars templates
    │   ├── index.hbs          # Main application template
    │   └── layout.hbs         # Base layout template
    ├── partials/              # Reusable template components
    │   ├── books.hbs          # Book listing partial
    │   └── chapters.hbs       # Chapter listing partial
    └── public/                # Static assets and PDF files
        ├── books/             # PDF document storage
        │   └── WaysOfTheWorld-Strayer/  # Sample textbook (git submodule)
        │       ├── README.md          # Book information
        │       ├── .git               # Submodule repository metadata
        │       └── chapter1.pdf ... chapter23.pdf  # 23 chapter PDF files
        └── styles/
            └── main.css       # Application stylesheet

Setup Instructions

Prerequisites

Before running the application, you'll need to download the PDF books, which are stored as git submodules.

  1. Clone the repository and navigate to the project directory:

    git clone <repository-url>
    cd COS498-HW3
    
  2. Initialize and download git submodules (required for PDF books):

    git submodule init
    git submodule update
    

    Or in one command:

    git submodule update --init --recursive
    

Note: The PDF books (e.g., WaysOfTheWorld-Strayer) are stored as git submodules and must be downloaded separately. Without this step, the PDF files will not be available.

How to Start the PDF Management System

  1. Start all services:

    docker compose up --build
    
  2. Access the application:

    Open the frontend in your browser at the port published in docker-compose.yml. The application lists the available books and chapters for browsing and PDF viewing.

  3. Stop the services:

    docker compose down
    

Alternative: Manual Docker Build

For development purposes, you can also build and run containers individually:

# Build and run backend
docker build -t hw3-backend ./backend
docker run -d -p 3000:3000 --name backend hw3-backend

# Build and run frontend  
docker build -f frontend/Dockerfile -t hw3-frontend .
docker run -d -p 80:80 --name frontend hw3-frontend

API Endpoints

The backend provides the following endpoints:

Public Endpoints

  • GET / - Homepage with available books
  • GET /book/:bookName - Book details with chapter listing
  • GET /book/:bookName/chapter/:chapterFile - Chapter viewer with PDF display
  • GET /pdf/:bookName/:chapterFile - Direct PDF serving (with validation)

Module Architecture

1. PDF Validation Module (PDFValidationManager.js)

Purpose: Validates all PDF access requests before serving files

Key Features:

  • File existence validation
  • Path security (prevents directory traversal)
  • Input sanitization and validation
  • File extension verification (PDF only)
  • File size limits enforcement
  • Comprehensive error handling with appropriate HTTP status codes

Security Controls:

  • Blocks access outside /frontend/public/books/ directory
  • Prevents path traversal attacks (../)
  • Validates input parameters for dangerous characters
  • Enforces file type restrictions
  • Implements file size limits (100MB default)
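
The directory restriction described above can be sketched with Node's path module. This is a minimal illustration, not the code in PDFValidationManager.js: the function name resolveInsideBooks and the BOOKS_ROOT constant are hypothetical, and the root path is taken from the directory named in this README.

```javascript
const path = require("node:path");

// Assumed root, based on the /frontend/public/books/ directory named in this README.
const BOOKS_ROOT = path.resolve("/frontend/public/books");

// Hypothetical helper: returns the resolved absolute path,
// or null when the request would land outside the root directory.
function resolveInsideBooks(bookName, fileName, root = BOOKS_ROOT) {
  const candidate = path.resolve(root, bookName, fileName);
  const relative = path.relative(root, candidate);
  const escapes =
    relative === "" ||                      // resolves to the root itself, not a file
    relative === ".." ||                    // resolves to the parent directory
    relative.startsWith(".." + path.sep) || // climbs above the root
    path.isAbsolute(relative);              // different drive or root (Windows)
  return escapes ? null : candidate;
}
```

A null result would correspond to the 403 case; any non-null path is guaranteed to sit below the root, whatever ../ segments the inputs contained.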

2. Custom Routing Module (RoutingManager.js)

Purpose: Handles all page routing and navigation logic

Key Features:

  • Database integration for book/chapter metadata
  • Handlebars template rendering
  • URL construction for PDF links
  • Error handling and 404 responses
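
As an illustration of the URL construction step, the sketch below builds links that match the routes listed under API Endpoints. The helper names chapterPageUrl and pdfUrl are hypothetical and are not taken from RoutingManager.js; the only assumption is that each path segment should be percent-encoded.

```javascript
// Hypothetical helpers; the route shapes come from the API Endpoints section.

// Link to the chapter viewer page: /book/:bookName/chapter/:chapterFile
function chapterPageUrl(bookName, chapterFile) {
  return `/book/${encodeURIComponent(bookName)}/chapter/${encodeURIComponent(chapterFile)}`;
}

// Link to the validated PDF itself: /pdf/:bookName/:chapterFile
function pdfUrl(bookName, chapterFile) {
  return `/pdf/${encodeURIComponent(bookName)}/${encodeURIComponent(chapterFile)}`;
}
```

Encoding each segment separately keeps a stray slash or space in a folder or file name from changing which route the link matches.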

Architecture Details

Frontend (Nginx + Handlebars)

  • Serves static files and PDF documents
  • Renders Handlebars templates with dynamic content
  • Proxies API requests to the backend service
  • Configured for PDF MIME type handling

Backend (Node.js/Express)

  • REST API server with PDF management capabilities
  • SQLite database integration for metadata storage
  • Modular architecture with separate validation and routing systems
  • Comprehensive PDF security and validation
  • File serving via Express sendFile with security controls

Database Schema

  • Books Table: id, folder_name, display_name, author
  • Chapters Table: id, chapter_number, display_name, filename, book_id
  • Foreign key relationship between books and chapters
  • Supports multiple books with multiple chapters each
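
To make the relationship between the two tables concrete, here is a small in-memory sketch. It does not query SQLite; the rows are illustrative values shaped like the columns listed above, and the function chaptersForBook is hypothetical.

```javascript
// Illustrative rows only; the column names follow the schema described above.
const books = [
  { id: 1, folder_name: "WaysOfTheWorld-Strayer", display_name: "Ways of the World", author: "Strayer" },
];
const chapters = [
  { id: 2, chapter_number: 2, display_name: "Chapter 2", filename: "chapter2.pdf", book_id: 1 },
  { id: 1, chapter_number: 1, display_name: "Chapter 1", filename: "chapter1.pdf", book_id: 1 },
];

// Hypothetical helper: follows the book_id foreign key and
// returns a book's chapters in reading order.
function chaptersForBook(bookId) {
  return chapters
    .filter((chapter) => chapter.book_id === bookId)
    .sort((a, b) => a.chapter_number - b.chapter_number);
}
```

The folder_name and filename columns together give the two path segments that the PDF route expects.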

PDF Security System

  • Path Validation: All file paths validated against allowed directories
  • Input Sanitization: Book names and filenames sanitized for security
  • Access Controls: Only PDF files in designated folders accessible
  • Error Handling: Comprehensive error responses for security violations
  • Logging: Security events logged for monitoring

Sample Data

The system includes sample data with the "Ways of the World" textbook by Strayer:

  • Book: WaysOfTheWorld-Strayer
  • Chapters: 23 PDF files (chapter1.pdf through chapter23.pdf)
  • Database: Pre-populated with book and chapter metadata
  • File System: PDF files stored in /frontend/public/books/WaysOfTheWorld-Strayer/

Development Notes

Technical Implementation

  • Modular Architecture: Separate modules for routing and validation
  • Security-First Design: Comprehensive validation before serving any PDF
  • Database Integration: SQLite for metadata with file system fallback
  • Error Handling: Detailed error responses with appropriate HTTP status codes
  • File Serving: Secure PDF delivery via Express sendFile

PDF Validation Process

  1. Input Validation: Sanitize book names and filenames
  2. Path Security: Verify paths are within allowed directories
  3. File Extension: Ensure only PDF files are accessed
  4. Existence Check: Validate file exists before serving
  5. Size Validation: Enforce file size limits
  6. Serve File: Deliver PDF via secure sendFile method
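
The six steps above could be chained roughly as follows. This is a sketch under stated assumptions, not the contents of PDFValidationManager.js: the function name validatePdfRequest, the character rule in step 1, and the choice of 400 for a wrong extension are all hypothetical. The status codes and the 100MB default come from this README.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const MAX_BYTES = 100 * 1024 * 1024; // 100MB default mentioned in this README

// Hypothetical function: returns { status } on failure,
// or { status: 200, filePath } when the file may be served.
function validatePdfRequest(root, bookName, fileName, maxBytes = MAX_BYTES) {
  try {
    // 1. Input validation: non-empty strings without control characters or backslashes (assumed rule).
    const acceptable = /^[^\x00-\x1f\\]+$/;
    if (![bookName, fileName].every((v) => typeof v === "string" && acceptable.test(v))) {
      return { status: 400 };
    }
    // 2. Path security: the resolved path must stay below the root directory.
    const base = path.resolve(root);
    const filePath = path.resolve(base, bookName, fileName);
    if (!filePath.startsWith(base + path.sep)) return { status: 403 };
    // 3. File extension: only .pdf is accepted.
    if (path.extname(filePath).toLowerCase() !== ".pdf") return { status: 400 };
    // 4. Existence check: a missing entry or a non-file yields 404.
    const stats = fs.statSync(filePath, { throwIfNoEntry: false });
    if (!stats || !stats.isFile()) return { status: 404 };
    // 5. Size validation.
    if (stats.size > maxBytes) return { status: 413 };
    // 6. The caller can now hand filePath to its file-serving method.
    return { status: 200, filePath };
  } catch (err) {
    // Unexpected file system errors are reported as a server-side failure.
    return { status: 500 };
  }
}
```

Running the cheap string checks before any file system call means a malformed or hostile request is rejected without touching the disk.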

Security Considerations

  • Path Traversal Protection: Prevents ../ directory traversal attacks
  • Input Sanitization: Blocks dangerous characters in file paths
  • Directory Restrictions: Only allows access to designated PDF folders
  • File Type Validation: Restricts access to PDF files only
  • Size Limits: Prevents serving oversized files
  • Comprehensive Logging: Security events logged for monitoring

Course Information

Course: COS498 - Server Side Programming Languages
Assignment: Homework 3 - PDF Document Management System
Focus:

  • Custom routing modules and URL handling
  • PDF validation and security systems
  • Database integration with file operations
  • Modular architecture and separation of concerns

Implementation Requirements Met

2.4.1 Custom Routing Module

  • Custom RoutingManager.js handles all URL routing
  • Database integration for book and chapter metadata
  • Handlebars template rendering with dynamic data
  • URL construction for PDF links and navigation

2.4.3 PDF Validation Module

  • PDFValidationManager.js validates all PDF requests
  • File existence verification before serving
  • Path security and directory restrictions
  • Appropriate error responses (400, 403, 404, 413, 500)
  • Only allows access to PDFs in designated folders

Troubleshooting

Common Issues

  1. PDF not loading: Check that the file exists in the /frontend/public/books/ directory and that the git submodules have been downloaded
  2. 404 errors: Verify that the book name and chapter filename are correct
  3. 403 access denied: Path validation failed; check the request for directory traversal sequences
  4. Database errors: Ensure the SQLite database exists and is readable
  5. Container issues: Run docker compose down, then docker compose up --build

Validation Errors

  • 400 Bad Request: Invalid book name or filename
  • 403 Forbidden: Security violation or path outside allowed directory
  • 404 Not Found: PDF file or book directory doesn't exist
  • 413 Payload Too Large: PDF file exceeds size limit
  • 500 Internal Server Error: Server-side validation error
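
One way to keep these responses consistent is a single lookup from a validation failure to its status. The sketch below is illustrative: the failure codes (INVALID_NAME and so on) and the function toErrorResponse are hypothetical, and only the status numbers and phrases come from the list above.

```javascript
// Hypothetical failure codes mapped to the statuses listed above.
const VALIDATION_STATUS = new Map([
  ["INVALID_NAME", { status: 400, message: "Bad Request" }],
  ["PATH_VIOLATION", { status: 403, message: "Forbidden" }],
  ["NOT_FOUND", { status: 404, message: "Not Found" }],
  ["TOO_LARGE", { status: 413, message: "Payload Too Large" }],
]);

// Anything unrecognized falls back to a server-side error.
function toErrorResponse(failureCode) {
  return VALIDATION_STATUS.get(failureCode) ?? { status: 500, message: "Internal Server Error" };
}
```

A Map is used instead of a plain object so that a key such as "toString" cannot accidentally match an inherited property.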

Development Commands

# View container logs
docker compose logs backend
docker compose logs frontend

# Restart services
docker compose restart

# Rebuild and restart
docker compose down && docker compose up --build

# Check container status
docker compose ps

# Access backend container
docker compose exec backend bash

# Check PDF files
docker compose exec backend ls -la /app/../frontend/public/books/

Author

Nicholas Pease
COS498: Server Side Programming Languages
Homework 3: PDF Document Management System
