COS498 HW3: PDF Document Management System
Homework 3 for COS498: Server Side Programming Languages. It demonstrates a PDF document management system built around custom routing and validation modules.
Project Overview
This project implements a PDF document management system with the following features:
- Frontend: Nginx serving static files and Handlebars templates
- Backend: Node.js/Express server with custom routing and PDF management
- PDF Management: PDF validation and secure serving system
- Security: Path validation and access control for PDF serving
- Database Integration: SQLite database for book and chapter metadata
- Containerization: Docker containers orchestrated with Docker Compose
Features
PDF Document Management
- PDF Validation Module: Comprehensive validation before serving any PDF
- Custom Routing: Dedicated routing module for book and chapter navigation
- Security Controls: Path validation and access restrictions
- Database Integration: SQLite database with books and chapters metadata
PDF Validation System
- File Existence Checks: Validates PDF files exist before serving
- Path Security: Prevents access outside designated directories
- Input Validation: Sanitizes book names and filenames
- Extension Validation: Only allows .pdf files
- File Size Limits: Enforces maximum file size restrictions
- Error Responses: Appropriate HTTP status codes (400, 403, 404, 413, 500)
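The input and extension checks listed above could look roughly like the following. This is a minimal sketch, not the code in PDFValidationManager.js: the function name `validateNames`, the `SAFE_NAME` character whitelist, and the choice of 400 for a wrong extension are all assumptions made for illustration.

```javascript
// Hypothetical helper; the real PDFValidationManager.js may differ.
// Assumed whitelist: letters, digits, dot, underscore, and hyphen.
const SAFE_NAME = /^[A-Za-z0-9._-]+$/;

// Returns an HTTP status code for a requested book name and chapter file name.
function validateNames(bookName, chapterFile) {
  for (const value of [bookName, chapterFile]) {
    // Reject non-strings, empty values, unexpected characters, and any ".." sequence.
    if (typeof value !== 'string' || !SAFE_NAME.test(value) || value.includes('..')) {
      return 400;
    }
  }
  // Only .pdf files may be requested (compared case-insensitively).
  if (!chapterFile.toLowerCase().endsWith('.pdf')) {
    return 400;
  }
  return 200;
}
```

A whitelist is used here rather than a blacklist of dangerous characters, since it fails closed on anything unexpected.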
User Interface
- Book Browser: Navigate through available books and chapters
- PDF Viewer: Embedded PDF viewing with iframe integration
- Responsive Design: Mobile-friendly layout with consistent styling
- Navigation: Breadcrumb navigation and book/chapter selection
Prerequisites
Before running this project, ensure you have Docker (with Docker Compose) and Git installed; both are used by the setup steps in this README.
Project Structure
COS498-HW3/
├── docker-compose.yml                  # Docker Compose configuration
├── README.md                           # This file
├── .git/                               # Git repository metadata
├── .gitmodules                         # Git submodules configuration
├── backend/
│   ├── Dockerfile                      # Backend container configuration
│   ├── package.json                    # Node.js dependencies and scripts
│   ├── package-lock.json               # Locked dependency versions
│   ├── server.js                       # Node.js Express server with PDF management
│   ├── database/
│   │   ├── backend.db                  # SQLite database with book metadata
│   │   ├── backend.schema              # Database schema definition
│   │   └── backend_initial.db          # Initial database backup
│   └── modules/
│       ├── RoutingManager.js           # Custom routing module
│       └── PDFValidationManager.js     # PDF validation and security
└── frontend/
    ├── Dockerfile                      # Frontend container configuration
    ├── default.conf                    # Nginx configuration
    ├── views/                          # Handlebars templates
    │   ├── index.hbs                   # Main application template
    │   └── layout.hbs                  # Base layout template
    ├── partials/                       # Reusable template components
    │   ├── books.hbs                   # Book listing partial
    │   └── chapters.hbs                # Chapter listing partial
    └── public/                         # Static assets and PDF files
        ├── books/                      # PDF document storage
        │   └── WaysOfTheWorld-Strayer/ # Sample textbook (git submodule)
        │       ├── README.md           # Book information
        │       ├── .git                # Submodule repository metadata
        │       └── chapter1-23.pdf     # 23 chapter PDF files
        └── styles/
            └── main.css                # Application stylesheet
Setup Instructions
Prerequisites
Before running the application, you'll need to download the PDF books which are stored as git submodules.
- Clone the repository and navigate to the project directory:
git clone <repository-url>
cd /home/npease/COS498-HW3
- Initialize and download git submodules (required for PDF books):
git submodule init
git submodule update
Or in one command:
git submodule update --init --recursive
Note: The PDF books (e.g., WaysOfTheWorld-Strayer) are stored as git submodules and must be downloaded separately. Without this step, the PDF files will not be available.
How to Start the PDF Management System
Using Docker Compose (Recommended)
- Start all services:
docker compose up --build
- Access the application:
  - Main Application: http://localhost
The application will automatically display available books and chapters for browsing and PDF viewing.
- Stop the services:
docker compose down
Alternative: Manual Docker Build
For development purposes, you can also build and run containers individually:
# Build and run backend
docker build -t hw3-backend ./backend
docker run -d -p 3000:3000 --name backend hw3-backend
# Build and run frontend
docker build -f frontend/Dockerfile -t hw3-frontend .
docker run -d -p 80:80 --name frontend hw3-frontend
API Endpoints
The backend provides the following endpoints:
Public Endpoints
- GET / - Homepage with available books
- GET /book/:bookName - Book details with chapter listing
- GET /book/:bookName/chapter/:chapterFile - Chapter viewer with PDF display
- GET /pdf/:bookName/:chapterFile - Direct PDF serving (with validation)
Module Architecture
1. PDF Validation Module (PDFValidationManager.js)
Purpose: Validates all PDF access requests before serving files
Key Features:
- File existence validation
- Path security (prevents directory traversal)
- Input sanitization and validation
- File extension verification (PDF only)
- File size limits enforcement
- Comprehensive error handling with appropriate HTTP status codes
Security Controls:
- Blocks access outside /frontend/public/books/ directory
- Prevents path traversal attacks (../)
- Validates input parameters for dangerous characters
- Enforces file type restrictions
- Implements file size limits (100MB default)
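One common way to enforce the directory restriction is to resolve the requested path and confirm it still sits under the books root. The sketch below assumes that approach; `resolveInsideBooks` and `BOOKS_ROOT` are hypothetical names, and the module's actual check may be implemented differently.

```javascript
const path = require('node:path');

// Assumed books root; this README names /frontend/public/books/ as the allowed directory.
const BOOKS_ROOT = path.resolve('/frontend/public/books');

// Hypothetical helper: returns the absolute path when it stays strictly inside
// the root, or null when the request would escape it (a 403 case).
function resolveInsideBooks(bookName, chapterFile, root = BOOKS_ROOT) {
  const candidate = path.resolve(root, bookName, chapterFile);
  const relative = path.relative(root, candidate);
  const escapes =
    relative === '' ||                      // the root itself is not a servable file
    relative === '..' ||
    relative.startsWith('..' + path.sep) || // climbs above the root
    path.isAbsolute(relative);              // different drive on Windows
  return escapes ? null : candidate;
}
```

Comparing the resolved path, rather than searching the raw input for `../`, also catches traversal that only appears after the segments are joined.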
2. Custom Routing Module (RoutingManager.js)
Purpose: Handles all page routing and navigation logic
Key Features:
- Database integration for book/chapter metadata
- Handlebars template rendering
- URL construction for PDF links
- Error handling and 404 responses
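URL construction for chapter pages and PDF links can be sketched as follows, using the route shapes listed under API Endpoints. The helper names `chapterPageUrl` and `pdfUrl` are hypothetical and are not taken from RoutingManager.js.

```javascript
// Hypothetical helpers; the route shapes come from the API Endpoints section.
// encodeURIComponent keeps spaces and reserved characters from breaking the path.
function chapterPageUrl(bookName, chapterFile) {
  return `/book/${encodeURIComponent(bookName)}/chapter/${encodeURIComponent(chapterFile)}`;
}

function pdfUrl(bookName, chapterFile) {
  return `/pdf/${encodeURIComponent(bookName)}/${encodeURIComponent(chapterFile)}`;
}

// Example: a link target for the embedded viewer's iframe.
console.log(pdfUrl('WaysOfTheWorld-Strayer', 'chapter1.pdf'));
```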
Architecture Details
Frontend (Nginx + Handlebars)
- Serves static files and PDF documents
- Renders Handlebars templates with dynamic content
- Proxies API requests to the backend service
- Configured for PDF MIME type handling
Backend (Node.js/Express)
- REST API server with PDF management capabilities
- SQLite database integration for metadata storage
- Modular architecture with separate validation and routing systems
- Comprehensive PDF security and validation
- File serving via Express sendFile with security controls
Database Schema
- Books Table: id, folder_name, display_name, author
- Chapters Table: id, chapter_number, display_name, filename, book_id
- Foreign key relationship between books and chapters
- Supports multiple books with multiple chapters each
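To illustrate the relationship between the two tables without a database driver, the sketch below models a few rows in memory and looks up a book's chapters through `book_id`. The column names come from the schema; the row values and the `chaptersForFolder` helper are made up for illustration, and the real application reads this data from SQLite.

```javascript
// Illustrative rows only; column names follow the schema, values are assumptions.
const books = [
  { id: 1, folder_name: 'WaysOfTheWorld-Strayer', display_name: 'Ways of the World', author: 'Strayer' },
];

const chapters = [
  { id: 1, chapter_number: 1, display_name: 'Chapter 1', filename: 'chapter1.pdf', book_id: 1 },
  { id: 2, chapter_number: 2, display_name: 'Chapter 2', filename: 'chapter2.pdf', book_id: 1 },
];

// In-memory equivalent of joining chapters to books on chapters.book_id = books.id,
// ordered by chapter_number.
function chaptersForFolder(folderName) {
  const book = books.find((b) => b.folder_name === folderName);
  if (!book) return [];
  return chapters
    .filter((c) => c.book_id === book.id)
    .sort((a, b) => a.chapter_number - b.chapter_number);
}
```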
PDF Security System
- Path Validation: All file paths validated against allowed directories
- Input Sanitization: Book names and filenames sanitized for security
- Access Controls: Only PDF files in designated folders accessible
- Error Handling: Comprehensive error responses for security violations
- Logging: Security events logged for monitoring
Sample Data
The system includes sample data with the "Ways of the World" textbook by Strayer:
- Book: WaysOfTheWorld-Strayer
- Chapters: 23 PDF files (chapter1.pdf through chapter23.pdf)
- Database: Pre-populated with book and chapter metadata
- File System: PDF files stored in /frontend/public/books/WaysOfTheWorld-Strayer/
Development Notes
Technical Implementation
- Modular Architecture: Separate modules for routing, validation, and discovery
- Security-First Design: Comprehensive validation before serving any PDF
- Database Integration: SQLite for metadata with file system fallback
- Error Handling: Detailed error responses with appropriate HTTP status codes
- File Serving: Secure PDF delivery via Express sendFile
PDF Validation Process
- Input Validation: Sanitize book names and filenames
- Path Security: Verify paths are within allowed directories
- File Extension: Ensure only PDF files are accessed
- Existence Check: Validate file exists before serving
- Size Validation: Enforce file size limits
- Serve File: Deliver PDF via secure sendFile method
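The existence and size steps in this process map naturally onto a single `fs.statSync` call. The following is a sketch under assumptions: `checkPdfFile` and `MAX_PDF_BYTES` are hypothetical names, the 100MB default is the figure quoted in this README, and the real module may use asynchronous calls instead.

```javascript
const fs = require('node:fs');

// 100MB default, as stated in this README; the constant name is an assumption.
const MAX_PDF_BYTES = 100 * 1024 * 1024;

// Hypothetical helper: maps the existence and size checks to HTTP status codes.
function checkPdfFile(absolutePath, maxBytes = MAX_PDF_BYTES) {
  let stats;
  try {
    stats = fs.statSync(absolutePath);
  } catch (err) {
    // A missing file or missing parent directory is a 404; anything else is a 500.
    return err.code === 'ENOENT' || err.code === 'ENOTDIR' ? 404 : 500;
  }
  if (!stats.isFile()) return 404;          // directories are never served
  if (stats.size > maxBytes) return 413;    // file exceeds the size limit
  return 200;                               // safe to hand to the file-serving step
}
```

Running this check last means the file system is only touched after the name and path checks have already passed.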
Security Considerations
- Path Traversal Protection: Prevents ../ directory traversal attacks
- Input Sanitization: Blocks dangerous characters in file paths
- Directory Restrictions: Only allows access to designated PDF folders
- File Type Validation: Restricts access to PDF files only
- Size Limits: Prevents serving oversized files
- Comprehensive Logging: Security events logged for monitoring
Course Information
Course: COS498 - Server Side Programming Languages
Assignment: Homework 3 - PDF Document Management System
Focus: Demonstrating:
- Custom routing modules and URL handling
- PDF validation and security systems
- Database integration with file operations
- Modular architecture and separation of concerns
Implementation Requirements Met
2.4.1 Custom Routing Module ✅
- ✅ Custom RoutingManager.js handles all URL routing
- ✅ Database integration for book and chapter metadata
- ✅ Handlebars template rendering with dynamic data
- ✅ URL construction for PDF links and navigation
2.4.3 PDF Validation Module ✅
- ✅ PDFValidationManager.js validates all PDF requests
- ✅ File existence verification before serving
- ✅ Path security and directory restrictions
- ✅ Appropriate error responses (400, 403, 404, 413, 500)
- ✅ Only allows access to PDFs in designated folders
Future Enhancements
Planned Improvements
- Enhanced Metadata: Extract PDF metadata (title, author, page count)
- Search Functionality: Full-text search across PDF contents
- User Bookmarks: Save favorite chapters and reading progress
- Access Logs: Detailed logging of PDF access patterns
- Admin Interface: Management tools for adding/removing books
Production Readiness
- HTTPS Support: SSL/TLS certificates for secure PDF delivery
- CDN Integration: Content delivery network for PDF caching
- Database Scaling: PostgreSQL or MongoDB for larger datasets
- Authentication: User accounts and access control systems
- Performance: PDF streaming and progressive loading
Troubleshooting
Common Issues
- PDF not loading: Check that the file exists in the /frontend/public/books/ directory
- 404 errors: Verify book name and chapter filename are correct
- 403 access denied: Path validation failed - check for directory traversal attempts
- Database errors: Ensure SQLite database exists and is readable
- Container issues: Try docker compose down and then docker compose up --build
Validation Errors
- 400 Bad Request: Invalid book name or filename
- 403 Forbidden: Security violation or path outside allowed directory
- 404 Not Found: PDF file or book directory doesn't exist
- 413 Payload Too Large: PDF file exceeds size limit
- 500 Internal Server Error: Server-side validation error
Development Commands
# View container logs
docker compose logs backend
docker compose logs frontend
# Restart services
docker compose restart
# Rebuild and restart
docker compose down && docker compose up --build
# Check container status
docker compose ps
# Access backend container
docker compose exec backend bash
# Check PDF files
docker compose exec backend ls -la /app/../frontend/public/books/
Author
Nicholas Pease
COS498: Server Side Programming Languages
Homework 3: PDF Document Management System