Final README

2025-11-26 04:59:25 +00:00
parent 6414f32a5b
commit 8013dec09e
@@ -123,183 +123,6 @@ Before running the application, you'll need to download the PDF books which are
docker compose down
```
### Alternative: Manual Docker Build
For development purposes, you can also build and run containers individually:
```bash
# Build and run backend
docker build -t hw3-backend ./backend
docker run -d -p 3000:3000 --name backend hw3-backend
# Build and run frontend
docker build -f frontend/Dockerfile -t hw3-frontend .
docker run -d -p 80:80 --name frontend hw3-frontend
```
## API Endpoints
The backend provides the following endpoints:
### Public Endpoints
- `GET /` - Homepage with available books
- `GET /book/:bookName` - Book details with chapter listing
- `GET /book/:bookName/chapter/:chapterFile` - Chapter viewer with PDF display
- `GET /pdf/:bookName/:chapterFile` - Direct PDF serving (with validation)
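To illustrate how the `:bookName` and `:chapterFile` placeholders map onto a request path, here is a minimal sketch of an Express-style pattern matcher. `matchRoute` is a hypothetical helper written for this example; it is not taken from `RoutingManager.js`, and a real Express app would do this matching itself.

```javascript
// Hypothetical helper: matches a request path against an Express-style
// pattern such as '/pdf/:bookName/:chapterFile'. Returns the captured
// parameters, or null when the path does not fit the pattern.
function matchRoute(pattern, requestPath) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = requestPath.split('/').filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.startsWith(':')) {
      try {
        // Segments beginning with ':' capture the matching path segment.
        params[expected.slice(1)] = decodeURIComponent(actual);
      } catch (err) {
        return null; // malformed percent-encoding
      }
    } else if (expected !== actual) {
      return null; // literal segment mismatch
    }
  }
  return params;
}
```

For example, matching `/pdf/WaysOfTheWorld-Strayer/chapter1.pdf` against `/pdf/:bookName/:chapterFile` yields an object with `bookName` and `chapterFile` set to the two path segments.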
## Module Architecture
### 1. PDF Validation Module (`PDFValidationManager.js`)
**Purpose**: Validates all PDF access requests before serving files
**Key Features**:
- File existence validation
- Path security (prevents directory traversal)
- Input sanitization and validation
- File extension verification (PDF only)
- File size limits enforcement
- Comprehensive error handling with appropriate HTTP status codes
**Security Controls**:
- Blocks access outside the `/frontend/public/books/` directory
- Prevents path traversal attacks (`../`)
- Validates input parameters for dangerous characters
- Enforces file type restrictions
- Enforces a file size limit (100 MB by default)
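The directory restriction can be sketched with Node's built-in `path` module. This is an illustration under assumptions, not the code in `PDFValidationManager.js`: `BOOKS_ROOT` and `resolvePdfPath` are hypothetical names, and the root path is taken from the directory mentioned above.

```javascript
const path = require('path');

// Assumed location of the books directory; adjust to the real deployment.
const BOOKS_ROOT = path.resolve('/frontend/public/books');

// Hypothetical helper: returns the absolute path of the requested PDF, or
// null when the request would escape BOOKS_ROOT or is not a .pdf file.
function resolvePdfPath(bookName, chapterFile) {
  // Resolve first so that any '../' segments are collapsed.
  const candidate = path.resolve(BOOKS_ROOT, bookName, chapterFile);

  // A relative path that is empty, climbs upward, or is absolute means the
  // candidate is not strictly inside BOOKS_ROOT.
  const relative = path.relative(BOOKS_ROOT, candidate);
  const escapesRoot =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  if (escapesRoot) return null;

  // Only PDF files may be served.
  if (path.extname(candidate).toLowerCase() !== '.pdf') return null;
  return candidate;
}
```

Comparing the resolved path against the root, rather than scanning the input for `../`, also catches absolute paths and encoded variants once they have been decoded.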
### 2. Custom Routing Module (`RoutingManager.js`)
**Purpose**: Handles all page routing and navigation logic
**Key Features**:
- Database integration for book/chapter metadata
- Handlebars template rendering
- URL construction for PDF links
- Error handling and 404 responses
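As a rough sketch of the URL construction step, the helpers below build links that follow the endpoint patterns listed under "API Endpoints". The function names are hypothetical and chosen for this example; the actual module may build its links differently.

```javascript
// Hypothetical helpers that build URLs matching the documented endpoints.
// encodeURIComponent keeps spaces and other special characters in a book or
// file name from breaking the URL.
function bookUrl(bookName) {
  return `/book/${encodeURIComponent(bookName)}`;
}

function chapterPageUrl(bookName, chapterFile) {
  return `${bookUrl(bookName)}/chapter/${encodeURIComponent(chapterFile)}`;
}

function pdfUrl(bookName, chapterFile) {
  return `/pdf/${encodeURIComponent(bookName)}/${encodeURIComponent(chapterFile)}`;
}
```

A template could then receive `pdfUrl(book.folder_name, chapter.filename)` as the source for the embedded PDF viewer.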
## Architecture Details
### Frontend (Nginx + Handlebars)
- Serves static files and PDF documents
- Renders Handlebars templates with dynamic content
- Proxies API requests to the backend service
- Configured for PDF MIME type handling
### Backend (Node.js/Express)
- REST API server with PDF management capabilities
- SQLite database integration for metadata storage
- Modular architecture with separate validation and routing systems
- Comprehensive PDF security and validation
- File serving via Express sendFile with security controls
### Database Schema
- **Books Table**: `id`, `folder_name`, `display_name`, `author`
- **Chapters Table**: `id`, `chapter_number`, `display_name`, `filename`, `book_id`
- Foreign key relationship between books and chapters
- Supports multiple books with multiple chapters each
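To make the relationship concrete, the sketch below models a few rows as plain JavaScript objects and performs the in-memory equivalent of joining chapters to their book on `book_id`. The column names come from the schema above; the row values and the `chaptersForBook` function are illustrative assumptions, and the real application reads these rows from SQLite.

```javascript
// Rows shaped like the two tables described above (values are illustrative).
const books = [
  { id: 1, folder_name: 'WaysOfTheWorld-Strayer', display_name: 'Ways of the World', author: 'Strayer' },
];

const chapters = [
  { id: 2, chapter_number: 2, display_name: 'Chapter 2', filename: 'chapter2.pdf', book_id: 1 },
  { id: 1, chapter_number: 1, display_name: 'Chapter 1', filename: 'chapter1.pdf', book_id: 1 },
];

// Hypothetical lookup: finds a book by folder name and returns it with its
// chapters (chapters.book_id === books.id), ordered by chapter_number.
function chaptersForBook(folderName) {
  const book = books.find((b) => b.folder_name === folderName);
  if (!book) return null;

  const list = chapters
    .filter((c) => c.book_id === book.id)
    .sort((a, b) => a.chapter_number - b.chapter_number);
  return { book, chapters: list };
}
```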
### PDF Security System
- **Path Validation**: All file paths validated against allowed directories
- **Input Sanitization**: Book names and filenames sanitized for security
- **Access Controls**: Only PDF files in designated folders accessible
- **Error Handling**: Comprehensive error responses for security violations
- **Logging**: Security events logged for monitoring
## Sample Data
The system ships with sample data for the "Ways of the World" textbook by Strayer:
- **Book**: WaysOfTheWorld-Strayer
- **Chapters**: 23 PDF files (chapter1.pdf through chapter23.pdf)
- **Database**: Pre-populated with book and chapter metadata
- **File System**: PDF files stored in `/frontend/public/books/WaysOfTheWorld-Strayer/`
## Development Notes
### Technical Implementation
- **Modular Architecture**: Separate modules for routing, validation, and discovery
- **Security-First Design**: Comprehensive validation before serving any PDF
- **Database Integration**: SQLite for metadata with file system fallback
- **Error Handling**: Detailed error responses with appropriate HTTP status codes
- **File Serving**: Secure PDF delivery via Express sendFile
### PDF Validation Process
1. **Input Validation**: Sanitize book names and filenames
2. **Path Security**: Verify paths are within allowed directories
3. **File Extension**: Ensure only PDF files are accessed
4. **Existence Check**: Validate file exists before serving
5. **Size Validation**: Enforce file size limits
6. **Serve File**: Deliver PDF via secure sendFile method
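The six steps above can be sketched as a single function that runs the checks in order and stops at the first failure. This is a minimal sketch under assumptions: `validatePdfRequest`, the `DANGEROUS` pattern, and the choice of status code for each step are illustrative and may not match `PDFValidationManager.js`.

```javascript
const fs = require('fs');
const path = require('path');

const MAX_PDF_BYTES = 100 * 1024 * 1024; // 100 MB, the documented default
const DANGEROUS = /[\0\/\\]/;            // assumed rule: no NUL bytes or path separators

// Hypothetical function: returns { status } on failure, or
// { status: 200, filePath } when the file may be served.
function validatePdfRequest(booksRoot, bookName, chapterFile) {
  // 1. Input validation
  for (const value of [bookName, chapterFile]) {
    if (typeof value !== 'string' || value === '' || DANGEROUS.test(value)) {
      return { status: 400 };
    }
  }
  // 2. Path security: the resolved path must stay inside the books root
  const root = path.resolve(booksRoot);
  const filePath = path.resolve(root, bookName, chapterFile);
  if (!filePath.startsWith(root + path.sep)) return { status: 403 };

  // 3. File extension
  if (path.extname(filePath).toLowerCase() !== '.pdf') return { status: 400 };

  // 4. Existence check
  let stats;
  try {
    stats = fs.statSync(filePath);
  } catch (err) {
    const missing = err.code === 'ENOENT' || err.code === 'ENOTDIR';
    return { status: missing ? 404 : 500 };
  }
  if (!stats.isFile()) return { status: 404 };

  // 5. Size validation
  if (stats.size > MAX_PDF_BYTES) return { status: 413 };

  // 6. The caller can now hand filePath to Express's res.sendFile
  return { status: 200, filePath };
}
```

Running the cheap string checks before touching the file system means malformed requests are rejected without any disk access.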
### Security Considerations
- **Path Traversal Protection**: Prevents `../` directory traversal attacks
- **Input Sanitization**: Blocks dangerous characters in file paths
- **Directory Restrictions**: Only allows access to designated PDF folders
- **File Type Validation**: Restricts access to PDF files only
- **Size Limits**: Prevents serving oversized files
- **Comprehensive Logging**: Security events logged for monitoring
## Course Information
**Course**: COS498 - Server Side Programming Languages
**Assignment**: Homework 3 - PDF Document Management System
**Focus**: This assignment demonstrates:
- Custom routing modules and URL handling
- PDF validation and security systems
- Database integration with file operations
- Modular architecture and separation of concerns
## Implementation Requirements Met
### 2.4.1 Custom Routing Module ✅
- ✅ Custom `RoutingManager.js` handles all URL routing
- ✅ Database integration for book and chapter metadata
- ✅ Handlebars template rendering with dynamic data
- ✅ URL construction for PDF links and navigation
### 2.4.3 PDF Validation Module ✅
- ✅ `PDFValidationManager.js` validates all PDF requests
- ✅ File existence verification before serving
- ✅ Path security and directory restrictions
- ✅ Appropriate error responses (400, 403, 404, 413, 500)
- ✅ Only allows access to PDFs in designated folders
## Troubleshooting
### Common Issues
1. **PDF not loading**: Check that the file exists in the `/frontend/public/books/` directory
2. **404 errors**: Verify book name and chapter filename are correct
3. **403 access denied**: Path validation failed; check the request for `../` sequences or a path outside the books directory
4. **Database errors**: Ensure SQLite database exists and is readable
5. **Container issues**: Try `docker compose down` and `docker compose up --build`
### Validation Errors
- **400 Bad Request**: Invalid book name or filename
- **403 Forbidden**: Security violation or path outside allowed directory
- **404 Not Found**: PDF file or book directory doesn't exist
- **413 Payload Too Large**: PDF file exceeds size limit
- **500 Internal Server Error**: Server-side validation error
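One way to keep these responses consistent is to carry the status code on the error itself and translate it in one place. The sketch below is an assumption about how that could look; `PdfValidationError` and `toErrorResponse` are hypothetical names, and the messages are copied from the list above.

```javascript
// Status codes and descriptions taken from the list above.
const VALIDATION_ERRORS = {
  400: 'Invalid book name or filename',
  403: 'Security violation or path outside allowed directory',
  404: "PDF file or book directory doesn't exist",
  413: 'PDF file exceeds size limit',
  500: 'Server-side validation error',
};

// Hypothetical error type: unknown status codes fall back to 500.
class PdfValidationError extends Error {
  constructor(status) {
    const known = Object.prototype.hasOwnProperty.call(VALIDATION_ERRORS, status);
    super(VALIDATION_ERRORS[known ? status : 500]);
    this.name = 'PdfValidationError';
    this.status = known ? status : 500;
  }
}

// Hypothetical translator: any error that is not a PdfValidationError is
// reported as a generic 500 so internal details are not leaked to clients.
function toErrorResponse(err) {
  const status = err instanceof PdfValidationError ? err.status : 500;
  return { status, body: { error: VALIDATION_ERRORS[status] } };
}
```

In an Express error handler, the result could be sent with `res.status(response.status).json(response.body)`.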
### Development Commands
```bash
# View container logs
docker compose logs backend
docker compose logs frontend
# Restart services
docker compose restart
# Rebuild and restart
docker compose down && docker compose up --build
# Check container status
docker compose ps
# Access backend container (use `sh` instead if the image does not include bash)
docker compose exec backend bash
# Check PDF files
docker compose exec backend ls -la /app/../frontend/public/books/
```
## Author
Nicholas Pease