• 8 Posts
  • 16 Comments
Joined 4 months ago
Cake day: November 22nd, 2025





  • I love this idea, but I must be honest: fine-tuning a Mistral model to embed the "voice" of Marx, Engels, Lenin, Stalin, and Mao directly into the weights would be an immense machine-learning and engineering task. Now, an abliterated DeepSeek 7B ("abliterated" meaning uncensored) = true power…

  • percyraskovato · Crush Agentic AI · Scope your prompts
    2 points · 4 months ago

    here is my CLAUDE.md for a project:


    CLAUDE.md

    This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

    ⚠️ CRITICAL SCALE UPDATE

    Corpus Scale: 200GB raw archive → 50GB optimized (75% reduction through strategic filtering)

    Corpus Analysis: ✅ Complete - 46GB English content analyzed (55,753 documents across 6 sections)

    For architecture overview, see: Architecture Overview (includes corpus foundation)

    The architecture includes:

    • Corpus Foundation: Systematic 46GB analysis informing all data decisions
    • Metadata Schema: 5-layer model achieving 85%+ author coverage
    • Chunking Strategies: 4 adaptive strategies based on document structure
    • Knowledge Graph: ~2,500 entities forming hybrid retrieval foundation
    • Infrastructure: Simplified GCP architecture with Weaviate + Runpod embeddings
    • Parallel Development: 6-instance coordination strategy

    Project Overview

    The Marxists Internet Archive (MIA) RAG Pipeline converts 126,000+ pages of Marxist theory (HTML + PDFs) into a queryable RAG system. This is a local, private, fully-owned knowledge base designed for material analysis research, class composition studies, and theoretical framework development.

    Note: The reference implementation below works for small-scale testing. For production processing, see Architecture Overview for complete details.

    🏗️ Parallel Development Architecture

    This project uses a 6-instance parallel development model where different Claude Code instances work on separate modules simultaneously. Each instance has specific boundaries to prevent conflicts.

    Instance Boundaries

    Instance 1 (Storage & Pipeline):

    • src/mia_rag/storage/ - GCS storage management
    • src/mia_rag/pipeline/ - Document processing pipeline
    • tests/unit/instance1_* - Instance 1 tests

    Instance 2 (Embeddings):

    • src/mia_rag/embeddings/ - Runpod embedding generation
    • tests/unit/instance2_* - Instance 2 tests

    Instance 3 (Weaviate):

    • src/mia_rag/vectordb/ - Weaviate vector database
    • tests/unit/instance3_* - Instance 3 tests

    Instance 4 (API):

    • src/mia_rag/api/ - FastAPI query interface
    • tests/unit/instance4_* - Instance 4 tests

    Instance 5 (MCP):

    • src/mia_rag/mcp/ - Model Context Protocol integration
    • tests/unit/instance5_* - Instance 5 tests

    Instance 6 (Monitoring & Testing):

    • src/mia_rag/monitoring/ - Prometheus/Grafana monitoring
    • tests/(integration|scale|contract)/ - Cross-instance tests

    Shared Resources (require coordination):

    • src/mia_rag/interfaces/ - Interface contracts (RFC process required)
    • src/mia_rag/common/ - Shared utilities

    Working in Parallel

    Before starting work:

    1. Check planning/ directory for active projects and issues
    2. Verify your instance assignment in .instance file
    3. Run boundary check: poetry run python scripts/check_boundaries.py --instance instance{N} --auto

    Branch naming convention:

    • Instance work: instance{N}/{module}-{feature} (e.g., instance1/storage-gcs-retry)
    • Interface changes: rfc/{number}-{description} (e.g., rfc/001-metadata-schema)
    • Releases: release/v{version} (e.g., release/v0.2.0)
    • Hotfixes: hotfix/{description} (e.g., hotfix/memory-leak)
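
    The convention above can be sanity-checked mechanically; a minimal sketch (the patterns are an illustrative guess at the convention, not the project's actual hooks):

```python
import re

# One pattern per branch family from the naming convention above.
BRANCH_PATTERNS = {
    "instance": re.compile(r"^instance[1-6]/[a-z0-9-]+-[a-z0-9-]+$"),
    "rfc": re.compile(r"^rfc/\d{3}-[a-z0-9-]+$"),
    "release": re.compile(r"^release/v\d+\.\d+\.\d+$"),
    "hotfix": re.compile(r"^hotfix/[a-z0-9-]+$"),
}

def branch_is_valid(name: str) -> bool:
    return any(p.match(name) for p in BRANCH_PATTERNS.values())
```

    A pre-push hook could call `branch_is_valid` and refuse to push anything that matches no family.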

    CI/CD workflows:

    • instance-tests.yml - Runs tests for changed instances only
    • conflict-detection.yml - Detects boundary violations in PRs
    • daily-integration.yml - Merges instance branches into shared integration branch

    Development Commands

    Setup and Installation

    # Install Poetry dependencies (core + dev)
    poetry install
    
    # Install specific instance dependencies
    poetry install --extras instance1  # Storage & Pipeline
    poetry install --extras instance2  # Embeddings
    poetry install --extras instance3  # Weaviate
    poetry install --extras instance4  # API
    poetry install --extras instance5  # MCP
    poetry install --extras instance6  # Monitoring
    
    # Install all dependencies (integration testing)
    poetry install --extras all
    

    Testing

    # Run all tests for your instance
    poetry run pytest -m instance1  # Replace with your instance number
    
    # Run specific test types
    poetry run pytest -m unit        # Unit tests only
    poetry run pytest -m integration # Integration tests
    poetry run pytest -m contract    # Contract tests (interface validation)
    
    # Run tests for a specific file
    poetry run pytest tests/unit/instance1_storage/test_gcs_storage.py
    
    # Run with coverage
    poetry run pytest --cov=src/mia_rag --cov-report=html
    
    # Run specific test by name
    poetry run pytest -k "test_embedding_generation"
    

    Linting and Code Quality

    # Run Ruff linting
    poetry run ruff check .
    
    # Auto-fix issues
    poetry run ruff check --fix .
    
    # Format code
    poetry run ruff format .
    
    # Type checking
    poetry run mypy src/
    
    # Check cyclomatic complexity (for refactoring)
    poetry run radon cc src/ -a -nb
    

    Git Workflow

    # Install git hooks
    bash scripts/install-hooks.sh
    
    # Check boundaries before commit
    poetry run python scripts/check_boundaries.py --instance instance1 --auto
    
    # Check interface compliance
    poetry run python scripts/check_interfaces.py --check-all
    
    # Commit with conventional commit format
    git commit -m "feat(storage): add GCS retry logic"
    # Types: feat, fix, docs, style, refactor, test, chore
    

    Running the Pipeline (Reference Implementation)

    # Step 1: Download MIA metadata
    python mia_processor.py --download-json
    
    # Step 2: Process archive (HTML/PDF → Markdown)
    python mia_processor.py --process-archive ~/Downloads/dump_www-marxists-org/ --output ~/marxists-processed/
    
    # Step 3: Ingest to vector database
    python rag_ingest.py --db chroma --markdown-dir ~/marxists-processed/markdown/ --persist-dir ./mia_vectordb/
    
    # Step 4: Query the system
    python query_example.py --db chroma --query "What is surplus value?" --persist-dir ./mia_vectordb/
    

    Code Architecture

    Reference Implementation (Legacy)

    The original monolithic implementation consists of:

    • mia_processor.py - HTML/PDF to Markdown conversion
    • rag_ingest.py - Chunking and vector database ingestion
    • query_example.py - Query interface

    These scripts still work but are being refactored into the modular src/mia_rag/ structure.

    Refactored Architecture

    Domain Models (scripts/domain/):

    • boundaries.py - Instance boundary specifications
    • instance.py - Instance configuration and metadata
    • interfaces.py - Interface contract definitions
    • recovery.py - Recovery state and operations
    • metrics.py - Metrics and performance tracking

    Design Patterns (scripts/patterns/):

    • specifications.py - Specification pattern for boundary checking
    • validators.py - Chain of Responsibility pattern for validation
    • visitors.py - Visitor pattern for interface analysis
    • commands.py - Command pattern for operations
    • recovery.py - Template Method pattern for recovery strategies
    • repositories.py - Repository pattern for data access
    • builders.py - Builder pattern for complex objects

    Key Refactored Scripts:

    • scripts/check_boundaries.py - Uses Specification pattern (✅ Refactored)
    • scripts/check_conflicts.py - Uses Chain of Responsibility (✅ Refactored)
    • scripts/check_interfaces.py - Uses Visitor pattern (✅ Refactored)
    • scripts/instance_map.py - Uses Command pattern (✅ Refactored)
    • scripts/instance_recovery.py - Uses Template Method pattern (✅ Refactored)

    Complexity Targets (enforced by Ruff):

    • Max branches: 12 per function
    • Max statements: 50 per function
    • Max arguments: 7 per function
    • Max returns: 6 per function
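
    In pyproject.toml these limits would map onto Ruff's pylint settings roughly as follows (a sketch; the corresponding PLR09xx rules must also be selected for the limits to be enforced):

```toml
[tool.ruff.lint.pylint]
max-branches = 12    # PLR0912
max-statements = 50  # PLR0915
max-args = 7         # PLR0913
max-returns = 6      # PLR0911
```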

    Package Structure

    src/mia_rag/
    ├── interfaces/          # Interface contracts (shared)
    │   ├── __init__.py
    │   └── contracts.py
    ├── common/              # Shared utilities (coordination required)
    ├── storage/             # Instance 1: GCS storage
    ├── pipeline/            # Instance 1: Document processing
    ├── embeddings/          # Instance 2: Runpod embeddings
    ├── vectordb/            # Instance 3: Weaviate
    ├── api/                 # Instance 4: FastAPI
    ├── mcp/                 # Instance 5: MCP server
    └── monitoring/          # Instance 6: Prometheus/Grafana
    

    Corpus Analysis Foundation

    CRITICAL: All implementation decisions must be informed by the completed corpus analysis (46GB English content, 55,753 documents).

    Essential Reading Before Coding

    Metadata & Schemas:

    Chunking & Document Structure:

    • …/specs/07-chunking-strategies-spec.md - 4 adaptive chunking strategies
      • 70% of documents have good heading hierarchies → semantic-break chunking
      • 40% are heading-less → paragraph-cluster chunking fallback
      • Glossary → entry-based chunking (special case)
      • Target: 650-750 tokens/chunk average, >70% with heading context
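
    The semantic-break strategy can be sketched as follows (a simplified illustration: token counts are approximated with a whitespace split, and the real tokenizer and thresholds live in the spec):

```python
import re

# Split a Markdown document just before every ATX heading so each section
# keeps its heading, then greedily pack paragraphs into chunks near the
# 650-750-token target.
TARGET_TOKENS = 700

def semantic_break_chunks(markdown: str) -> list[str]:
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    for section in sections:
        current: list[str] = []
        count = 0
        for para in section.split("\n\n"):
            if not para.strip():
                continue
            words = len(para.split())
            if current and count + words > TARGET_TOKENS:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(para)
            count += words
        if current:
            chunks.append("\n\n".join(current))
    return chunks
```

    Because the split happens on heading boundaries first, every chunk that begins a section carries its heading context, which is what the >70% heading-context target measures.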

    Knowledge Graph & Entities:

    • …/specs/08-knowledge-graph-spec.md - Hybrid retrieval architecture
      • ~2,500 Glossary entities form canonical node set
      • 10 node types, 14 edge types for vector + graph retrieval
      • 5k-10k cross-references extracted from corpus
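
    A minimal sketch of how typed nodes and edges might be represented (the enum members shown are illustrative only; the spec defines the full 10 node and 14 edge types):

```python
from dataclasses import dataclass
from enum import Enum

class NodeType(Enum):
    PERSON = "person"
    CONCEPT = "concept"
    WORK = "work"  # subset for illustration

class EdgeType(Enum):
    AUTHORED = "authored"
    REFERENCES = "references"  # subset for illustration

@dataclass(frozen=True)
class Node:
    id: str
    type: NodeType
    label: str

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: EdgeType

marx = Node("person/marx", NodeType.PERSON, "Karl Marx")
capital = Node("work/capital-v1", NodeType.WORK, "Capital, Volume I")
edge = Edge(marx.id, capital.id, EdgeType.AUTHORED)
```

    Typed edges like this are what lets graph traversal complement vector search in the hybrid retrieval design.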

    Section-Specific Analyses

    When implementing processing for specific corpus sections, consult: