optimized results and new benchmark

adjustment to triton
adjusting the script with new paths
2026-05-29 00:42:40 +02:00 · 2026-05-19 10:21:24 +02:00 · 2026-05-19 10:13:29 +02:00 · 2026-05-19 10:03:52 +02:00 · 2026-05-19 09:23:31 +02:00 · 2026-05-16 16:50:33 +02:00
15 changed files with 18141 additions and 5261 deletions
@@ -1,20 +1,88 @@
-# 1. Broad Ignores
+# =========================
-/Data/*
+# Python
-/attach/*
+# =========================
 /results/*
 /enarcelona/*
 .env
 __pycache__/
-*.pyc
+*.py[cod]
-*.csv
+*$py.class
-=======
+.ipynb_checkpoints/
 /reference/
 *.svg
 >>>>>>> Stashed changes
 # 2. Ignore virtual environments COMPLETELY
 # This must come BEFORE the unignore rule
 env*/
-# 3. The "Unignore" rule (Whitelisting)
+# =========================
-# We only unignore .py files that aren't already blocked by the rules above
+# Virtual environments
-!**/*.py
+# =========================
 env/
 env*/
 venv/
 .venv/
 enarcelona/
 # =========================
 # Secrets
 # =========================
 .env
 *.env
 # =========================
 # Patient data / sensitive data
 # =========================
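 # Use Data/* rather than Data/: git cannot re-include Data/example/ if the parent directory itself is excluded
 Data/*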
 data/raw/
 data/processed/
 data/ground_truth/
 reference/
 # =========================
 # Generated results and logs
 # =========================
 results/
 results_edss_benchmark/
 *.log
 # =========================
 # Large/generated file types
 # =========================
 *.csv
 *.tsv
 *.json
 *.jsonl
 *.xlsx
 *.xls
 *.png
 *.PNG
 *.jpg
 *.jpeg
 *.svg
 *.pdf
 # =========================
 # Temporary / backup files
 # =========================
 *.tmp
 *.bak
 *.orig
 .DS_Store
 # =========================
 # Keep important code/config/docs
 # =========================
 !README.md
 !requirements.txt
 !*.py
 !*.md
 !*.yml
 !*.yaml
 !*.toml
 # Keep prompt templates / schemas if safe to publish
 !prompts/
 !prompts/**
 !attach/
 !attach/*.gbnf
 !attach/just_edss_text.txt
 !attach/Komplett.txt
 # Keep example/synthetic data only
 !data/
 !data/example/
 !data/example/**
 !Data/example/
 !Data/example/**
@@ -0,0 +1,31 @@
 # Project Structure
 This project was reorganized into:
 - `data/`
  - `raw/`: original raw data, if retained locally
  - `processed/`: cleaned or derived input data
  - `ground_truth/`: manually annotated reference data
  - `external/`: externally provided data
 - `prompts/`
  - EDSS instructions and prompt/schema assets
 - `scripts/`
  - runnable analysis and plotting scripts
 - `results/`
  - `benchmark_runs/`: full model benchmark runs
  - `final_results/`: final selected model outputs
  - `figures/`: generated figures
  - `tables/`: generated tables
  - `logs/`: terminal logs
 - `manuscript/`
  - final figures and tables for paper/thesis writing
 - `archive/`
  - old scripts, old results, temporary files, and unclear legacy files
 Important:
 The reorganization was performed after creating a full timestamped backup.
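 A minimal sketch of how the layout listed above could be recreated on a fresh checkout. The directory names come from the list; the function name `create_layout` and its `root` argument are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Directory layout as listed in the README section above.
# `create_layout` and `root` are hypothetical names used for illustration.
LAYOUT = {
    "data": ["raw", "processed", "ground_truth", "external"],
    "prompts": [],
    "scripts": [],
    "results": ["benchmark_runs", "final_results", "figures", "tables", "logs"],
    "manuscript": [],
    "archive": [],
}


def create_layout(root):
    """Create any missing directories under `root` and return the created paths."""
    root = Path(root)
    created = []
    for top, subdirs in LAYOUT.items():
        # Top-level directory first, then its subdirectories.
        for path in [root / top] + [root / top / sub for sub in subdirs]:
            if not path.exists():
                path.mkdir(parents=True)
                created.append(path)
    return created
```

 Calling `create_layout(".")` a second time creates nothing, so the helper is safe to run on an already organized working copy.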
@@ -216,6 +216,3 @@ if __name__ == "__main__":
 # %% name
 eXXXXXXXX
 ##
@@ -1,600 +0,0 @@
 # %% API call1
 #import time
 #import json
 #import os
 #from datetime import datetime
 #import pandas as pd
 #from openai import OpenAI
 #from dotenv import load_dotenv
 #
 ## Load environment variables
 #load_dotenv()
 #
 ## === CONFIGURATION ===
 #OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
 #OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL")
 #MODEL_NAME = "GPT-OSS-120B"
 #HEALTH_URL = f"{OPENAI_BASE_URL}/health"  # Placeholder - actual health check would need to be implemented
 #CHAT_URL = f"{OPENAI_BASE_URL}/chat/completions"
 #
 ## File paths
 #INPUT_CSV = "/home/shahin/Lab/Doktorarbeit/Barcelona/Data/MS_Briefe_400_with_unique_id_SHA3_explore_cleaned_unique.csv"
 #EDSS_INSTRUCTIONS_PATH = "/home/shahin/Lab/Doktorarbeit/Barcelona/attach/Komplett.txt"
 ##GRAMMAR_FILE = "/home/shahin/Lab/Doktorarbeit/Barcelona/attach/just_edss_schema.gbnf"
 #
 ## Initialize OpenAI client
 #client = OpenAI(
 #    api_key=OPENAI_API_KEY,
 #    base_url=OPENAI_BASE_URL
 #)
 #
 ## Read EDSS instructions from file
 #with open(EDSS_INSTRUCTIONS_PATH, 'r') as f:
 #    EDSS_INSTRUCTIONS = f.read().strip()
 ## === RUN INFERENCE 2 ===
 #def run_inference(patient_text):
 #    prompt = f'''
 #    Du bist ein medizinischer Assistent, der spezialisiert darauf ist, EDSS-Scores (Expanded Disability Status Scale) aus klinischen Berichten zu extrahieren.
 #### Regeln für die Ausgabe:
 #1. **Reason**: Erstelle eine prägnante Zusammenfassung (max. 400 Zeichen) der Befunde auf **DEUTSCH**, die zur Einstufung führen.
 #2. **klassifizierbar**:
 #   - Setze dies auf **true**, wenn ein EDSS-Wert identifiziert, berechnet oder basierend auf den klinischen Hinweisen plausibel geschätzt werden kann.
 #   - Setze dies auf **false**, NUR wenn die Daten absolut unzureichend oder so widersprüchlich sind, dass keinerlei Einstufung möglich ist.
 #3. **EDSS**:
 #   - Dieses Feld ist **VERPFLICHTEND**, wenn "klassifizierbar" auf true steht.
 #   - Es muss eine Zahl zwischen 0.0 und 10.0 sein.
 #   - Versuche stets, den EDSS-Wert so präzise wie möglich zu bestimmen, auch wenn die Datenlage dünn ist (nutze verfügbare Informationen zu Gehstrecke und Funktionssystemen).
 #   - Dieses Feld **DARF NICHT ERSCHEINEN**, wenn "klassifizierbar" auf false steht.
 #
 #### Einschränkungen:
 #- Erfinde keine Fakten, aber nutze klinische Herleitungen aus dem Bericht, um den EDSS zu bestimmen.
 #- Priorisiere die Vergabe eines EDSS-Wertes gegenüber der Markierung als nicht klassifizierbar.
 #- Halte dich strikt an die JSON-Struktur.
 #
 #EDSS-Bewertungsrichtlinien:
 #{EDSS_INSTRUCTIONS}
 #
 #Patientenbericht:
 #{patient_text}
 #'''
 #    start_time = time.time()
 #
 #    try:
 #        # Make API call using OpenAI client
 #        response = client.chat.completions.create(
 #            messages=[
 #                {
 #                    "role": "system",
 #                    "content": "You extract EDSS scores. You prioritize providing a score even if data is partial, by using clinical inference."
 #                },
 #                {
 #                    "role": "user",
 #                    "content": prompt
 #                }
 #            ],
 #            model=MODEL_NAME,
 #            max_tokens=2048,
 #            temperature=0.0,
 #            response_format={"type": "json_object"}
 #        )
 #
 #        # Extract content from response
 #        content = response.choices[0].message.content
 #
 #        # Parse the JSON response
 #        parsed = json.loads(content)
 #
 #        inference_time = time.time() - start_time
 #
 #        return {
 #            "success": True,
 #            "result": parsed,
 #            "inference_time_sec": inference_time
 #        }
 #
 #    except Exception as e:
 #        print(f"Inference error: {e}")
 #        return {
 #            "success": False,
 #            "error": str(e),
 #            "inference_time_sec": -1
 #        }
 ## === BUILD PATIENT TEXT ===
 #def build_patient_text(row):
 #    return (
 #        str(row["T_Zusammenfassung"]) + "\n" +
 #        str(row["Diagnosen"]) + "\n" +
 #        str(row["T_KlinBef"]) + "\n" +
 #        str(row["T_Befunde"]) + "\n"
 #    )
 #
 #if __name__ == "__main__":
 #    # Read CSV file ONLY inside main block
 #    df = pd.read_csv(INPUT_CSV, sep=';')
 #    results = []
 #
 #    # Process each row
 #    for idx, row in df.iterrows():
 #        print(f"Processing row {idx + 1}/{len(df)}")
 #        try:
 #            patient_text = build_patient_text(row)
 #            result = run_inference(patient_text)
 #
 #            # Add unique_id and MedDatum to result for tracking
 #            result["unique_id"] = row.get("unique_id", f"row_{idx}")
 #            result["MedDatum"] = row.get("MedDatum", None)
 #
 #            results.append(result)
 #            print(json.dumps(result, indent=2))
 #        except Exception as e:
 #            print(f"Error processing row {idx}: {e}")
 #            results.append({
 #                "success": False,
 #                "error": str(e),
 #                "unique_id": row.get("unique_id", f"row_{idx}"),
 #                "MedDatum": row.get("MedDatum", None)
 #            })
 #
 #    # Save results to a JSON file
 #    output_json = INPUT_CSV.replace(".csv", "_results_Nisch.json")
 #    with open(output_json, 'w') as f:
 #        json.dump(results, f, indent=2)
 #    print(f"Results saved to {output_json}")
 ##
 # %% API call1 - Enhanced with certainty scoring
 #import time
 #import json
 #import os
 #from datetime import datetime
 #import pandas as pd
 #from openai import OpenAI
 #from dotenv import load_dotenv
 #
 ## Load environment variables
 #load_dotenv()
 #
 ## === CONFIGURATION ===
 #OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
 #OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL")
 #MODEL_NAME = "GPT-OSS-120B"
 #
 ## File paths
 #INPUT_CSV = "/home/shahin/Lab/Doktorarbeit/Barcelona/Data/Test.csv"
 #EDSS_INSTRUCTIONS_PATH = "/home/shahin/Lab/Doktorarbeit/Barcelona/attach/Komplett.txt"
 #
 ## Initialize OpenAI client
 #client = OpenAI(
 #    api_key=OPENAI_API_KEY,
 #    base_url=OPENAI_BASE_URL
 #)
 #
 ## Read EDSS instructions from file
 #with open(EDSS_INSTRUCTIONS_PATH, 'r') as f:
 #    EDSS_INSTRUCTIONS = f.read().strip()
 #
 ## === PROMPT WITH CERTAINTY REQUEST ===
 #def build_prompt(patient_text):
 #    return f'''Du bist ein medizinischer Assistent, der spezialisiert darauf ist, EDSS-Scores (Expanded Disability Status Scale), alle Unterkategorien und die Bewertungssicherheit aus klinischen Berichten zu extrahieren.
 #
 #### Deine Aufgabe:
 #1. Analysiere den Patientenbericht und extrahiere:
 #   - Den Gesamt-EDSS-Score (0.0–10.0)
 #   - Alle 8 EDSS-Unterkategorien (mit jeweils eigener Maximalpunktzahl)
 #2. Schätze für jede Entscheidung die Sicherheit als Ganzzahl von 0–100 % ein.
 #
 #### Struktur der JSON-Ausgabe (VERPFLICHTEND):
 #Gib NUR gültiges JSON zurück — kein Markdown, kein Text davor/dahinter.
 #
 #{{
 #  "reason": "Kernaussage zur EDSS-Begründung (max. 400 Zeichen, auf Deutsch).",
 #  "klassifizierbar": true/false,
 #  "EDSS": null ODER Zahl zwischen 0.0 und 10.0 (nur wenn klassifizierbar=true)",
 #  "certainty_percent": 0 ODER Zahl zwischen 0 und 100 (Ganzzahl)",
 #  "subcategories": {{
 #    "VISUAL_OPTIC_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "BRAINSTEM_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "PYRAMIDAL_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "CEREBELLAR_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "SENSORY_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "BOWEL_AND_BLADDER_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "CEREBRAL_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
 #    "AMBULATION": null ODER Zahl zwischen 0.0 und 10.0
 #  }}
 #}}
 #
 #### Regeln:
 #- **reason**: Kurze, prägnante Begründung (auf Deutsch, max. 400 Zeichen), warum du den EDSS-Wert und die Unterkategorien so bewertest.
 #- **klassifizierbar**:
 #  - `true`, wenn EDSS und mindestens die wichtigsten Unterkategorien *eindeutig ableitbar* oder *plausibel inferierbar* sind.
 #  - `false`, **nur**, wenn keine relevanten Daten vorliegen, oder diese so widersprüchlich/inkonsistent sind, dass keine vernünftige Einschätzung möglich ist.
 #- **EDSS**:
 #  - **VERPFLICHTEND**, wenn `klassifizierbar=true`.
 #  - Zahl zwischen 0.0 und 10.0 (z.B. 3.0, 5.5). Darf **nicht** erscheinen, wenn `klassifizierbar=false`.
 #- **certainty_percent**:
 #  - **Immer present** — Ganzzahl (0–100), basierend auf:
 #    - Klarheit und Vollständigkeit der Berichtsangaben,
 #    - Stichhaltigkeit der Schlussfolgerung (inkl. Inferenz),
 #    - Konsistenz zwischen den Unterkategorien.
 #- **subcategories**:
 #  - **Immer present** — **alle 8 Unterkategorien** müssen enthalten sein.
 #  - Jeder Wert ist entweder:
 #    - `null` (wenn keine ausreichende Information vorliegt), **oder**
 #    - eine Zahl ≤ jeweiliger Obergrenze (z.B. Ambulation ≤ 10.0).
 #  - Wenn die Unterkategorie plausibel inferiert werden kann (auch indirekt), gib einen sinnvollen Wert ab.
 #  - Beispiel: Wenn „Gang mit Krückstock auf ebenem Boden bis 200 m“ steht, setze `AMBULATION: 5.5`.
 #
 #### EDSS-Bewertungsrichtlinien:
 #{EDSS_INSTRUCTIONS}
 #
 #Patientenbericht:
 #{patient_text}
 #'''
 #
 ## === INFERENCE FUNCTION ===
 #def run_inference(patient_text):
 #    prompt = build_prompt(patient_text)
 #
 #    start_time = time.time()
 #
 #    try:
 #        response = client.chat.completions.create(
 #            messages=[
 #                {"role": "system", "content": "Du gibst EXKLUSIV gültiges JSON zurück — keine weiteren Erklärungen."}
 #            ] + [
 #                {"role": "user", "content": prompt}
 #            ],
 #            model=MODEL_NAME,
 #            max_tokens=2048,
 #            temperature=0.1,  # Slightly higher for more natural certainty estimation (still low for reliability)
 #            response_format={"type": "json_object"}
 #        )
 #
 #        content = response.choices[0].message.content
 #
 #        # Parse and validate JSON
 #        try:
 #            parsed = json.loads(content)
 #        except json.JSONDecodeError as e:
 #            print(f"⚠️  JSON parsing failed: {e}")
 #            print("Raw response:", content[:500])
 #            raise ValueError("Model did not return valid JSON")
 #
 #        # Enforce required keys
 #        if "certainty_percent" not in parsed:
 #            print("⚠️  Missing 'certainty_percent' in output! Force-adding fallback.")
 #            parsed["certainty_percent"] = 0  # fallback
 #        elif not isinstance(parsed["certainty_percent"], (int, float)):
 #            parsed["certainty_percent"] = int(parsed["certainty_percent"])
 #
 #        # Clamp certainty to [0, 100]
 #        pct = parsed["certainty_percent"]
 #        parsed["certainty_percent"] =max(0, min(100, int(pct)))
 #
 #        # Enforce EDSS rules: if not classifiable → remove EDSS
 #        if not parsed.get("klassifizierbar", False):
 #            if "EDSS" in parsed:
 #                del parsed["EDSS"]  # per spec, must not appear if not classifiable
 #        else:
 #            if "EDSS" not in parsed:
 #                print("⚠️  'klassifizierbar' is true but EDSS missing — adding fallback.")
 #                parsed["EDSS"] = 7.0  # last-resort fallback
 #
 #        inference_time = time.time() - start_time
 #
 #        return {
 #            "success": True,
 #            "result": parsed,
 #            "inference_time_sec": inference_time
 #        }
 #
 #    except Exception as e:
 #        print(f"❌ Inference error: {e}")
 #        return {
 #            "success": False,
 #            "error": str(e),
 #            "inference_time_sec": -1,
 #            "result": None  # no structured output
 #        }
 #
 ## === BUILD PATIENT TEXT ===
 #def build_patient_text(row):
 #    return (
 #        str(row.get("T_Zusammenfassung", "")) + "\n" +
 #        str(row.get("Diagnosen", "")) + "\n" +
 #        str(row.get("T_KlinBef", "")) + "\n" +
 #        str(row.get("T_Befunde", ""))
 #    )
 #
 #if __name__ == "__main__":
 #    # Load data
 #    df = pd.read_csv(INPUT_CSV, sep=';')
 #    results = []
 #
 #    # Optional: limit for testing
 #    # df = df.head(3)
 #
 #    print(f"Processing {len(df)} rows...")
 #    for idx, row in df.iterrows():
 #        print(f"\n— Row {idx + 1}/{len(df)} —")
 #        try:
 #            patient_text = build_patient_text(row)
 #            result = run_inference(patient_text)
 #
 #            # Attach metadata
 #            result["unique_id"] = row.get("unique_id", f"row_{idx}")
 #            result["MedDatum"] = row.get("MedDatum", None)
 #
 #            results.append(result)
 #
 #            # Print summary
 #            if result["success"]:
 #                res = result["result"]
 #                edss = res.get("EDSS", "N/A") if res.get("klassifizierbar") else "N/A"
 #                print(f"✅ Result → EDSS={edss}, certainty={res.get('certainty_percent', 'N/A')}%")
 #                print(f"   Reason: {res.get('reason', 'N/A')[:100]}…")
 #            else:
 #                print(f"❌ Failed: {result.get('error', 'Unknown error')[:100]}")
 #
 #        except Exception as e:
 #            print(f"⚠️  Error processing row {idx}: {e}")
 #            results.append({
 #                "success": False,
 #                "error": str(e),
 #                "unique_id": row.get("unique_id", f"row_{idx}"),
 #                "MedDatum": row.get("MedDatum", None),
 #                "result": None
 #            })
 #
 #    # Save results
 #    output_json = INPUT_CSV.replace(".csv", "_results_Nisch_certainty.json")
 #    with open(output_json, 'w', encoding='utf-8') as f:
 #        json.dump(results, f, indent=2, ensure_ascii=False)
 #    print(f"\n✅ Saved results to: {output_json}")
 #
 ##
 # %% API call - Multi-iteration EDSS + certainty extraction
 import time
 import json
 import os
 from datetime import datetime
 import pandas as pd
 from openai import OpenAI
 from dotenv import load_dotenv
 # Load environment variables
 load_dotenv()
 # === CONFIGURATION ===
 OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
 OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL")
 MODEL_NAME = "GPT-OSS-120B"
 # File paths
 INPUT_CSV = "/home/shahin/Lab/Doktorarbeit/Barcelona/Data/MS_Briefe_400_with_unique_id_SHA3_explore_cleaned_unique.csv"
 EDSS_INSTRUCTIONS_PATH = "/home/shahin/Lab/Doktorarbeit/Barcelona/attach/Komplett.txt"
 # Iteration settings
 NUM_ITERATIONS = 20
 STOP_ON_FIRST_ERROR = False  # Set to True for debugging
 # Initialize OpenAI client
 client = OpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_BASE_URL
 )
 # Read EDSS instructions from file
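 # Explicit UTF-8 so the German instruction text is read identically on every platform
 with open(EDSS_INSTRUCTIONS_PATH, 'r', encoding='utf-8') as f: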
    EDSS_INSTRUCTIONS = f.read().strip()
 # === PROMPT (unchanged from before) ===
 def build_prompt(patient_text):
    return f'''Du bist ein medizinischer Assistent, der spezialisiert darauf ist, EDSS-Scores (Expanded Disability Status Scale), alle Unterkategorien und die Bewertungssicherheit aus klinischen Berichten zu extrahieren.
 ### Deine Aufgabe:
 1. Analysiere den Patientenbericht und extrahiere:
   - Den Gesamt-EDSS-Score (0.0–10.0)
   - Alle 8 EDSS-Unterkategorien (mit jeweils eigener Maximalpunktzahl)
 2. Schätze für jede Entscheidung die Sicherheit als Ganzzahl von 0–100 % ein.
 ### Struktur der JSON-Ausgabe (VERPFLICHTEND):
 Gib NUR gültiges JSON zurück — kein Markdown, kein Text davor/dahinter.
 {{
  "reason": "Kernaussage zur EDSS-Begründung (max. 400 Zeichen, auf Deutsch).",
  "klassifizierbar": true/false,
  "EDSS": null ODER Zahl zwischen 0.0 und 10.0 (nur wenn klassifizierbar=true),
  "certainty_percent": 0 ODER Zahl zwischen 0 und 100 (Ganzzahl),
  "subcategories": {{
    "VISUAL_OPTIC_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "BRAINSTEM_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "PYRAMIDAL_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "CEREBELLAR_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "SENSORY_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "BOWEL_AND_BLADDER_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "CEREBRAL_FUNCTIONS": null ODER Zahl zwischen 0.0 und 6.0,
    "AMBULATION": null ODER Zahl zwischen 0.0 und 10.0
  }}
 }}
 ### Regeln:
 - **reason**: Kurze, prägnante Begründung (auf Deutsch, max. 400 Zeichen), warum du den EDSS-Wert und die Unterkategorien so bewertest.
 - **klassifizierbar**:
  - `true`, wenn EDSS und mindestens die wichtigsten Unterkategorien *eindeutig ableitbar* oder *plausibel inferierbar* sind.
  - `false`, **nur**, wenn keine relevanten Daten vorliegen, oder diese so widersprüchlich/inkonsistent sind, dass keine vernünftige Einschätzung möglich ist.
 - **EDSS**:
  - **VERPFLICHTEND**, wenn `klassifizierbar=true`.
  - Zahl zwischen 0.0 und 10.0 (z.B. 3.0, 5.5). Darf **nicht** erscheinen, wenn `klassifizierbar=false`.
 - **certainty_percent**:
  - **Immer present** — Ganzzahl (0–100), basierend auf:
    - Klarheit und Vollständigkeit der Berichtsangaben,
    - Stichhaltigkeit der Schlussfolgerung (inkl. Inferenz),
    - Konsistenz zwischen den Unterkategorien.
 - **subcategories**:
  - **Immer present** — **alle 8 Unterkategorien** müssen enthalten sein.
  - Jeder Wert ist entweder:
    - `null` (wenn keine ausreichende Information vorliegt), **oder**
    - eine Zahl ≤ jeweiliger Obergrenze (z.B. Ambulation ≤ 10.0).
  - Wenn die Unterkategorie plausibel inferiert werden kann (auch indirekt), gib einen sinnvollen Wert ab.
  - Beispiel: Wenn „Gang mit Krückstock auf ebenem Boden bis 200 m“ steht, setze `AMBULATION: 5.5`.
 ### EDSS-Bewertungsrichtlinien:
 {EDSS_INSTRUCTIONS}
 Patientenbericht:
 {patient_text}
 '''
 # === INFERENCE FUNCTION (unchanged) ===
 def run_inference(patient_text):
    prompt = build_prompt(patient_text)
    start_time = time.time()
    try:
        response = client.chat.completions.create(
            messages=[
                {"role": "system", "content": "Du gibst EXKLUSIV gültiges JSON zurück — keine weiteren Erklärungen."},
                {"role": "user", "content": prompt}
            ],
            model=MODEL_NAME,
            max_tokens=2048,
            temperature=0.1,
            response_format={"type": "json_object"}
        )
        content = response.choices[0].message.content
        # Parse and validate JSON
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError as e:
            print(f"⚠️   JSON parsing failed: {e}")
            print("Raw response:", content[:500])
            raise ValueError("Model did not return valid JSON")
        # Enforce required keys
        if "certainty_percent" not in parsed:
            print("⚠️   Missing 'certainty_percent' in output! Force-adding fallback.")
            parsed["certainty_percent"] = 0
        elif not isinstance(parsed["certainty_percent"], (int, float)):
            parsed["certainty_percent"] = int(parsed["certainty_percent"])
        # Clamp certainty to [0, 100]
        pct = parsed["certainty_percent"]
        parsed["certainty_percent"] = max(0, min(100, int(pct)))
        # Enforce EDSS rules
        if not parsed.get("klassifizierbar", False):
            if "EDSS" in parsed:
                del parsed["EDSS"]
        else:
            if "EDSS" not in parsed:
                print("⚠️   'klassifizierbar' is true but EDSS missing — adding fallback.")
                # Last-resort fallback: 7.0 is an arbitrary placeholder, not derived from the report
                parsed["EDSS"] = 7.0
        inference_time = time.time() - start_time
        return {
            "success": True,
            "result": parsed,
            "inference_time_sec": inference_time
        }
    except Exception as e:
        print(f"❌ Inference error: {e}")
        return {
            "success": False,
            "error": str(e),
            "inference_time_sec": -1,
            "result": None
        }
 # === BUILD PATIENT TEXT ===
 def build_patient_text(row):
    return (
        str(row.get("T_Zusammenfassung", "")) + "\n" +
        str(row.get("Diagnosen", "")) + "\n" +
        str(row.get("T_KlinBef", "")) + "\n" +
        str(row.get("T_Befunde", ""))
    )
 # === MAIN LOOP (NEW: MULTI-ITERATION) ===
 if __name__ == "__main__":
    # Load data ONCE (to avoid repeated I/O overhead)
    df = pd.read_csv(INPUT_CSV, sep=';')
    total_rows = len(df)
    print(f"Loaded {total_rows} patient records.")
    for iteration in range(1, NUM_ITERATIONS + 1):
        print(f"\n{'='*60}")
        print(f"🔄 ITERATION {iteration}/{NUM_ITERATIONS}")
        print(f"{'='*60}")
        iteration_results = []
        start_iter = time.time()
        for idx, row in df.iterrows():
            print(f"\rRow {idx+1}/{total_rows} | Iter {iteration}", end='', flush=True)
            try:
                patient_text = build_patient_text(row)
                result = run_inference(patient_text)
                # Attach metadata
                if result["success"]:
                    res = result["result"].copy()  # avoid mutation
                    res["iteration"] = iteration
                    res["unique_id"] = row.get("unique_id", f"row_{idx}")
                    res["MedDatum"] = row.get("MedDatum", None)
                    result["result"] = res
                else:
                    result["iteration"] = iteration
                    result["unique_id"] = row.get("unique_id", f"row_{idx}")
                    result["MedDatum"] = row.get("MedDatum", None)
                iteration_results.append(result)
                if result["success"]:
                    res = result["result"]
                    edss = res.get("EDSS", "N/A") if res.get("klassifizierbar") else "N/A"
                    print(f" ✅ EDSS={edss}, cert={res.get('certainty_percent', '?')}%")
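                else:
                    print(f" ❌ {result.get('error', 'Unknown')}")
                    # run_inference() catches its own exceptions, so honor the debug flag here as well
                    if STOP_ON_FIRST_ERROR:
                        break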
            except Exception as e:
                print(f"\n⚠️   Row {idx} failed: {e}")
                iteration_results.append({
                    "success": False,
                    "error": str(e),
                    "iteration": iteration,
                    "unique_id": row.get("unique_id", f"row_{idx}"),
                    "MedDatum": row.get("MedDatum", None),
                    "result": None
                })
                if STOP_ON_FIRST_ERROR:
                    break
        # Save per-iteration results
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_path = INPUT_CSV.replace(".csv", f"_results_iter_{iteration}_{timestamp}.json")
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(iteration_results, f, indent=2, ensure_ascii=False)
        print(f"\n✅ Iteration {iteration} complete. Saved to: {output_path}")
        elapsed = time.time() - start_iter
        print(f"⏱️   Iteration {iteration} took {elapsed:.1f}s ({elapsed/total_rows:.1f}s/row)")
    print(f"\n🎉 All {NUM_ITERATIONS} iterations completed!")
 ##
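 The post-processing inside `run_inference` (certainty clamping and the EDSS presence rules) can be pulled into a pure function so it can be checked without calling the API. The following is a sketch under stated assumptions: `normalize_result` and the `edss_missing` flag are hypothetical names, and, unlike the script above, the sketch flags a missing EDSS instead of inserting the 7.0 placeholder.

```python
def normalize_result(parsed):
    """Return a cleaned copy of a parsed model response (hypothetical helper)."""
    out = dict(parsed)

    # Certainty: tolerate values such as 85, 85.7, "85" or "85%",
    # fall back to 0 when nothing usable is present, then clamp to 0..100.
    raw = out.get("certainty_percent", 0)
    try:
        pct = int(float(str(raw).strip().rstrip("%")))
    except (ValueError, OverflowError):
        pct = 0
    out["certainty_percent"] = max(0, min(100, pct))

    if not out.get("klassifizierbar", False):
        # Per the prompt rules, EDSS must not appear when not classifiable.
        out.pop("EDSS", None)
    elif out.get("EDSS") is None:
        # Assumption: mark the gap rather than inventing a score.
        out["edss_missing"] = True
    return out
```

 Keeping this step free of I/O means the same function could be applied to the saved per-iteration JSON files afterwards, for example to count how many rows lacked a score.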
@@ -0,0 +1,481 @@
 ### VISUAL (OPTIC) FUNCTIONS
 **VISUAL ACUITY**
 The visual acuity score is based on the line in the Snellen chart at 20 feet (5 meters) for which the patient makes no more than one error (using best available correction). Alternatively, best corrected near vision can be assessed, but this should be noted and consistently performed during follow-up examinations. Switching from near to distance visual acuity measurements should be avoided in follow-up examinations.
 **VISUAL FIELDS**
 - **0**: Normal
 - **1**: Signs only: deficits present only on formal confrontational testing
 - **2**: Moderate: patient aware of deficit, but incomplete hemianopsia on examination
 - **3**: Marked: complete homonymous hemianopsia or equivalent
 **SCOTOMA**
 - **0**: None
 - **1**: Small: detectable only on formal confrontational testing
 - **2**: Large: spontaneously reported by patient
 **DISC PALLOR** \*
 - **0**: Not present
 - **1**: Present
 **NOTE**
 When determining the EDSS step, the Visual FS score must be converted to a lower score as follows:
 | Visual FS Score | 6 | 5 | 4 | 3 | 2 | 1 |
 |---|---|---|---|---|---|---|
 | Converted Visual FS Score | 4 | 3 | 3 | 2 | 2 | 1 |
 **FUNCTIONAL SYSTEM SCORE**
 - **0**: Normal
 - **1**: Disc pallor and/or small scotoma and/or visual acuity (corrected) of worse eye less than 20/20 (1.0) but better than 20/30 (0.67)
 - **2**: Worse eye with maximal visual acuity (corrected) of 20/30 to 20/59 (0.67–0.34)
 - **3**: Worse eye with large scotoma and/or moderate decrease in fields and/or maximal visual acuity (corrected) of 20/60 to 20/99 (0.33–0.21)
 - **4**: Worse eye with marked decrease of fields and/or maximal visual acuity (corrected) of 20/100 to 20/200 (0.2–0.1); grade 3 plus maximal acuity of better eye of 20/60 (0.33) or less
 - **5**: Worse eye with maximal visual acuity (corrected) less than 20/200 (0.1); grade 4 plus maximal acuity of better eye of 20/60 (0.33) or less
 - **6**: Grade 5 plus maximal acuity of better eye of 20/60 (0.33) or less
 \* = optional part of the examination
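 The conversion in the NOTE above is a fixed lookup, which can be sketched as follows. The mapping for scores 1 to 6 is taken from the conversion row; the entry for 0 and the names `VISUAL_FS_CONVERSION` and `convert_visual_fs` are assumptions made for illustration.

```python
# Raw Visual FS score -> converted score used when determining the EDSS step.
# Entries 1..6 follow the conversion row in the NOTE; 0 -> 0 is an assumption.
VISUAL_FS_CONVERSION = {0: 0, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 4}


def convert_visual_fs(score):
    """Return the converted Visual FS score for a raw score of 0 to 6."""
    if score not in VISUAL_FS_CONVERSION:
        raise ValueError(f"Visual FS score must be an integer from 0 to 6, got {score!r}")
    return VISUAL_FS_CONVERSION[score]
```

 For example, a raw Visual FS score of 4 is treated as 3 when the EDSS step is determined.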
 ### BRAINSTEM FUNCTIONS
 **DYSARTHRIA**
 - **0**: None
 - **1**: Signs only
 - **2**: Mild: Clinically detectable, patient is aware
 - **3**: Moderate: Obvious during conversation, impairs comprehension
 - **4**: Marked: Incomprehensible speech
 - **5**: Inability to speak
 **DYSPHAGIA**
 - **0**: None
 - **1**: Signs only
 - **2**: Mild: Difficulty with thin liquids
 - **3**: Moderate: Difficulty with liquids and solid food
 - **4**: Marked: Sustained difficulty, requires pureed diet
 - **5**: Inability to swallow
 **OTHER CRANIAL NERVE FUNCTIONS**
 - **0**: Normal
 - **1**: Signs only
 - **2**: Mild disability: Clinically detectable deficit, patient is usually aware
 - **3**: Moderate disability
 - **4**: Marked disability
 **EXTRAOCULAR MOVEMENTS (EOM) IMPAIRMENT**
 - **0**: None
 - **1**: Signs only: Subtle EOM weakness, no complaints of vision issues
 - **2**: Mild: Subtle EOM weakness or obvious incomplete paralysis not noticed by patient
 - **3**: Moderate: Obvious incomplete paralysis noticed by patient or complete loss in one direction
 - **4**: Marked: Complete loss in more than one direction
 **NYSTAGMUS**
 - **0**: None
 - **1**: Signs only or mild: Gaze-evoked nystagmus below moderate limits (equivalent to Brainstem FS score of 1)
 - **2**: Moderate: Sustained nystagmus on horizontal/vertical gaze at 30 degrees, patient may not notice
 - **3**: Severe: Nystagmus in primary position or coarse persistent nystagmus interfering with vision; complete internuclear ophthalmoplegia; oscillopsia
 **TRIGEMINAL DAMAGE**
 - **0**: None
 - **1**: Signs only
 - **2**: Mild: Clinically detectable numbness, patient is aware
 - **3**: Moderate: Impaired sharp/dull discrimination in one to three branches or trigeminal neuralgia (at least one recent attack)
 - **4**: Marked: Unable to discriminate between sharp/dull or complete loss of sensation in one or both nerves
 **FACIAL WEAKNESS**
 - **0**: None
 - **1**: Signs only
 - **2**: Mild: Clinically detectable weakness, patient is aware
 - **3**: Moderate: Incomplete facial palsy (e.g., eye closure requires patching, drooling)
 - **4**: Marked: Complete unilateral or bilateral facial palsy with lagophthalmus or difficulty with liquids
 **HEARING LOSS**
 - **0**: None
 - **1**: Signs only: Hears finger rub less on one/both sides, lateralized Weber test but no complaints
 - **2**: Mild: As in 1, aware of hearing problem
 - **3**: Moderate: Does not hear finger rub on one/both sides, misses several whispered numbers
 - **4**: Marked: Misses all or nearly all whispered numbers
 **FUNCTIONAL SYSTEM SCORE**
 - **0**: Normal
 - **1**: Signs only
 - **2**: Moderate nystagmus/EOM impairment/other mild disability
 - **3**: Severe nystagmus/marked EOM impairment/moderate other cranial nerve disability
 - **4**: Marked dysarthria/other marked disability
 - **5**: Inability to swallow or speak
 ### PYRAMIDAL FUNCTIONS
 #### REFLEXES
 - **0**: Absent
 - **1**: Diminished
 - **2**: Normal
 - **3**: Exaggerated
 - **4**: Nonsustained clonus (a few beats of clonus)
 - **5**: Sustained clonus
 ##### Cutaneous Reflexes
 - **0**: Normal
 - **1**: Weak
 - **2**: Absent
 ##### Palmomental Reflex
 - **0**: Absent
 - **1**: Present
 ##### Plantar Response
 - **0**: Flexor
 - **1**: Neutral or equivocal
 - **2**: Extensor
 #### LIMB STRENGTH
 The weakest muscle in each group defines the score for that muscle group. Optional functional tests (hopping on one foot and walking on heels/toes) are recommended for BMRC grades 3–5.
 ##### BMRC Rating Scale
 - **0**: No muscle contraction detected
 - **1**: Visible contraction without visible joint movement
 - **2**: Visible movement only on the plane of gravity
 - **3**: Active movement against gravity, but not against resistance
 - **4**: Active movement against resistance, but not full strength
 - **5**: Normal strength
 #### FUNCTIONAL TESTS
 ##### Pronator Drift (Upper Extremities)
 Pronation and downward drift:
 - **0**: None
 - **1**: Mild
 - **2**: Evident
 ##### Position Test (Lower Extremities)
 Ask patient to lift both legs together, with legs fully extended at the knee. Sinking:
 - **0**: None
 - **1**: Mild
 - **2**: Evident
 - **3**: Able to lift only one leg at a time (grade from the horizontal position at the hip joints in degrees)
 - **4**: Unable to lift one leg at a time
 ##### Walking on Heels/Toes
 - **0**: Normal
 - **1**: Impaired
 - **2**: Not possible
 ##### Hopping on One Foot
 - **0**: Normal
 - **1**: 6–10 times
 - **2**: 1–5 times
 - **3**: Not possible
 #### LIMB SPASTICITY (AFTER RAPID FLEXION OF THE EXTREMITY)
 - **0**: None
 - **1**: Mild: barely increased muscle tone
 - **2**: Moderate: moderately increased muscle tone that can be overcome; full range of motion is possible
 - **3**: Severe: severely increased muscle tone that is extremely difficult to overcome; full range of motion is not possible
 - **4**: Contracted
 #### GAIT SPASTICITY
 - **0**: None
 - **1**: Barely perceptible
 - **2**: Evident: minor interference with function
 - **3**: Permanent shuffling: major interference with function
 #### OVERALL MOTOR PERFORMANCE
 - **0**: Normal
 - **1**: Abnormal weakness (as compared to peers) in performing more demanding tasks, e.g., walking longer distances; no reduction in limb strength on formal testing
 - **2**: Reduction in strength of individual muscle groups at confrontational testing
 #### FUNCTIONAL SYSTEM SCORE
 - **0**: Normal
 - **1**: Abnormal signs without disability
 - **2**: Minimal disability: patient complains of motor-fatigability or reduced performance in strenuous motor tasks (motor performance grade 1) and/or BMRC grade 4 in one or two muscle groups
 - **3**: Mild to moderate paraparesis or hemiparesis: usually BMRC grade 4 in more than two muscle groups; and/or BMRC grade 3 in one or two muscle groups (movements against gravity are possible); and/or severe monoparesis: BMRC grade 2 or less in one muscle group
 - **4**: Marked paraparesis or hemiparesis: usually BMRC grade 2 in two limbs or monoplegia with BMRC grade 0 or 1 in one limb; and/or moderate tetraparesis: BMRC grade 3 in three or more limbs
 - **5**: Paraplegia: BMRC grade 0 or 1 in all muscle groups of the lower limbs; and/or marked tetraparesis: BMRC grade 2 or less in three or more limbs; and/or hemiplegia
 - **6**: Tetraplegia: BMRC grade 0 or 1 in all muscle groups of the upper and lower limbs
 ### CEREBELLAR FUNCTIONS
 #### HEAD TREMOR
 - **0**: none
 - **1**: mild
 - **2**: moderate
 - **3**: severe
 #### TRUNCAL ATAXIA
 - **0**: none
 - **1**: signs only
 - **2**: mild (swaying with eyes closed)
 - **3**: moderate (swaying with eyes open)
 - **4**: severe (unable to sit without assistance)
 #### LIMB ATAXIA (TREMOR / DYSMETRIA AND RAPID ALTERNATING MOVEMENTS)
 - **0**: none
 - **1**: signs only
 - **2**: mild (tremor or clumsy movements easily seen, minor interference with function)
 - **3**: moderate (tremor or clumsy movements interfere with function in all spheres)
 - **4**: severe (most functions are very difficult)
 #### TANDEM (STRAIGHT LINE) WALKING
 - **0**: normal
 - **1**: impaired
 - **2**: not possible
 #### GAIT ATAXIA
 - **0**: none
 - **1**: signs only
 - **2**: mild (problems with balance realized by patient and/or significant other)
 - **3**: moderate (abnormal balance with ordinary walking)
 - **4**: severe (unable to walk more than a few steps unassisted or requires a walking aid or assistance due to ataxia)
 #### ROMBERG TEST
 - **0**: normal
 - **1**: mild (mild instability with eyes closed)
 - **2**: moderate (not stable with eyes closed)
 - **3**: severe (not stable with eyes open)
 #### OTHER CEREBELLAR TESTS
 - **0**: normal
 - **1**: mild abnormality
 - **2**: moderate abnormality
 - **3**: severe abnormality
 **NOTE:**
 - The presence of severe gait and/or truncal ataxia alone (without severe ataxia in three or four limbs) results in a Cerebellar FS score of 3.
 - If weakness or sensory deficits interfere with the testing of ataxia, score the patient’s actual performance. Indicate the possible role of weakness by marking an "X" after the affected subsystems and Cerebellar FS score.
 #### FUNCTIONAL SYSTEM SCORE
 - **0**: normal
 - **1**: abnormal signs without disability
 - **2**: mild ataxia and/or moderate station ataxia (Romberg) and/or tandem walking not possible
 - **3**: moderate limb ataxia and/or moderate or severe gait/truncal ataxia
 - **4**: severe gait/truncal ataxia and severe ataxia in three or four limbs
 - **5**: unable to perform coordinated movements due to ataxia
 - **X**: pyramidal weakness (BMRC grade 3 or worse in limb strength) or sensory deficits interfere with cerebellar testing
 ### SENSORY FUNCTIONS
 #### SUPERFICIAL SENSATION (LIGHT TOUCH AND PAIN)
 - **0**: normal
 - **1**: signs only (slightly diminished sensation on formal testing, patient not aware)
 - **2**: mild (patient aware of impaired light touch or pain but can discriminate sharp/dull)
 - **3**: moderate (impaired discrimination of sharp/dull)
 - **4**: marked (unable to discriminate between sharp/dull and/or unable to feel light touch)
 - **5**: complete loss (anesthesia)
 #### VIBRATION SENSE (AT THE MOST DISTAL JOINT)
 - **0**: normal
 - **1**: mild (graded tuning fork 5–7 of 8; detects more than 10 seconds but less than examiner)
 - **2**: moderate (graded tuning fork 1–4 of 8; detects between 2 and 10 sec.)
 - **3**: marked (complete loss of vibration sense)
 #### POSITION SENSE
 - **0**: normal
 - **1**: mild (1–2 incorrect responses, only distal joints affected)
 - **2**: moderate (misses many movements of fingers or toes; proximal joints affected)
 - **3**: marked (no perception of movement, astasia)
 * **LHERMITTE’S SIGN** (does not contribute to the Sensory FS score)
  - **0**: negative
  - **1**: positive
 * **PARAESTHESIAE (TINGLING)** (does not contribute to the Sensory FS score)
  - **0**: none
  - **1**: present
 #### FUNCTIONAL SYSTEM SCORE
 - **0**: normal
 - **1**: impaired superficial sensation in one or two limbs
 - **2**: mild impairment in more than two limbs, no major proprioceptive deficits
 - **3**: moderate impairment in more than two limbs with minor proprioceptive deficits
 - **4**: severe impairment in more than two limbs with significant proprioceptive deficits
 - **5**: loss of sensation in one or two limbs, significant proprioceptive deficits in most of the body below the head
 - **6**: essentially no sensation below the head
 ### BOWEL AND BLADDER FUNCTIONS
 #### URINARY HESITANCY AND RETENTION
 - **0**: none
 - **1**: mild (no major impact on lifestyle)
 - **2**: moderate (urinary retention; frequent urinary tract infections)
 - **3**: severe (requires catheterization)
 - **4**: loss of function (overflow incontinence)
 #### URINARY URGENCY AND INCONTINENCE
 - **0**: none
 - **1**: mild (no major impact on lifestyle)
 - **2**: moderate (rare incontinence occurring no more than once a week; must wear pads)
 - **3**: severe (frequent incontinence occurring from several times a week to more than once a day; must wear urinal or pads)
 - **4**: loss of function (loss of bladder control)
 #### BLADDER CATHETERIZATION
 - **0**: none
 - **1**: intermittent self-catheterization
 - **2**: constant catheterization
 #### BOWEL DYSFUNCTION
 - **0**: none
 - **1**: mild (no incontinence, no major impact on lifestyle, mild constipation)
 - **2**: moderate (must wear pads or alter lifestyle to be near lavatory)
 - **3**: severe (in need of enemas or manual measures to evacuate bowels)
 - **4**: complete loss of function
 #### SEXUAL DYSFUNCTION
 **Male**
 - **0**: none
 - **1**: mild (difficulty maintaining erection during intercourse, but achieves erection and still has intercourse)
 - **2**: moderate (difficulty achieving erection, decreased libido, still has intercourse and reaches orgasm)
 - **3**: severe (marked decrease in libido, inability to achieve full erection, intercourse with difficulty, hypoorgasmia)
 - **4**: loss of function
 **Female**
 - **0**: none
 - **1**: mild (mild lack of lubrication, still sexually active and reaches orgasm)
 - **2**: moderate (dyspareunia, hypoorgasmia, decrease in sexual activity)
 - **3**: severe (marked decrease in sexual activity, anorgasmia)
 - **4**: loss of function
 **NOTE**
 When determining the EDSS step, the Bowel and Bladder FS score must be converted to a lower score as follows:
 - Bowel and Bladder FS Score: 6 → Converted Bowel and Bladder FS Score: 5
 - Bowel and Bladder FS Score: 5 → Converted Bowel and Bladder FS Score: 4
 - Bowel and Bladder FS Score: 4 → Converted Bowel and Bladder FS Score: 3
 - Bowel and Bladder FS Score: 3 → Converted Bowel and Bladder FS Score: 3
 - Bowel and Bladder FS Score: 2 → Converted Bowel and Bladder FS Score: 2
 - Bowel and Bladder FS Score: 1 → Converted Bowel and Bladder FS Score: 1
 Sexual dysfunction can be documented but generally does not affect the FS score, because it is difficult for the examining physician to assess.
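The conversion in the note above can be sketched as a simple lookup. This is a minimal illustration, not part of the original scoring sheet: the names `CONVERTED_BB_FS` and `convert_bowel_bladder_fs` are hypothetical, and the `0 → 0` entry is an assumption, since the note only lists scores 1 to 6.

```python
# Lookup for the Bowel and Bladder FS conversion listed in the note.
# The 0 -> 0 entry is an assumption; the note only lists scores 1 to 6.
CONVERTED_BB_FS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 5}


def convert_bowel_bladder_fs(score: int) -> int:
    """Return the converted Bowel and Bladder FS score used for the EDSS step."""
    if score not in CONVERTED_BB_FS:
        raise ValueError(f"Bowel and Bladder FS score must be 0-6, got {score!r}")
    return CONVERTED_BB_FS[score]


if __name__ == "__main__":
    for raw in range(7):
        print(raw, "->", convert_bowel_bladder_fs(raw))
```

The raw score would still be recorded as documented; only the converted value is meant to enter the EDSS step determination.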
 #### FUNCTIONAL SYSTEM SCORE
 - **0**: normal
 - **1**: mild urinary hesitancy, urgency, and/or constipation
 - **2**: moderate urinary hesitancy/retention and/or moderate urinary urgency/incontinence and/or moderate bowel dysfunction
 - **3**: frequent urinary incontinence or intermittent self-catheterization; needs enemas or manual measures to evacuate bowels
 - **4**: in need of almost constant catheterization
 - **5**: loss of bladder or bowel function (external or indwelling catheter)
 - **6**: loss of bowel and bladder function
 ### CEREBRAL FUNCTIONS
 #### DEPRESSION AND EUPHORIA
 - **0**: none
 - **1**: present (Patient complains of depression or is considered depressed or euphoric by the investigator or significant other.)
 **Note**: Depression and Euphoria are documented on the scoring sheet but are not taken into consideration for FS and EDSS calculation.
 #### DECREASE IN MENTATION
 - **0**: none
 - **1**: signs only (not apparent to patient and/or significant other)
 - **2**: mild (Patient and/or significant other report mild changes in mentation. Examples include: impaired ability to follow a rapid course of association or survey complex matters; impaired judgment in certain demanding situations; capable of handling routine daily activities, but unable to tolerate additional stressors; intermittently symptomatic even with normal levels of stress; reduced performance; tendency toward negligence due to obliviousness or fatigue.)
 - **3**: moderate (Definite abnormalities on brief mental status testing, but still oriented to person, place, and time)
 - **4**: marked (Not oriented in one or two spheres (person, place, or time); marked effect on lifestyle)
 - **5**: dementia, confusion, and/or complete disorientation
 #### FATIGUE
 - **0**: none
 - **1**: mild (Does not usually interfere with daily activities)
 - **2**: moderate (Interferes with daily activities but does not limit them by more than 50%)
 - **3**: severe (Significant limitation in daily activities (> 50% reduction))
 **Note**: Because fatigue is difficult to evaluate objectively, in some studies it does not contribute to the Cerebral FS score or EDSS step. Please adhere to the study’s specific instructions.
 #### FUNCTIONAL SYSTEM SCORE
 - **0**: normal
 - **1**: signs only in decrease in mentation; mild fatigue
 - **2**: mild decrease in mentation; moderate or severe fatigue
 - **3**: moderate decrease in mentation
 - **4**: marked decrease in mentation
 - **5**: dementia
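One way to read the Cerebral FS list above is sketched below. It assumes that the FS score is the higher of the two contributions (mentation grade and fatigue grade), which the list suggests but does not state outright. The function name `cerebral_fs` and the `count_fatigue` switch are hypothetical; the switch reflects the note that some studies exclude fatigue.

```python
# Fatigue grade -> Cerebral FS contribution, per the list:
# mild fatigue -> 1, moderate or severe fatigue -> 2.
FATIGUE_TO_FS = {0: 0, 1: 1, 2: 2, 3: 2}


def cerebral_fs(mentation: int, fatigue: int = 0, count_fatigue: bool = True) -> int:
    """Sketch of the Cerebral FS score from mentation (0-5) and fatigue (0-3).

    Assumption: the score is the maximum of both contributions.
    Depression and euphoria are not inputs, as stated in the text.
    """
    if not 0 <= mentation <= 5:
        raise ValueError("mentation grade must be 0-5")
    if fatigue not in FATIGUE_TO_FS:
        raise ValueError("fatigue grade must be 0-3")
    # Mentation grades 0-5 line up one-to-one with FS scores 0-5.
    fatigue_part = FATIGUE_TO_FS[fatigue] if count_fatigue else 0
    return max(mentation, fatigue_part)


if __name__ == "__main__":
    print(cerebral_fs(mentation=0, fatigue=3))
    print(cerebral_fs(mentation=3, fatigue=1))
```

Setting `count_fatigue=False` models a study protocol in which fatigue does not contribute to the score.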
 ### AMBULATION
 **Unrestricted Ambulation**
 - The patient can walk a normal distance without assistance, comparable to healthy individuals of similar age and physical condition.
 - EDSS step can range from 0 to 5.0, depending on the Functional System (FS) scores.
 **Fully Ambulatory**
 - At least 500 meters of ambulation without assistance, but not unrestricted.
 - EDSS step can range from 2.0 to 5.0, depending on FS scores.
 - The Pyramidal and/or Cerebellar FS must be ≥ 2 to reflect this restriction in ambulation.
 **Ambulation < 500 Meters**
 - If the walking distance is less than 500 meters, the EDSS step must be ≥ 4.5, depending on the walking ranges provided by the ambulation score and combination of FS scores.
 - EDSS steps 5.5 to 8.0 are exclusively defined by the ability to ambulate and type of assistance required, or the ability to use a wheelchair.
 **Assistance Needed**
 - Definitions for EDSS steps 6.0 or 6.5 include both the type of assistance required when walking and the walking range.
 - Assistance by another person is equivalent to bilateral assistance.
 **Note:**
 - The ambulation score represents both the walking range and the type of assistance required.
 - This score replaces several checkboxes used previously on the scoring sheet but does not introduce new definitions.
 - Use of a wheelchair can now be scored on the scoring sheet.
 - Indicate the reported distance and time for the patient in the appropriate field on the scoring sheet, followed by the type of assistance and walking distance measured during assessment.
 ### DISTANCE AND TIME REPORTED BY PATIENT
 **Maximal Unassisted Walking Distance**
 - Maximal unassisted walking distance reported by the patient (in meters) without rest or assistance.
 - Time required to walk the maximum distance according to the patient (in minutes).
 **Assistance**
 0. Without help or assistance (allowing use of an ankle-foot orthotic device, but no other assistive devices).
 1. Unilateral assistance: one stick/crutch/brace.
 2. Bilateral assistance: two sticks/crutches/braces or assistance by another person.
 3. Wheelchair.
 **Distance**
 - Measure the distance the patient can walk in meters.
  - **Unassisted:** Observe walking for a minimum of 500 meters and measure time needed, if possible.
  - **Assisted:** Observe walking with assistive devices or help from another person for a minimum of 130 meters, if possible.
 ---
 ### AMBULATION SCORE
 0. Unrestricted
 1. Fully ambulatory
 2. ≥ 300 meters but < 500 meters, without help or assistance (EDSS 4.5 or 5.0)
 3. ≥ 200 meters but < 300 meters, without help or assistance (EDSS 5.0)
 4. ≥ 100 meters but < 200 meters, without help or assistance (EDSS 5.5)
 5. Walking range < 100 meters without assistance (EDSS 6.0)
 6. Unilateral assistance, ≥ 50 meters (EDSS 6.0)
 7. Bilateral assistance, ≥ 120 meters (EDSS 6.0)
 8. Unilateral assistance, < 50 meters (EDSS 6.5)
 9. Bilateral assistance, ≥ 5 meters but < 120 meters (EDSS 6.5)
 10. Uses wheelchair without help; unable to walk 5 meters even with aid, essentially restricted to wheelchair; wheels self and transfers alone; up and about in wheelchair for some 12 hours a day (EDSS 7.0)
 11. Uses wheelchair with help; unable to take more than a few steps; restricted to wheelchair; may need some help in transferring and wheeling self (EDSS 7.5)
 12. Essentially restricted to bed or chair or perambulated in wheelchair, but out of bed most of the day; retains many self-care functions; generally has effective use of arms (EDSS 8.0)
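The walking-related part of the ambulation score list (scores 0 to 9) can be sketched as a threshold lookup. This is an illustration under stated assumptions: `walking_ambulation_score` and `EDSS_BY_AMBULATION_SCORE` are hypothetical names, the assistance codes follow the Assistance list (0 none, 1 unilateral, 2 bilateral), and wheelchair scores 10 to 12 are left out because they depend on a clinical description rather than a measured distance.

```python
# EDSS step(s) attached to each ambulation score in the list above.
# Scores 0 and 1 are omitted: their EDSS step depends on the FS scores.
EDSS_BY_AMBULATION_SCORE = {
    2: (4.5, 5.0), 3: (5.0,), 4: (5.5,), 5: (6.0,), 6: (6.0,), 7: (6.0,),
    8: (6.5,), 9: (6.5,), 10: (7.0,), 11: (7.5,), 12: (8.0,),
}


def walking_ambulation_score(distance_m: float, assistance: int,
                             unrestricted: bool = False) -> int:
    """Map measured walking distance and assistance type to scores 0-9."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if assistance == 0:  # without help or assistance
        if distance_m >= 500:
            return 0 if unrestricted else 1
        if distance_m >= 300:
            return 2
        if distance_m >= 200:
            return 3
        if distance_m >= 100:
            return 4
        return 5
    if assistance == 1:  # unilateral assistance
        return 6 if distance_m >= 50 else 8
    if assistance == 2:  # bilateral assistance
        if distance_m >= 120:
            return 7
        if distance_m >= 5:
            return 9
        raise ValueError("under 5 m with bilateral aid: see wheelchair scores 10-12")
    raise ValueError("assistance must be 0, 1 or 2")


if __name__ == "__main__":
    score = walking_ambulation_score(150, assistance=0)
    print(score, EDSS_BY_AMBULATION_SCORE[score])
```

Whether a patient who walks at least 500 meters is "unrestricted" or only "fully ambulatory" is a clinical judgment, so the sketch takes it as an explicit flag instead of deriving it.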
 Expanded Disability Status Scale (EDSS)
 0   - Normal neurological exam (all Functional Systems [FS] grade 0)
 1.0 - No disability, minimal signs in one FS (one FS grade 1)
 1.5 - No disability, minimal signs in more than one FS (more than one FS grade 1)
 2.0 - Minimal disability in one FS (one FS grade 2, others 0 or 1)
 2.5 - Minimal disability in two FS (two FS grades 2, others 0 or 1)
 3.0 - Moderate disability in one FS (one FS grade 3, others 0 or 1) though fully ambulatory;
 or mild disability in three or four FS (three/four FS grades 2, others 0 or 1) though fully ambulatory
 3.5 - Fully ambulatory but with moderate disability in one FS (one FS grade 3) and mild disability in one or two FS (one/two FS grade 2) and others 0 or 1;
 or fully ambulatory with two FS grades 3 (others 0 or 1);
 or fully ambulatory with five FS grades 2 (others 0 or 1)
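 4.0 - Fully ambulatory without aid, self-sufficient, up and about some 12 hours a day despite relatively severe disability (one FS grade 4, others 0 or 1, or combinations of lesser grades exceeding limits of previous steps); able to walk without aid or rest some 500 meters
 4.5 - Fully ambulatory without aid, up and about much of the day, able to work a full day, may otherwise have some limitation of full activity or require minimal assistance; relatively severe disability (usually one FS grade 4, others 0 or 1, or combinations of lesser grades exceeding limits of previous steps); able to walk without aid or rest some 300 meters
 5.0 - Ambulatory without aid or rest for about 200 meters; disability severe enough to impair full daily activities
 5.5 - Ambulatory without aid or rest for about 100 meters; disability severe enough to preclude full daily activities
 6.0 - Intermittent or unilateral constant assistance (cane, crutch, or brace) required to walk about 100 meters with or without resting
 6.5 - Constant bilateral assistance (canes, crutches, or braces) required to walk about 20 meters without resting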
 7.0 - Unable to walk 5 meters even with aid, essentially restricted to wheelchair; wheels self and transfers alone; up and about in wheelchair some 12 hours a day
 7.5 - Unable to take more than a few steps; restricted to wheelchair; may need some help in transferring and in wheeling self
 8.0 - Essentially restricted to bed or chair or perambulated in wheelchair, but out of bed most of the day; retains many self-care functions; generally has effective use of arms
 8.5 - Essentially restricted to bed much of the day; has some effective use of arm(s); retains some self-care functions
 9.0 - Helpless bed patient; can communicate and eat
 9.5 - Totally helpless bed patient; unable to communicate effectively or eat/swallow
 10  - Death due to MS
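The FS-based steps 0 to 3.5 in the list above can be sketched as a count of FS grades. This is a simplified illustration that assumes the patient is fully ambulatory and that the FS scores passed in are already converted where a conversion applies (for example Bowel and Bladder). The function name `low_edss_step` is hypothetical, and any combination not covered by those steps returns `None` instead of guessing.

```python
from typing import Optional, Sequence


def low_edss_step(fs_scores: Sequence[int]) -> Optional[float]:
    """Return the EDSS step (0 to 3.5) implied by the FS grades, else None."""
    if any(s < 0 for s in fs_scores):
        raise ValueError("FS scores must be non-negative")
    if any(s > 3 for s in fs_scores):
        return None  # grade 4 or higher is beyond steps 0-3.5
    ones = sum(1 for s in fs_scores if s == 1)
    twos = sum(1 for s in fs_scores if s == 2)
    threes = sum(1 for s in fs_scores if s == 3)
    if threes == 0:
        if twos == 0:
            if ones == 0:
                return 0.0
            return 1.0 if ones == 1 else 1.5
        if twos == 1:
            return 2.0
        if twos == 2:
            return 2.5
        if twos in (3, 4):
            return 3.0
        if twos == 5:
            return 3.5
        return None
    if threes == 1:
        if twos == 0:
            return 3.0
        if twos in (1, 2):
            return 3.5
        return None
    if threes == 2 and twos == 0:
        return 3.5
    return None


if __name__ == "__main__":
    print(low_edss_step([3, 2, 1, 0, 0, 0, 0]))
```

Returning `None` for uncovered combinations keeps the sketch from asserting a step that the list does not define; higher steps depend on ambulation and are out of scope here.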
@@ -0,0 +1,11 @@
 EDSS-kv ::= "\"EDSS\"" space ":" space number
 Reason ::= "\"" char{0,400} "\"" space
 Reason-kv ::= "\"Reason\"" space ":" space Reason
 boolean ::= ("true" | "false") space
 char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
 decimal-part ::= [0-9]{1,16}
 integral-part ::= [0] | [1-9] [0-9]{0,15}
 nicht-klassifizierbar-kv ::= "\"nicht_klassifizierbar\"" space ":" space boolean
 number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
 root ::= "{" space Reason-kv "," space nicht-klassifizierbar-kv ( "," space ( EDSS-kv ) )? "}" space
 space ::= | " " | "\n"{1,2} [ \t]{0,20}
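A reply constrained by the grammar above is a JSON object with `Reason`, then `nicht_klassifizierbar`, then an optional `EDSS` number. A minimal sketch of a post-hoc check in Python follows. The function name `parse_edss_reply` is hypothetical, and the check that `EDSS` is a valid step (0, or 1.0 to 10 in half steps) is an extra assumption: the grammar itself accepts any JSON number.

```python
import json

# Valid EDSS steps: 0, then 1.0 to 10.0 in half steps (not enforced by the grammar).
VALID_EDSS_STEPS = {0.0} | {x / 2 for x in range(2, 21)}


def parse_edss_reply(raw: str) -> dict:
    """Parse a model reply and check it against the shape the grammar describes."""
    obj = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(obj, dict):
        raise ValueError("reply must be a JSON object")
    keys = list(obj)  # json.loads keeps the key order of the input
    allowed = (["Reason", "nicht_klassifizierbar"],
               ["Reason", "nicht_klassifizierbar", "EDSS"])
    if keys not in allowed:
        raise ValueError(f"unexpected keys or key order: {keys}")
    if not isinstance(obj["Reason"], str) or len(obj["Reason"]) > 400:
        raise ValueError("Reason must be a string of at most 400 characters")
    if not isinstance(obj["nicht_klassifizierbar"], bool):
        raise ValueError("nicht_klassifizierbar must be true or false")
    if "EDSS" in obj:
        value = obj["EDSS"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("EDSS must be a number")
        if float(value) not in VALID_EDSS_STEPS:
            raise ValueError(f"EDSS {value} is not a valid step")
    return obj


if __name__ == "__main__":
    reply = '{"Reason": "Walks 150 m unaided", "nicht_klassifizierbar": false, "EDSS": 5.5}'
    print(parse_edss_reply(reply))
```

Such a check is useful mainly when the same prompt is run against a backend that does not enforce the grammar, or to catch out-of-range numbers the grammar lets through.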
@@ -0,0 +1,25 @@
 Expanded Disability Status Scale (EDSS)
 0   - Normal neurological exam (all Functional Systems [FS] grade 0)
 1.0 - No disability, minimal signs in one FS (one FS grade 1)
 1.5 - No disability, minimal signs in more than one FS (more than one FS grade 1)
 2.0 - Minimal disability in one FS (one FS grade 2, others 0 or 1)
 2.5 - Minimal disability in two FS (two FS grades 2, others 0 or 1)
 3.0 - Moderate disability in one FS (one FS grade 3, others 0 or 1) though fully ambulatory;
 or mild disability in three or four FS (three/four FS grades 2, others 0 or 1) though fully ambulatory
 3.5 - Fully ambulatory but with moderate disability in one FS (one FS grade 3) and mild disability in one or two FS (one/two FS grade 2) and others 0 or 1;
 or fully ambulatory with two FS grades 3 (others 0 or 1);
 or fully ambulatory with five FS grades 2 (others 0 or 1)
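 4.0 - Fully ambulatory without aid, self-sufficient, up and about some 12 hours a day despite relatively severe disability (one FS grade 4, others 0 or 1, or combinations of lesser grades exceeding limits of previous steps); able to walk without aid or rest some 500 meters
 4.5 - Fully ambulatory without aid, up and about much of the day, able to work a full day, may otherwise have some limitation of full activity or require minimal assistance; relatively severe disability (usually one FS grade 4, others 0 or 1, or combinations of lesser grades exceeding limits of previous steps); able to walk without aid or rest some 300 meters
 5.0 - Ambulatory without aid or rest for about 200 meters; disability severe enough to impair full daily activities
 5.5 - Ambulatory without aid or rest for about 100 meters; disability severe enough to preclude full daily activities
 6.0 - Intermittent or unilateral constant assistance (cane, crutch, or brace) required to walk about 100 meters with or without resting
 6.5 - Constant bilateral assistance (canes, crutches, or braces) required to walk about 20 meters without resting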
 7.0 - Unable to walk 5 meters even with aid, essentially restricted to wheelchair; wheels self and transfers alone; up and about in wheelchair some 12 hours a day
 7.5 - Unable to take more than a few steps; restricted to wheelchair; may need some help in transferring and in wheeling self
 8.0 - Essentially restricted to bed or chair or perambulated in wheelchair, but out of bed most of the day; retains many self-care functions; generally has effective use of arms
 8.5 - Essentially restricted to bed much of the day; has some effective use of arm(s); retains some self-care functions
 9.0 - Helpless bed patient; can communicate and eat
 9.5 - Totally helpless bed patient; unable to communicate effectively or eat/swallow
 10  - Death due to MS
Author	SHA1	Message	Date
shahin	c9cf9ae9a0	optimized results and new benchmark	2026-05-29 00:42:40 +02:00
shahin	1b7c6a3852	adjustment to triton	2026-05-19 10:21:24 +02:00
shahin	bb9fcf20ae	adjusting the script with new paths	2026-05-19 10:13:29 +02:00
shahin	98df7c70f1	New Organised one	2026-05-19 10:03:52 +02:00
shahin	69f6e76bfe	clean gitignore	2026-05-19 09:23:31 +02:00
shahin	590f2cd68e	Added Loop for multiple models.	2026-05-16 16:50:33 +02:00
shahin	f6ec60e685	isabella box and Error disagreement plot	2026-05-04 16:41:42 +02:00