Generated using the command:
```bash
$ QSV_LLM_BASE_URL=https://api.together.xyz/v1 QSV_LLM_APIKEY=THEKEY \
  QSV_DESCRIBEGPT_DB_ENGINE=/opt/homebrew/bin/duckdb \
  qsv describegpt /tmp/NYC_311_SR_2010-2020-sample-1M.csv \
  --output nyc311-describegpt-prompt.md \
  --sql-results nyc311-describegpt-prompt-results \
  --prompt "What are the top 10 complaint types by community board and borough?" \
  --addl-props '{"reasoning_effort": "high"}'
```
---
# Prompt
```sql
-- Generated by qsv's describegpt command using openai/gpt-oss-20b on 2025-10-20T03:53:20.199536+00:00
-- This query returns the top 10 complaint types for each combination of Community Board and Borough.
-- It first aggregates the counts per "Community Board", "Borough", and "Complaint Type",
-- then ranks them within each group, and finally filters to the top 10 per group.

WITH grouped AS (
    SELECT
        "Community Board",
        "Borough",
        "Complaint Type",
        COUNT(*) AS cnt
    FROM read_csv_auto('/tmp/NYC_311_SR_2010-2020-sample-1M.csv')
    WHERE "Community Board" IS NOT NULL
      AND "Borough" IS NOT NULL
      AND "Complaint Type" IS NOT NULL
    GROUP BY "Community Board", "Borough", "Complaint Type"
),
ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY "Community Board", "Borough"
            ORDER BY cnt DESC
        ) AS rn
    FROM grouped
)
SELECT
    "Community Board",
    "Borough",
    "Complaint Type",
    cnt
FROM ranked
WHERE rn <= 10
ORDER BY "Community Board", "Borough", cnt DESC;
```
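The query above follows an aggregate, rank, filter pattern. The sketch below runs that same CTE shape on a tiny in-memory table so the per-group cutoff is easy to see. It uses Python's built-in `sqlite3` module in place of DuckDB's `read_csv_auto` (SQLite 3.25 or later supports `ROW_NUMBER()`). The table name `requests`, the function `top_complaints`, and the toy rows are hypothetical stand-ins, not the real 1M-row sample. The cutoff is a bound parameter rather than a hard-coded 10.

```python
import sqlite3

# Same aggregate -> rank -> filter shape as the generated DuckDB query, but
# reading from a hypothetical in-memory table "requests" instead of
# read_csv_auto(...). The per-group cutoff is bound as a parameter.
TOP_N_SQL = """
WITH grouped AS (
    SELECT "Community Board", "Borough", "Complaint Type", COUNT(*) AS cnt
    FROM requests
    WHERE "Community Board" IS NOT NULL
      AND "Borough" IS NOT NULL
      AND "Complaint Type" IS NOT NULL
    GROUP BY "Community Board", "Borough", "Complaint Type"
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY "Community Board", "Borough"
               ORDER BY cnt DESC
           ) AS rn
    FROM grouped
)
SELECT "Community Board", "Borough", "Complaint Type", cnt
FROM ranked
WHERE rn <= ?
ORDER BY "Community Board", "Borough", cnt DESC
"""

# Hypothetical toy rows: (Community Board, Borough, Complaint Type).
TOY_ROWS = [
    ("CB-A", "BRONX", "Noise"),
    ("CB-A", "BRONX", "Noise"),
    ("CB-A", "BRONX", "Noise"),
    ("CB-A", "BRONX", "Heating"),
    ("CB-A", "BRONX", "Heating"),
    ("CB-A", "BRONX", "Graffiti"),
    ("CB-B", "BROOKLYN", "Street Condition"),
    ("CB-B", "BROOKLYN", "Street Condition"),
    ("CB-B", "BROOKLYN", "Noise"),
    (None, "QUEENS", "Noise"),  # removed by the IS NOT NULL filter
]


def top_complaints(rows, top_n):
    """Return (board, borough, complaint type, count) for the top_n types per group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE requests ("Community Board" TEXT, "Borough" TEXT, "Complaint Type" TEXT)'
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_N_SQL, (top_n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_complaints(TOY_ROWS, 2):
        print(row)
```

With `top_n=2`, `Graffiti` (count 1) is cut from the `CB-A`/`BRONX` group, both `CB-B` types survive, and the row with a NULL Community Board never reaches the ranking step. `ROW_NUMBER()` gives every row in a partition a distinct number, so complaint types tied on `cnt` at the cutoff are dropped arbitrarily. `RANK()` would keep all tied rows, at the cost of sometimes returning more than the requested number per group.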


## REASONING

We need to respond to the user's prompt: "User's Prompt: What are the top 10 complaint types by community board and borough?"

We need to parse the instruction: The user asks for top 10 complaint types by community board and borough. We can answer using summary statistics and frequency distribution. But we need to derive the answer, or if not available, propose a SQL query. The data provided includes frequency distribution for Complaint Type, but not by community board and borough. We need to use SQL to get top 10 complaint types by community board and borough. The user likely expects a list, not just a SQL query.

The instructions: "When you are asked to generate a Data Dictionary, Description or Tags, use the provided Summary Statistics and Frequency Distribution to guide your response." This is not a data dictionary request. The user asks for top 10 complaint types by community board and borough. We need to answer that. We can either compute by reading the data? The dataset size is 1 million records. We cannot compute manually. But we can provide a SQL query that would produce the answer.

The instruction: "If the User's Prompt can be answered by using the Dataset's Summary Statistics and Frequency Distribution data below, immediately return the answer. Otherwise, using the Dataset's Summary Statistics, Frequency Distribution and Data Dictionary below, create a SQL query that can be used to answer the User's Prompt." The prompt requires grouping by community board and borough and complaint type, counting, ordering by count descending and top 10 per group. We need to provide a SQL query.

The instruction, "SQL Query Generation Guidelines":

- Use DuckDB version syntax.
- The input csv has headers and uses `,` as the delimiter.
- Column names with spaces and special characters are case-sensitive and should be enclosed in double quotes.
- Only use the `read_csv_auto` table function to read the input CSV.
- Use the placeholder `{INPUT_TABLE_NAME}` for the input csv in the `read_csv_auto` table function call.
- Remember that the date format specifier is the second parameter for the date functions `strftime()` and `strptime()`
- Remember that the date format specifier for seconds is `%S`, NOT `%s`
- Only use functions from the following Loaded DuckDB extensions: autocomplete, core_functions, icu, json, parquet, shell
- Make sure the generated SQL query is valid and has comments to explain the query
- Add `-- {GENERATED_BY_SIGNATURE}` at the top of the query

We need to produce an answer accordingly. The content: We should produce the SQL code block preceded by a newline, with the comment header. The query: SELECT "Community Board", "Borough", "Complaint Type", COUNT(*) AS count FROM read_csv_auto({INPUT_TABLE_NAME}) GROUP BY "Community Board", "Borough", "Complaint Type" ORDER BY "Community Board", "Borough", count DESC LIMIT 10? But we need top 10 per group, not overall. Use a window function: partition by "Community Board", "Borough" order by count desc, then filter row_number <=10. Example:

```sql
WITH counts AS (
SELECT "Community Board", "Borough", "Complaint Type", COUNT(*) AS count
FROM read_csv_auto({INPUT_TABLE_NAME})
WHERE "Community Board" IS NOT NULL AND "Borough" IS NOT NULL AND "Complaint Type" IS NOT NULL
GROUP BY "Community Board", "Borough", "Complaint Type"
),
ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "Community Board", "Borough" ORDER BY count DESC) AS rn
FROM counts
)
SELECT "Community Board", "Borough", "Complaint Type", count
FROM ranked
WHERE rn <= 10
ORDER BY "Community Board", "Borough", count DESC;
```

We need to ensure correct quoting. The column names have spaces and are case-sensitive: "Community Board" and "Borough" and "Complaint Type". Use double quotes.

We also need to use the placeholder {INPUT_TABLE_NAME} for the input csv. So we should write:

```sql
SELECT "Community Board", "Borough", "Complaint Type", count
FROM
(
  SELECT "Community Board", "Borough", "Complaint Type", COUNT(*) as count,
         ROW_NUMBER() OVER (PARTITION BY "Community Board", "Borough" ORDER BY COUNT(*) DESC) as rn
  FROM read_csv_auto({INPUT_TABLE_NAME})
  GROUP BY "Community Board", "Borough", "Complaint Type"
)
WHERE rn <= 10
ORDER BY "Community Board", "Borough", count DESC;
```

But we need to ensure grouping is correct. Actually the row_number should be computed after grouping: we can do a subquery that groups, then compute the row_number. The above query uses grouping in the same subquery, but row_number uses COUNT(*) but count is aggregated. In SQL, you cannot use COUNT(*) inside row_number after grouping? It's okay because we alias count and use row_number on that alias. But we must compute row_number after grouping, so we can use an outer query.

Better approach:

```sql
WITH grouped AS (
  SELECT "Community Board", "Borough", "Complaint Type", COUNT(*) AS cnt
  FROM read_csv_auto({INPUT_TABLE_NAME})
  WHERE "Community Board" IS NOT NULL AND "Borough" IS NOT NULL AND "Complaint Type" IS NOT NULL
  GROUP BY "Community Board", "Borough", "Complaint Type"
),
ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY "Community Board", "Borough" ORDER BY cnt DESC) AS rn
  FROM grouped
)
SELECT "Community Board", "Borough", "Complaint Type", cnt
FROM ranked
WHERE rn <= 10
ORDER BY "Community Board", "Borough", cnt DESC;
```

We need to add the comment header: "-- {GENERATED_BY_SIGNATURE}". We can include a comment at the top.

We should also include a comment inside to explain the query.

Thus produce the SQL code block preceded by a newline.

We should not output any other text besides the SQL code block. The instruction says: "Return the SQL query as a SQL code block preceded by a newline." So we must output a newline then the code block. The code block likely starts with ```sql. So we need to output:

```sql
-- {GENERATED_BY_SIGNATURE}
-- Explanation: ...
...

```

Yes.

So final answer: Provide the SQL query.

We must not mention other details. Provide only the code block. Ensure we have the placeholder.

Let's craft the final answer accordingly.
## TOKEN USAGE

TokenUsage { prompt: 23986, completion: 1595, total: 25581 }

---
