Hengshi ChatBI Technical Deep Dive: How NL2Metrics Breaks Through the Enterprise Analytics Accuracy Bottleneck

Summary: ChatBI is the most closely watched direction in AI+BI convergence, but the accuracy of answering natural-language data questions has always been the key challenge in putting it into production. Hengshi ChatBI takes a different technical path from mainstream solutions: NL2Metrics (Natural Language to Metrics), which uses a metrics semantic layer instead of NL2SQL to address the accuracy, security, and governance problems of enterprise-grade scenarios at their root. This article analyzes Hengshi ChatBI’s design philosophy across three dimensions: technical architecture, implementation principles, and application effects.

1. The Current State of ChatBI: Popular but Hard to Deploy

Over the past two years, almost every BI vendor has been building ChatBI: users ask data questions as if chatting, and an LLM automatically generates SQL and returns the results. It sounds great, but enterprises commonly run into three problems during actual implementation:

Unstable accuracy. Ask the same question in different ways, and the LLM might generate completely different SQL. A sales team asking “How much revenue did East China generate last month?” and a data analyst asking “May East China regional income” are asking the same thing, yet they may get different answers, because the LLM does not interpret “revenue” and “income” consistently.

Uncontrollable security. NL2SQL solutions have the LLM generate SQL that is executed directly, which means the model effectively holds database query permissions. If the generated SQL includes unauthorized queries (such as reading tables the user shouldn’t access) or produces a Cartesian product that puts heavy load on the database, the consequences are severe.

Ungovernable results. When business decisions are based on ChatBI outputs and data issues surface later, it’s difficult to trace whether the model misunderstood the question or the metric definition itself is flawed. Different model versions may also answer the same question differently, so data consistency cannot be guaranteed.

These three problems point to the same root cause: accuracy is made to rest on the LLM’s generative capability rather than on the determinism of business processes.

[Figure: Hengshi ChatBI Architecture]

2. Hengshi’s Technical Path: NL2Metrics vs NL2SQL

The core design choice of Hengshi ChatBI is NL2Metrics: instead of having the LLM generate SQL in real time, ChatBI matches the question against a predefined metrics semantic layer.

2.1 How NL2SQL Works

User Question → LLM Understands Intent → Generate SQL → Execute SQL → Return Results

Every step in this chain can go wrong:

Intent understanding deviation: “Revenue” might mean tax-inclusive or tax-exclusive, GMV or net receipts
SQL generation errors: Correct syntax but incorrect logic (e.g., joining the wrong table)
Execution risks: Inefficient queries causing database pressure
Irreproducible results: Different model versions generate different SQL

2.2 How NL2Metrics Works

User Question → LLM Understands Intent → Match Metrics Semantic Layer → Execute Predefined Metric Query → Return Results

The key difference is in step 3: the LLM doesn’t generate SQL, but performs semantic matching. It searches the enterprise’s metric definition library, finds the best-matching metric (such as “East China Monthly Revenue”), and calls that metric’s predefined query logic.
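To make the contrast concrete, here is a minimal Python sketch of that contract, assuming a hypothetical in-memory catalog. The LLM’s output is reduced to a structured selection (a metric ID plus parameter values); the query text always comes from the metric’s predefined template, so identical selections always produce identical queries. `MetricTemplate`, `METRIC_TEMPLATES`, and `resolve` are illustrative names, not Hengshi APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricTemplate:
    sql: str                 # predefined, reviewed query text
    params: frozenset[str]   # the only parameters a caller may bind

# Hypothetical catalog entry; the metric ID and SQL are illustrative only.
METRIC_TEMPLATES = {
    "monthly_revenue": MetricTemplate(
        sql="SELECT SUM(amount) FROM sales WHERE region = :region AND month = :month",
        params=frozenset({"region", "month"}),
    ),
}

def resolve(metric_id: str, params: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Turn the LLM's structured selection into a predefined query.

    The LLM only chooses (metric_id, params); the SQL text always comes from
    the catalog, so the same selection always yields the same query.
    """
    template = METRIC_TEMPLATES.get(metric_id)
    if template is None:
        raise LookupError(f"metric not defined: {metric_id}")
    if set(params) != template.params:
        raise ValueError(f"expected {sorted(template.params)}, got {sorted(params)}")
    # Values are returned as bindings for the query engine, never spliced into the SQL.
    return template.sql, dict(params)

# Two phrasings that the LLM maps to the same selection resolve identically,
# regardless of the order in which parameters were extracted.
q1 = resolve("monthly_revenue", {"region": "East China", "month": "2024-05"})
q2 = resolve("monthly_revenue", {"month": "2024-05", "region": "East China"})
assert q1 == q2
```

Rejecting unknown metric IDs and unexpected parameters is what keeps the model from widening a query beyond what the metric definition allows.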

2.3 Why NL2Metrics Is More Accurate

Dimension	NL2SQL	NL2Metrics
Metric Management	Model “guesses” with each generation	Predefined; Agent only matches
Reproducibility	Same question may yield different answers	Same metric, same definition, guaranteed consistent results
Security Control	LLM directly generates database queries	Isolated through metric layer permissions; no unauthorized access
Update Impact	Model upgrades may change results	Metric logic changes are controllable and traceable
Business Governance	None	Metric lineage, version management, change review

Core Insight: NL2Metrics doesn’t negate LLM capabilities. It assigns LLMs the work they’re good at (semantic understanding and matching) rather than the work where they’re unstable (generating query logic). It’s like asking an LLM to translate rather than to solve math problems.

3. Technical Implementation: Three-Layer Architecture Breakdown

Hengshi ChatBI’s technical architecture can be broken down into three layers:

3.1 Modeling Layer: Building the Metrics Semantic Layer

This is the foundation of ChatBI’s accuracy.

The Hengshi Metrics Platform supports enterprises in building metric systems through:

Drag-and-drop modeling: Business users can define metric calculation logic by dragging and dropping fields, much like building an Excel pivot table
HQL coding modeling: Data analysts can write complex metric logic using HQL (Hengshi Query Language), supporting multi-table joins, window functions, aggregation operations, and more
Template reuse: Pre-built common analysis models (period-over-period, RFM, funnel analysis, etc.) to lower the modeling barrier
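A reusable template such as period-over-period can be pictured as a wrapper that derives a new metric from any base metric. The sketch below illustrates that idea only; it is not Hengshi’s implementation, and `period_over_period`, the month-shift helpers, and the sample figures are hypothetical.

```python
from typing import Callable, Optional

# A base metric maps a month ("YYYY-MM") to a value, or None when there is no data.
Metric = Callable[[str], Optional[float]]

def previous_month(month: str) -> str:
    year, mon = map(int, month.split("-"))
    return f"{year - 1}-12" if mon == 1 else f"{year}-{mon - 1:02d}"

def same_month_last_year(month: str) -> str:
    year, mon = month.split("-")
    return f"{int(year) - 1}-{mon}"

def period_over_period(base: Metric, shift: Callable[[str], str]) -> Metric:
    """Template: derive a growth-rate metric from any base metric and period shift."""
    def growth(month: str) -> Optional[float]:
        current, earlier = base(month), base(shift(month))
        if current is None or earlier is None or earlier == 0:
            return None  # no comparable baseline, so report nothing rather than guess
        return (current - earlier) / earlier
    return growth

# Hypothetical monthly revenue figures, for illustration only.
monthly_revenue = {"2023-05": 880.0, "2024-04": 1000.0, "2024-05": 1100.0}.get
revenue_mom = period_over_period(monthly_revenue, previous_month)
revenue_yoy = period_over_period(monthly_revenue, same_month_last_year)
```

Because the template depends only on the base metric’s interface, the same wrapper yields month-over-month and year-over-year variants of any metric in the library without anyone rewriting the underlying logic.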

Each metric definition includes:

Metric name: Such as “East China Monthly Revenue”
Calculation logic: Such as SUM(amount) WHERE region = 'East China', aggregated by month
Data source: Associated datasets and data tables
Definition description: A text description of the metric’s business meaning, which helps the LLM understand what the metric means in business context
Access control: Which roles/users can query this metric
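These five fields map naturally onto a record type. The following is an illustrative sketch, assuming the calculation logic is stored as a SQL string and access control as a set of role names; `MetricDefinition`, its field names, and the sample values are hypothetical, not the Hengshi Metrics Platform schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MetricDefinition:
    name: str          # e.g. "East China Monthly Revenue"
    calculation: str   # predefined, reviewed query logic
    data_source: str   # dataset or table the logic reads from
    description: str   # business meaning, given to the LLM to support matching
    allowed_roles: frozenset[str] = field(default_factory=frozenset)

    def can_query(self, user_roles: set[str]) -> bool:
        """Access control: the user needs at least one permitted role."""
        return bool(self.allowed_roles & user_roles)

# Hypothetical example entry.
east_china_revenue = MetricDefinition(
    name="East China Monthly Revenue",
    calculation="SELECT month, SUM(amount) FROM sales WHERE region = 'East China' GROUP BY month",
    data_source="sales",
    description="Tax-exclusive net revenue booked in the East China region, by calendar month.",
    allowed_roles=frozenset({"sales_manager", "finance"}),
)
```

Keeping the description next to the calculation means the text the LLM matches against and the logic that actually runs are versioned together.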

3.2 Matching Layer: Semantic Matching in NL2Metrics

When users ask questions, ChatBI’s matching process is as follows:

Intent parsing: The LLM parses the user’s natural language question, extracting key entities (region, time, metric type) and intent (query/compare/trend)
Semantic retrieval: Search the metrics semantic layer for matching metric definitions, prioritizing exact matches, then fuzzy matches
Multi-candidate ranking: If multiple metrics match, the LLM selects the most appropriate based on business context
Parameter filling: Fill parameters from the user’s question (such as time range, region) into the metric query template
Execute query: Call the Metrics Platform’s query API, executing based on defined logic

Key Design: If the LLM can’t find a matching metric, ChatBI won’t fall back to guessing at SQL. Instead, it tells the user that the metric hasn’t been defined yet and suggests contacting the data team to model it. This “not answering is better than guessing” philosophy is exactly what an enterprise-grade product needs.
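The matching steps and the refusal rule can be sketched end to end. This is a deliberately simplified model under stated assumptions: step 1 (intent parsing) is represented by a dictionary of entities the LLM would have extracted; fuzzy retrieval uses the standard library’s `difflib` as a stand-in for whatever semantic retrieval Hengshi actually uses; ranking is by similarity score rather than by an LLM; and step 5 returns the predefined query and bindings instead of calling a real Metrics Platform API. The synonym lists, catalog, and function names are all hypothetical.

```python
import difflib
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    synonyms: tuple[str, ...]  # other phrasings that mean the same metric
    template: str              # predefined query; values are bound, not spliced in

CATALOG = [
    Metric("monthly revenue", ("revenue", "income", "sales amount"),
           "SELECT SUM(amount) FROM sales WHERE region = :region AND month = :month"),
    Metric("active customers", ("customers", "active users"),
           "SELECT COUNT(DISTINCT customer_id) FROM orders WHERE region = :region AND month = :month"),
]

def retrieve(term: str, catalog: list[Metric], cutoff: float = 0.75) -> list[tuple[float, Metric]]:
    """Step 2: exact name/synonym matches first, then fuzzy matches above a cutoff."""
    term = term.lower().strip()
    exact = [m for m in catalog if term == m.name or term in m.synonyms]
    if exact:
        return [(1.0, m) for m in exact]
    scored = []
    for m in catalog:
        score = max(difflib.SequenceMatcher(None, term, alias).ratio()
                    for alias in (m.name, *m.synonyms))
        if score >= cutoff:
            scored.append((score, m))
    return scored

def answer(intent: dict[str, str], catalog: list[Metric]) -> dict:
    """Steps 2-5. `intent` stands in for step 1, the entities an LLM would extract."""
    candidates = retrieve(intent["metric"], catalog)
    if not candidates:
        # Key design: report the gap instead of generating ad hoc SQL.
        return {"status": "undefined_metric",
                "message": f"'{intent['metric']}' has not been modeled yet; "
                           "please contact the data team."}
    # Step 3: rank candidates (by similarity here; the product delegates this to the LLM).
    _, best = max(candidates, key=lambda c: c[0])
    # Step 4: fill parameters from the question into the template's bindings.
    params = {"region": intent["region"], "month": intent["month"]}
    # Step 5: hand the predefined query plus bindings to the query engine.
    return {"status": "ok", "metric": best.name, "query": best.template, "params": params}
```

With “revenue” and “income” registered as synonyms of one metric, the two phrasings from Section 1 resolve to the same query, while a question about an unmodeled metric such as “customer lifetime value” returns the refusal message rather than a guess.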

3.3 Delivery Layer: Result Presentation and Interaction

ChatBI doesn’t just return a number; its query results include:

Structured presentation: Automatically select the most appropriate visualization form (table/bar chart/trend line)
Intelligent interpretation: Provide a one-sentence, context-aware interpretation of the results (such as “East China May revenue grew 12.3% month-over-month, mainly driven by pre-sales for the 618 (June 18) shopping festival”)
Follow-up support: Users can continue asking follow-up questions within the current context (such as “What about South China?”) without restating the full question
One-click dashboard: ChatBI Q&A results can be saved as dashboard components with one click, becoming persistent analytical assets
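Follow-up support implies that each new turn supplies only what changed. One simple way to model this, assumed here for illustration rather than taken from Hengshi’s design, is slot carryover: keep the previous turn’s parsed query and overwrite only the slots the new question mentions (deciding which slots were mentioned would be the LLM’s job). `QueryState` and `apply_follow_up` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class QueryState:
    metric: str
    region: str
    period: str
    breakdown: Optional[str] = None  # e.g. "city" after "Break down to city level"

def apply_follow_up(previous: QueryState, **mentioned: str) -> QueryState:
    """Carry the previous turn forward, overriding only the slots the new question mentions."""
    return replace(previous, **mentioned)

turn1 = QueryState(metric="revenue", region="East China", period="2024-05")
turn2 = apply_follow_up(turn1, region="South China")  # "What about South China?"
turn3 = apply_follow_up(turn2, breakdown="city")      # "Break down to city level"
```

Because each state is immutable, earlier turns stay intact, which makes it straightforward to save any turn’s query as a dashboard component.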

4. Enterprise-Grade Features: Not Just Answering Correctly, But Being Trustworthy

4.1 Security Access Control

Hengshi ChatBI’s security model has three layers:

Metric layer permissions: Users can only query metrics they’re authorized to access
Data layer permissions: Row-Level Security controls data visibility—for example, East China regional managers can only see East China data
Feature layer permissions: Distinguishing between roles like “can only ask questions,” “can model,” and “can manage metrics”
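The three layers can be illustrated as one authorization step: check the feature-level role, then the metric-level grant, then derive a row-level filter from the user’s attributes that the query engine binds into the predefined query. This is a sketch under assumed data structures; `User`, `METRIC_GRANTS`, `FEATURE_GRANTS`, and the `region` attribute are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    region: Optional[str] = None  # row-level attribute; None means unrestricted

# Metric layer (hypothetical grants): which roles may query which metric.
METRIC_GRANTS = {"monthly revenue": {"sales_manager", "finance"}}
# Feature layer (hypothetical grants): which roles may perform which action.
FEATURE_GRANTS = {"ask": {"sales_manager", "finance", "analyst"}, "model": {"analyst"}}

class AccessDenied(Exception):
    pass

def authorize(user: User, action: str, metric: str) -> dict:
    """Return the row-level filter to apply, or raise AccessDenied."""
    # Feature layer: may this user perform the action at all?
    if not user.roles & FEATURE_GRANTS.get(action, set()):
        raise AccessDenied(f"{user.name} may not '{action}'")
    # Metric layer: may this user query this metric?
    if not user.roles & METRIC_GRANTS.get(metric, set()):
        raise AccessDenied(f"{user.name} may not query '{metric}'")
    # Data layer: row-level security, bound into the predefined query by the engine.
    return {} if user.region is None else {"region": user.region}
```

Because the filter is added by the platform rather than requested by the model, an East China manager’s question about revenue can only ever return East China rows.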

4.2 Multi-Tenant Isolation

For ISV (Independent Software Vendor) scenarios, Hengshi ChatBI supports:

Tenant-level metric isolation: Different customers’ metric systems are completely independent
Tenant-level data isolation: Physical or logical data isolation
Tenant-level LLM configuration: Different customers can use different LLM services
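Tenant-level isolation can be modeled as a registry in which every lookup is keyed by tenant, so one customer’s metrics and LLM settings are never reachable from another’s. The sketch assumes logical isolation within a single process; `TenantConfig`, `TenantRegistry`, the tenant IDs, and the endpoint URLs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TenantConfig:
    llm_endpoint: str  # each tenant may point at a different LLM service
    metrics: dict[str, str] = field(default_factory=dict)  # metric name -> definition

class TenantRegistry:
    """Every lookup is keyed by tenant, so one tenant cannot resolve another's metrics."""

    def __init__(self) -> None:
        self._tenants: dict[str, TenantConfig] = {}

    def register(self, tenant_id: str, config: TenantConfig) -> None:
        if tenant_id in self._tenants:
            raise ValueError(f"tenant already registered: {tenant_id}")
        self._tenants[tenant_id] = config

    def metric(self, tenant_id: str, name: str) -> str:
        # Raises KeyError if the metric is not defined in *this* tenant,
        # even when another tenant has a metric with the same name.
        return self._tenants[tenant_id].metrics[name]

    def llm_endpoint(self, tenant_id: str) -> str:
        return self._tenants[tenant_id].llm_endpoint

registry = TenantRegistry()
registry.register("acme", TenantConfig("https://llm-a.example.com",
                                       {"churn rate": "churned / active_at_start"}))
registry.register("globex", TenantConfig("https://llm-b.example.com"))
```

Physical isolation would replace the in-memory dictionary with separate databases per tenant, but the rule is the same: the tenant ID is part of every lookup, never an optional filter.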

4.3 Multi-Channel Access

ChatBI capabilities can be embedded into enterprise workflows through various methods:

HENGSHI SENSE embedded: One-click ChatBI conversation from the BI platform’s report viewing page
API access: Embed into enterprise applications (OA, CRM, WeChat Work) via RESTful API
HENGSHI BOX on-premises deployment: All inference and data queries are completed locally; data never leaves the enterprise boundary

5. Application Scenarios

Scenario 1: Business Users Self-Service Data Query

Traditional pain point: The VP of Sales wants to know “the retention rate trend for key customers in each region this quarter.” They first have to book time with a data analyst, then wait two or three days for the report.

ChatBI experience: Enter the question directly, get structured results and trend charts in seconds. After discovering anomalies, follow up with “Which customers in East China churned?” to get detailed data.

Scenario 2: Management Operational Dashboard

Traditional pain point: Before monthly operational review meetings, the BI team works overtime to produce dozens of pages of reports. After management reviews them and discovers new angles of inquiry, they have to wait for the next version.

ChatBI experience: During the meeting, managers can put questions to ChatBI directly on the big screen, such as “Compare with the same period last year” or “Break down the gross margin change by product line,” and get real-time insights without waiting.

Scenario 3: Embedded in ISV Products

Traditional pain point: SaaS vendors want to provide AI analytics capabilities to their customers, but building ChatBI in-house is too costly and accuracy is hard to guarantee.

ChatBI experience: Through Hengshi’s embedded ChatBI, SaaS vendors can offer intelligent Q&A over each customer’s own data. Each customer’s data and metrics are completely isolated from every other customer’s, and the capability is ready to use out of the box.

6. Core Differences from ChatBI Competitors

It’s important to clarify that Hengshi ChatBI is positioned not as a general-purpose conversational AI, but as a ChatBI designed specifically for enterprise BI scenarios. This positioning leads it to make different choices in several key decisions:

Accuracy priority over experience priority — Better to say “I don’t know” than to give wrong answers
Metrics semantic layer as infrastructure, not optional — No modeling, no accuracy; this is a design premise
ISV-oriented multi-tenant scenarios — B2B2B permissions and isolation requirements considered from day one
Agent-schedulable — ChatBI is not just a standalone product, but also a module in the Data Agent Family, callable by Agents via CLI

7. FAQ

Q1: Does NL2Metrics limit LLM capabilities? What if a user asks about a metric that hasn’t been predefined?

A: This is precisely the core design philosophy of NL2Metrics—“not answering is better than guessing.” When users ask about undefined metrics, ChatBI clearly informs them that the metric hasn’t been modeled yet, and guides them to contact the data team. This appears to reduce flexibility but actually protects decision-making accuracy. Moreover, Hengshi’s Modeling Agent can accelerate the creation of new metrics, allowing the metric library to grow dynamically with business needs.

Q2: What kinds of enterprises should start with ChatBI?

A: Enterprises that already have basic data modeling capabilities should go first. ChatBI’s accuracy depends directly on the quality of the metrics layer. If an enterprise doesn’t even have a basic data warehouse and metric definitions, deploying ChatBI directly will only amplify the chaos. Hengshi’s recommended path: first use the Hengshi platform for data integration and metric modeling, then add AI enhancement.

Q3: Can ChatBI handle multi-turn complex conversations?

A: Yes, it supports contextually coherent multi-turn conversations. For example: “How much revenue did East China generate last month?” → “Month-over-month?” → “Break down to city level” → “Put the trend chart on my dashboard.” Each turn executes within the context of the previous turn.

8. Summary

Hengshi ChatBI’s biggest differentiation lies not in how good its conversational experience is, but in its choice of a harder but more correct technical path—using modeling to guarantee accuracy, using the semantic layer to replace SQL generation.

In enterprise-grade scenarios, the ultimate question for ChatBI is not “Can AI understand the question?” but “Can enterprises trust the answers AI provides?” From this perspective, the issue for NL2Metrics is not whether accuracy is 95% or 98%, but whether being wrong is acceptable at all. In decision-support scenarios, it is not.