Database / InfluxDB interview questions II
A typical architecture includes data producers, ingestion endpoints, buckets for storage, query services, tasks for automation, and dashboards or APIs for consumption.
InfluxDB optimizes append-style writes, batches points efficiently, and uses storage/index strategies tuned for time-based ingestion patterns.
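As an illustrative sketch of batched append-style writes (the helpers `make_line` and `batch_lines` are hypothetical, not part of any real client library):

```python
# Sketch: render points as line protocol and group them into batched payloads.
# make_line/batch_lines are illustrative helpers, not a real client API.

def make_line(measurement, tags, fields, ts_ns):
    """Render one point: measurement,tag=value field=value timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

def batch_lines(lines, batch_size=5000):
    """Join lines into newline-separated payloads sized for one HTTP write."""
    for i in range(0, len(lines), batch_size):
        yield "\n".join(lines[i:i + batch_size])

make_line("cpu", {"host": "web01"}, {"usage": 0.42}, 1700000000000000000)
# → "cpu,host=web01 usage=0.42 1700000000000000000"
```

Real line protocol additionally requires escaping commas and spaces and suffixing integer fields with `i`; this sketch omits those details for brevity.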
Choose Cloud when you want managed operations, elastic scaling, and reduced maintenance overhead. Choose self-hosted when strict control or data residency constraints dominate.
An organization scopes users, buckets, dashboards, and tokens so teams can isolate data and permissions cleanly.
API tokens are fine-grained and service-friendly, while username/password is primarily for interactive login; production pipelines should use scoped tokens.
Precision affects storage efficiency and analytical correctness. Overly fine precision can inflate payloads, while coarse precision can hide meaningful spikes.
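One way to see the trade-off is to truncate the same nanosecond timestamp to coarser precisions; the coarser the precision, the shorter the payload and the more sub-second detail is lost (pure-Python sketch):

```python
# Sketch: the same instant expressed at different write precisions.
def to_precision(ts_ns, precision):
    """Truncate a nanosecond epoch timestamp to s/ms/us/ns precision."""
    divisor = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}[precision]
    return ts_ns // divisor

ts = 1_700_000_000_123_456_789
to_precision(ts, "s")   # 1700000000      (sub-second detail dropped)
to_precision(ts, "ms")  # 1700000000123   (microseconds and below dropped)
```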
Clients should retry transient failures with backoff and preserve order/semantics as needed to avoid silent data loss during network hiccups.
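A minimal retry-with-backoff sketch, assuming a hypothetical `write_fn` that raises `ConnectionError` on transient failures:

```python
import random
import time

def write_with_retry(write_fn, payload, max_attempts=5, base_delay=0.5):
    """Retry transient write failures with exponential backoff plus jitter.

    write_fn is a hypothetical callable that raises ConnectionError on
    transient failure; real clients would also distinguish retryable
    status codes from permanent rejections.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the failure, never drop silently
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```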
Use stable, domain-oriented names, avoid ambiguous abbreviations, and document ownership to prevent schema drift across services.
Schema drift occurs when producers change fields/tags unpredictably. Prevent it with contracts, producer validation, and CI checks.
Use tags for filter/group dimensions and fields for measured values. If you frequently filter it, it likely belongs in tags.
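The cost asymmetry is visible in series counts: each distinct tag combination creates a new series, while field values do not (illustrative sketch):

```python
# Sketch: series cardinality is the number of distinct
# (measurement, tag set) combinations; field values add no series.
from itertools import product

hosts = ["web01", "web02"]
regions = ["eu", "us"]

# Tagging host and region yields one series per combination.
series = {("cpu", h, r) for h, r in product(hosts, regions)}
len(series)  # 4 series; a numeric field like usage adds none
```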
Store latency as numeric fields and compute percentiles in query pipelines over defined windows and dimensions.
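A pure-Python sketch of the windowed-percentile idea (in production this would be a query-side aggregation; the nearest-rank method used here is one common percentile definition):

```python
# Sketch: p95 latency per one-minute window from (timestamp_s, latency_ms)
# samples; a stand-in for a query-engine windowed percentile.
import math
from collections import defaultdict

def p95_by_window(samples, window_s=60):
    """Return {window_start: p95} using the nearest-rank method."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[(ts // window_s) * window_s].append(latency)
    out = {}
    for start, values in buckets.items():
        values.sort()
        idx = max(0, math.ceil(0.95 * len(values)) - 1)
        out[start] = values[idx]
    return out

p95_by_window([(0, 10), (1, 20), (2, 30), (61, 100)])
# → {0: 30, 60: 100}
```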
Store state as compact field values with clear tags for entity identity, and query transitions with windowing or change-detection logic.
Tag region consistently, align retention by policy, and design rollups so global views and regional drill-down remain fast.
Common tiers are short-lived raw high-resolution data, medium-term rolled-up metrics, and long-term coarse trend archives.
Tasks run scheduled Flux logic to aggregate raw series into rollup buckets, reducing long-term cost and query latency.
Backfill in bounded batches, validate field types, monitor cardinality impact, and avoid overwhelming live ingestion paths.
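Bounding a backfill is mostly about slicing the time range; a sketch (the one-hour window size is an assumption to tune against live ingestion load):

```python
# Sketch: split a backfill range into bounded windows so each replay batch
# stays small and live ingestion is not starved.
def backfill_windows(start_s, end_s, step_s=3600):
    """Yield (window_start, window_end) pairs covering [start_s, end_s)."""
    t = start_s
    while t < end_s:
        yield t, min(t + step_s, end_s)
        t += step_s

list(backfill_windows(0, 5000, step_s=3600))
# → [(0, 3600), (3600, 5000)]
```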
Frequent causes include field type conflicts, malformed line protocol, invalid timestamps, and permission mismatches.
Replay representative load, inspect cardinality growth, validate query plans, and run operational failure drills.
Treat dashboard/query definitions as code, store in version control, and promote changes through review and environment gates.
Idempotency means repeated ingestion attempts do not corrupt outcomes; design keys and write logic to tolerate retries.
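InfluxDB deduplicates on measurement, tag set, and timestamp, so retrying an identical point overwrites rather than duplicates; a toy model of that last-write-wins behavior:

```python
# Sketch: last-write-wins keyed storage makes retried writes idempotent.
# Points sharing (measurement, tag set, timestamp) merge their fields
# instead of double-counting.
store = {}

def upsert(measurement, tags, ts, fields):
    key = (measurement, tuple(sorted(tags.items())), ts)
    store.setdefault(key, {}).update(fields)

upsert("cpu", {"host": "web01"}, 1000, {"usage": 0.4})
upsert("cpu", {"host": "web01"}, 1000, {"usage": 0.4})  # retry: no duplicate
len(store)  # 1
```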
Track producer time versus ingest time deltas, alert on sustained lag, and correlate with queue/backpressure metrics.
Backpressure arises when downstream components saturate; clients must buffer, batch, and retry responsibly to avoid dropping data.
Estimate points/sec, field count, tag cardinality, retention duration, and compression assumptions, then validate with load tests.
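A back-of-envelope capacity sketch; `bytes_per_point` is an assumed post-compression figure that must be validated with load tests, not a guarantee:

```python
# Sketch: rough storage estimate from write rate and retention.
# bytes_per_point is an assumed post-compression average.
def storage_estimate_gb(points_per_sec, retention_days, bytes_per_point=3):
    points = points_per_sec * 86_400 * retention_days
    return points * bytes_per_point / 1e9

storage_estimate_gb(10_000, 30)  # → 77.76 GB at the assumed 3 bytes/point
```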
Define SLOs for write success rate, write latency, query latency, and freshness of derived metrics.
Use TLS, least-privilege tokens, secret rotation, audit trails, network segmentation, and strict environment separation.
Inspect newly introduced tag keys/values, identify high-churn dimensions, and roll back the schema changes causing the explosion.
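A quick diagnostic sketch: rank tag keys by distinct-value count to find the dimension driving the explosion (for example, a request ID accidentally written as a tag):

```python
# Sketch: count distinct values per tag key to spot high-churn dimensions.
from collections import defaultdict

def tag_value_counts(points):
    """points: iterable of tag dicts → {tag_key: distinct value count}."""
    seen = defaultdict(set)
    for tags in points:
        for k, v in tags.items():
            seen[k].add(v)
    return {k: len(v) for k, v in seen.items()}

pts = [{"host": "web01", "req_id": "a1"},
       {"host": "web01", "req_id": "b2"},
       {"host": "web02", "req_id": "c3"}]
tag_value_counts(pts)  # → {'host': 2, 'req_id': 3}; req_id is the culprit
```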
Processors can normalize fields, drop noisy attributes, enrich tags, and enforce cleaner payloads before storage.
Edge buffering protects against intermittent links, preserving data continuity until connectivity is restored.
Use stable device metadata tags, quality flags as fields, and filtering/aggregation tasks to control noise.
Use separate measurements/buckets and clear taxonomy to keep ownership, retention, and access policies manageable.
Producer-side validation catches malformed values early, reducing partial writes and downstream cleanup effort.
Use overlapping token validity windows, staged rollout, and health checks to switch credentials safely.
Add stable operational dimensions like service, cluster, region, and environment for fast filtering during incidents.
Use representative datasets, realistic time windows, warm/cold cache scenarios, and repeatable query suites.
Split by domain semantics and access patterns; unrelated schemas in one measurement hurt clarity and performance.
Higher granularity improves diagnostics but increases storage/compute costs; rollup strategy balances both.
Isolate with separate buckets/tokens and optionally org boundaries, enforcing least privilege across environments.
Use lowercase stable names, avoid synonyms, and document conventions so queries remain predictable.
Document measurement purpose, tag/field definitions, units, retention, and ownership in version-controlled specs.
Validate units at ingestion, annotate metadata, and normalize values via tasks before broad consumption.
Default to bounded windows, avoid unbounded scans, and provide drill-down links for deeper analysis.
Choose windows aligned to signal frequency and business need, balancing smoothness with responsiveness.
Create stable, low-noise signals with clear thresholds and consistent tags so alerts are actionable.
Define source identity tags, dedup logic, and collector coordination to avoid double-counting.
Store timestamps in UTC and apply timezone conversion only at presentation layers.
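A stdlib sketch of the rule: the stored value is an epoch (inherently UTC), and conversion to a display zone happens only at presentation time (a fixed +01:00 offset stands in for a real timezone lookup):

```python
# Sketch: store UTC epochs; convert to a zone only when rendering.
from datetime import datetime, timedelta, timezone

ts = 1_700_000_000  # stored value: seconds since epoch, UTC by definition
utc = datetime.fromtimestamp(ts, tz=timezone.utc)

# Presentation-layer conversion; a fixed offset is used here for
# illustration in place of a named-zone lookup.
display = utc.astimezone(timezone(timedelta(hours=1)))

utc.isoformat()      # '2023-11-14T22:13:20+00:00'
display.isoformat()  # '2023-11-14T23:13:20+01:00'
```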
Synthetic probes provide controlled baselines that help distinguish user-impacting issues from telemetry gaps.
Provide schema catalogs, starter queries, dashboard templates, and naming conventions with examples.
Trend utilization metrics over long windows, correlate demand drivers, and forecast thresholds for scaling decisions.
Highlight storage/query model differences, ecosystem fit, retention patterns, and operational ownership trade-offs.
Confirm schema contract, security controls, retention tiers, dashboards, alerts, backups, and restore drill readiness.
