InfluxDB interview questions
1. What problem does InfluxDB solve?
InfluxDB is a time series database optimized for timestamped metrics and events. It handles high-ingest workloads and time-window analytics well, which makes it a common choice for observability and IoT systems.
2. How is InfluxDB different from a relational database for telemetry?
Relational databases excel at normalized transactional data. InfluxDB is purpose-built for append-heavy, timestamped measurements and fast aggregate queries over time windows.
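3. What are measurements, tags, fields, and timestamps?
A measurement groups related points, similar to a table name. Tags are indexed key-value dimensions whose values are always strings. Fields store the actual measured values and are not indexed. The timestamp marks the event time of each point.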
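To make the four parts concrete, here is a minimal Python sketch of a point as a plain data structure. The Point class and its validation rules are hypothetical illustrations, not the Point class of any official InfluxDB client library; they only encode two facts from the answer above: tag values are strings, and a point carries at least one field.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

FieldValue = Union[float, int, str, bool]


@dataclass
class Point:
    """Hypothetical model of one time series point (not a client-library class)."""

    measurement: str                                             # groups related points, e.g. "cpu"
    tags: Dict[str, str] = field(default_factory=dict)           # indexed string dimensions
    fields: Dict[str, FieldValue] = field(default_factory=dict)  # the measured values
    timestamp_ns: Optional[int] = None                           # event time, ns since the Unix epoch

    def __post_init__(self) -> None:
        # Tag values are always strings in InfluxDB.
        for key, value in self.tags.items():
            if not isinstance(value, str):
                raise TypeError(f"tag {key!r} must be a string, got {type(value).__name__}")
        # A point without fields carries no data.
        if not self.fields:
            raise ValueError("a point needs at least one field")
```

For example, Point("cpu", tags={"host": "server01"}, fields={"usage_percent": 12.5}) is valid, while passing an integer as a tag value raises TypeError.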
4. What is a bucket in InfluxDB v2?
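A bucket is a named location where time series points are stored, and every bucket has a retention period that controls how long its data is kept. It is the main logical container for data in InfluxDB v2 and combines the v1 concepts of a database and a retention policy.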
5. Why is retention configuration important?
Retention controls how long data stays in storage. It reduces cost and keeps long-range queries efficient by removing stale high-resolution data.
6. What is line protocol?
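Line protocol is InfluxDB's compact text write format. Each line holds a measurement, an optional comma-separated tag set, a field set, and an optional timestamp (nanosecond precision by default), for example: cpu,host=server01 usage=0.64 1700000000000000000. It is designed for fast ingestion.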
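The formatting and escaping rules are easy to get wrong by hand, so here is a minimal Python sketch of a line protocol formatter. The function names are hypothetical; the rules it encodes (escaping, the "i" suffix for integers, quoted strings, omitting empty tag values, sorting tags by key) follow the line protocol reference as best understood here, and the sketch does not cover every edge case such as unsigned integers or NaN values.

```python
from typing import Dict, Optional, Union

FieldValue = Union[float, int, str, bool]


def _escape_measurement(name: str) -> str:
    # Measurement names escape commas and spaces.
    return name.replace(",", "\\,").replace(" ", "\\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # String fields are double-quoted; escape backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field value: {value!r}")


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape_measurement(measurement)
    # Sorting tags by key is the recommended order for write performance.
    for key in sorted(tags):
        if tags[key] == "":
            continue  # empty tag values are not allowed; omit the tag instead
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    field_set = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {field_set}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # omitted timestamps are assigned by the server
    return line
```

For example, to_line_protocol("cpu", {"region": "us-west", "host": "server01"}, {"usage": 0.64, "cores": 8}, 1700000000000000000) yields cpu,host=server01,region=us-west usage=0.64,cores=8i 1700000000000000000.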
7. How do tags and fields affect performance?
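Use tags for dimensions you filter or group by often, because tags are indexed. Keep numeric or measured values in fields: fields are not indexed, so filtering on a field value requires scanning the matching data, and tag values are always strings, so they cannot be aggregated numerically.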
8. What is cardinality in InfluxDB?
Series cardinality is the number of unique combinations of measurement, tag set, and field key. Very high cardinality increases memory use and index pressure, which slows writes and queries.
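Counting series by that definition is a few lines of code. This is a minimal Python sketch for estimating cardinality of sample data before writing it; the tuple layout (measurement, tags, fields) and the function name are assumptions for illustration, not an InfluxDB API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical point layout: (measurement, tags, fields)
PointTuple = Tuple[str, Dict[str, str], Dict[str, object]]


def series_cardinality(points: Iterable[PointTuple]) -> int:
    """Count unique (measurement, tag set, field key) combinations."""
    seen = set()
    for measurement, tags, fields in points:
        tag_set = tuple(sorted(tags.items()))  # tag order does not define a new series
        for field_key in fields:
            seen.add((measurement, tag_set, field_key))
    return len(seen)
```

Repeated writes to the same host and field add points to an existing series, while a new tag value or a new field key adds a series.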
9. How can you reduce high cardinality?
Avoid unique per-event identifiers in tags. Keep stable tag sets and move volatile identifiers, such as request IDs, into fields.
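The effect of moving a volatile identifier from a tag to a field can be shown directly. This minimal Python sketch uses a hypothetical demote_volatile_tags helper and the same (measurement, tags, fields) tuple layout assumed for illustration; with one unique request ID per point, cardinality grows with every request as a tag but stays constant as a field.

```python
from typing import Dict, Iterable, Set, Tuple

PointTuple = Tuple[str, Dict[str, str], Dict[str, object]]


def demote_volatile_tags(point: PointTuple, volatile: Set[str]) -> PointTuple:
    """Move the named tags into fields (moved values overwrite same-named fields)."""
    measurement, tags, fields = point
    kept = {k: v for k, v in tags.items() if k not in volatile}
    moved = {k: v for k, v in tags.items() if k in volatile}
    return measurement, kept, {**fields, **moved}


def count_series(points: Iterable[PointTuple]) -> int:
    # A series is a unique (measurement, tag set, field key) combination.
    return len({(m, tuple(sorted(t.items())), f) for m, t, fs in points for f in fs})
```

With 1,000 points tagged by service="api" and a unique request_id, count_series reports 1,000 series before the move and 2 after it (latency_ms and request_id under the single service tag set).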
10. When should you use Flux?
Use Flux when you need richer transformations, scheduled processing, or more expressive pipelines across time series datasets.
11. When is InfluxQL useful?
InfluxQL is SQL-like and familiar to many teams, which makes it useful for legacy v1 query patterns and simpler time series retrieval. InfluxDB v2 still accepts InfluxQL through its v1 compatibility API.
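12. What does a simple Flux query look like?
A typical query reads from a bucket, narrows the time range with range(), keeps only the relevant rows with filter(), and then aggregates. For example, with an illustrative bucket and measurement name: from(bucket: "example-bucket") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "cpu") |> aggregateWindow(every: 5m, fn: mean)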
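Applications often assemble this pipeline from parameters, so here is a minimal Python sketch that builds the from/range/filter/aggregateWindow query as a string with basic input checks. The helper name build_window_query, the allowed aggregate list, and the duration pattern are assumptions for illustration; only the Flux functions themselves (from, range, filter, aggregateWindow) come from Flux. The sketch assumes names contain no "${" sequence, which Flux treats as string interpolation.

```python
import re

_RELATIVE_START = re.compile(r"^-\d+(ns|us|ms|s|m|h|d|w)$")  # e.g. -1h
_EVERY = re.compile(r"^\d+(ns|us|ms|s|m|h|d|w)$")             # e.g. 5m
_AGGREGATES = {"mean", "max", "min", "sum", "count", "last"}


def _flux_string(value: str) -> str:
    # Flux string literals are double-quoted; escape backslashes and quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_window_query(bucket: str, measurement: str, start: str = "-1h",
                       every: str = "5m", fn: str = "mean") -> str:
    if not _RELATIVE_START.match(start):
        raise ValueError("start must be a negative duration such as -1h")
    if not _EVERY.match(every):
        raise ValueError("every must be a positive duration such as 5m")
    if fn not in _AGGREGATES:
        raise ValueError(f"unsupported aggregate: {fn}")
    return "\n".join([
        f"from(bucket: {_flux_string(bucket)})",
        f"  |> range(start: {start})",                                    # limit time first
        f"  |> filter(fn: (r) => r._measurement == {_flux_string(measurement)})",
        f"  |> aggregateWindow(every: {every}, fn: {fn})",                # then aggregate
    ])
```

The resulting string can be submitted through an InfluxDB v2 client library or the influx query CLI command. Keeping range() first mirrors the advice that narrowing the time range early is the cheapest filter.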
13. How does token based security work in v2?
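InfluxDB v2 uses API tokens scoped to specific resources, such as individual buckets, with explicit read and write permissions. Clients send the token in an Authorization: Token <token> header. Production systems should rotate tokens and keep them in a secret store rather than in source code.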
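Here is a minimal Python sketch of how a client attaches a token to a write request using only the standard library. It targets the v2 /api/v2/write endpoint with org, bucket, and precision query parameters and the Token authorization scheme; the environment variable name INFLUX_TOKEN, the helper name, and the example org and bucket values are assumptions. The function only builds the request and does not send it.

```python
import os
import urllib.parse
import urllib.request
from typing import List


def build_write_request(base_url: str, org: str, bucket: str, lines: List[str],
                        token_env: str = "INFLUX_TOKEN") -> urllib.request.Request:
    # Read the token from the environment instead of hard-coding it.
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"set {token_env} to a write-scoped API token")
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    body = "\n".join(lines).encode("utf-8")  # line protocol, one point per line
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )
```

Sending the request with urllib.request.urlopen(req) should return HTTP 204 on a successful write. Using a token that only has write access to this one bucket limits the damage if the device is compromised.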
14. What write pattern is common for IoT?
IoT pipelines often micro-batch points and retry transient failures. Stable device tags improve queryability and operational consistency.
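A minimal Python sketch of micro-batching with retries follows, assuming a caller-supplied send function that raises a hypothetical TransientWriteError for retryable failures (for example HTTP 429 or 503). It uses exponential backoff with full jitter so that many devices recovering at once do not retry in lockstep; the delay values are illustrative, not recommended settings.

```python
import random
import time
from typing import Callable, List, Sequence


class TransientWriteError(Exception):
    """Hypothetical error that `send` raises for retryable failures."""


def micro_batches(lines: Sequence[str], size: int) -> List[List[str]]:
    # Split buffered lines into batches of at most `size` lines.
    return [list(lines[i:i + size]) for i in range(0, len(lines), size)]


def write_with_retry(batch: List[str], send: Callable[[List[str]], None],
                     max_attempts: int = 5, base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Send one batch, retrying transient errors; returns the attempt that succeeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(batch)
            return attempt
        except TransientWriteError:
            if attempt == max_attempts:
                raise  # give up; the caller keeps the batch for later
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Non-transient errors, such as a rejected field type, propagate immediately because retrying them cannot succeed.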
15. How do you handle late arriving data?
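Define an acceptable lateness window, keep device clocks synchronized (for example with NTP), and run controlled backfill jobs for corrected records. InfluxDB accepts writes with past timestamps, so a late point can still be stored, but rollups already computed for its time window may need to be re-run.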
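A minimal Python sketch of a lateness policy: incoming timestamps are split into on-time points, late points to route to a backfill job, and suspicious future timestamps that suggest an unsynchronized clock. The function name and both window parameters are hypothetical policy choices, not InfluxDB settings.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def classify_by_lateness(
    timestamps: Iterable[datetime],
    now: datetime,
    allowed_lateness: timedelta,
    max_clock_skew: timedelta,
) -> Tuple[List[datetime], List[datetime], List[datetime]]:
    on_time: List[datetime] = []
    late: List[datetime] = []
    suspect: List[datetime] = []
    for ts in timestamps:
        if ts > now + max_clock_skew:
            suspect.append(ts)   # too far in the future: likely a device clock problem
        elif now - ts > allowed_lateness:
            late.append(ts)      # older than the window: handle in a controlled backfill
        else:
            on_time.append(ts)   # write directly
    return on_time, late, suspect
```

Routing late points to a separate backfill path makes it explicit which rollup windows have to be recomputed, instead of silently changing data that downstream reports already used.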
16. Why is downsampling useful?
Downsampling stores long-term trends at lower granularity (for example, 1-minute averages instead of raw 1-second points), so storage and long-range query costs remain manageable.
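In InfluxDB v2 downsampling is usually done by a task that runs aggregateWindow() and writes into a rollup bucket. The arithmetic itself is simple, as this minimal plain-Python sketch shows: it averages (epoch_seconds, value) pairs into fixed windows. The function name and input layout are assumptions; note that this sketch labels each window by its start, whereas aggregateWindow() labels output rows with the window stop by default.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def downsample(points: Iterable[Tuple[int, float]], window_s: int) -> List[Tuple[int, float]]:
    """Average (epoch_seconds, value) pairs into fixed windows labeled by window start."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % window_s].append(value)  # floor to the window start
    return [(start, mean(values)) for start, values in sorted(windows.items())]
```

For example, downsample([(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0), (125, 5.0)], 60) yields [(0, 2.0), (60, 15.0), (120, 5.0)]: five raw points become three rollup points.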
17. What are tasks in InfluxDB v2?
Tasks run Flux scripts on schedules to automate rollups, data quality routines, and recurring transformations.
18. What is Telegraf and why use it?
Telegraf is a plugin-based agent that collects metrics from many sources through input plugins and writes them to InfluxDB through output plugins.
19. How do you design tags for multi tenant data?
Use stable dimensions such as tenant, environment, and service as tags. Avoid per-request or per-session identifiers in tags.
20. How can you improve write throughput?
Use batch writes, compact tag sets, gzip compression, and low-latency network paths to the server. Validate retry and buffering settings in clients.
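Batching and compression combine naturally, as in this minimal Python sketch that groups line protocol into batches and gzips each payload. The default of 5,000 lines per batch is the starting point commonly cited in InfluxDB write-optimization guidance and should be tuned; the function name is hypothetical. Each payload would be sent with a Content-Encoding: gzip header.

```python
import gzip
from typing import Iterable, Iterator, List


def gzip_batches(lines: Iterable[str], batch_size: int = 5000) -> Iterator[bytes]:
    """Yield gzip-compressed line protocol payloads of at most `batch_size` lines."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield gzip.compress("\n".join(batch).encode("utf-8"))
            batch = []
    if batch:  # flush the final partial batch
        yield gzip.compress("\n".join(batch).encode("utf-8"))
```

Fewer, larger requests amortize HTTP and authentication overhead, and repetitive line protocol (same measurement and tag keys on every line) tends to compress well.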
21. What is a field type conflict?
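A field type conflict occurs when one field key receives different data types across writes, for example an integer in one write and a float in the next. InfluxDB rejects the conflicting points, so prevent this with producer-side validation.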
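Producer-side validation can be as simple as remembering the first type seen for each field. This minimal Python sketch uses a hypothetical FieldTypeRegistry that tracks types per (measurement, field key) for the lifetime of the process, which is a conservative assumption compared with how InfluxDB scopes the check on the server.

```python
from typing import Dict, Tuple


def influx_type(value: object) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported field value: {value!r}")


class FieldTypeRegistry:
    """Hypothetical producer-side guard: first type seen per field wins."""

    def __init__(self) -> None:
        self._types: Dict[Tuple[str, str], str] = {}

    def check(self, measurement: str, fields: Dict[str, object]) -> None:
        kinds = {key: influx_type(value) for key, value in fields.items()}
        # Validate every field before registering any, so a rejected point leaves no trace.
        for key, kind in kinds.items():
            expected = self._types.get((measurement, key))
            if expected is not None and expected != kind:
                raise ValueError(f"{measurement}.{key}: expected {expected}, got {kind}")
        for key, kind in kinds.items():
            self._types[(measurement, key)] = kind
```

A common source of conflicts is a producer that sends 22 in one reading and 22.5 in the next; normalizing numeric readings to float before writing avoids it.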
22. How should backup and restore be planned?
Run regular backups, test restore drills, and align retention, recovery objectives, and operational runbooks.
23. Which health metrics matter most?
Track write errors, ingest latency, query latency, cardinality trends, disk usage, memory pressure, and task failures.
24. How do you troubleshoot slow queries?
Narrow the time range first, filter early on indexed tags, and inspect expensive operations such as joins and pivots.
25. What bucket strategy works for environments?
Separate buckets by environment and lifecycle such as raw and rollup. This helps permissions, retention, and operations.
26. How can you model SLI metrics?
Store latency and error metrics with stable tags like service, endpoint class, and environment, then compute windowed rollups.
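The windowed rollup can be sketched in plain Python to show what gets computed per tag combination. The Request record, the one-minute window, and the nearest-rank p95 are assumptions for illustration (other percentile definitions exist); in InfluxDB, service, endpoint_class, and environment would be tags, and latency_ms and is_error would be fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Request(NamedTuple):
    ts: int               # epoch seconds
    service: str          # stable tag
    endpoint_class: str   # stable tag, e.g. "read" or "write"
    environment: str      # stable tag
    latency_ms: float     # field
    is_error: bool        # field


def p95(values: List[float]) -> float:
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integers.
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def sli_rollup(requests: Iterable[Request], window_s: int = 60) -> Dict[Tuple, Dict[str, float]]:
    groups: Dict[Tuple, List[Request]] = defaultdict(list)
    for r in requests:
        window_start = r.ts - r.ts % window_s
        groups[(r.service, r.endpoint_class, r.environment, window_start)].append(r)
    return {
        key: {
            "count": len(rs),
            "error_rate": sum(r.is_error for r in rs) / len(rs),
            "p95_latency_ms": p95([r.latency_ms for r in rs]),
        }
        for key, rs in groups.items()
    }
```

Because the grouping keys are only the stable tags plus the window, the rollup output has low, predictable cardinality even when request volume is high.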
27. What schema anti patterns should be avoided?
Avoid unbounded tag values, inconsistent field typing, and mixing unrelated domains in one measurement.
28. How do edge to cloud pipelines use InfluxDB?
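Edge nodes buffer points while disconnected and forward them in batches after reconnecting. Consistent timestamps and idempotent writes reduce duplication risk: InfluxDB treats points with the same measurement, tag set, and timestamp as one point and overwrites its field values instead of storing a duplicate.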
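A minimal Python sketch of a store-and-forward buffer on an edge node follows. EdgeBuffer is a hypothetical class, not part of any InfluxDB or Telegraf API. It keys points by measurement, tag set, and timestamp, so a replayed reading merges into one entry the same way InfluxDB would overwrite it, and it removes points only after a batch is sent, so delivery is at least once and retries are safe.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

# Identity of a point: measurement, sorted tag set, timestamp.
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]


class EdgeBuffer:
    """Hypothetical bounded store-and-forward buffer for an edge node."""

    def __init__(self, max_points: int = 10_000) -> None:
        self.max_points = max_points
        self._pending: "OrderedDict[PointKey, Dict[str, object]]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._pending)

    def add(self, measurement: str, tags: Dict[str, str],
            fields: Dict[str, object], timestamp_ns: int) -> None:
        key = (measurement, tuple(sorted(tags.items())), timestamp_ns)
        # Same identity: merge fields into the existing entry instead of duplicating.
        self._pending[key] = {**self._pending.get(key, {}), **fields}
        while len(self._pending) > self.max_points:
            self._pending.popitem(last=False)  # buffer full: drop the oldest point

    def flush(self, send: Callable[[list], None], batch_size: int = 500) -> int:
        """Send pending points in batches; a failed send leaves unsent points buffered."""
        sent = 0
        while self._pending:
            keys = list(self._pending)[:batch_size]
            batch = [(m, dict(tag_set), self._pending[(m, tag_set, ts)], ts)
                     for m, tag_set, ts in keys]
            send(batch)  # may raise, e.g. while the uplink is still down
            for key in keys:
                del self._pending[key]
            sent += len(keys)
        return sent
```

Dropping the oldest points when the buffer is full is one possible policy; devices where old data matters more might instead reject new points or spill to disk.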
29. What security practices are recommended for enterprise?
Enable TLS, scope tokens, rotate secrets, segment network paths, and separate read and write credentials by workload.
30. How do you explain InfluxDB architecture in an interview?
Walk through the ingest path, storage behavior, indexing tradeoffs, retention lifecycle, query layer, and operational controls, using practical examples.
