Data Structures / System Design
Horizontal Scaling, also referred to as "scale-out", is the addition of more machines, i.e. setting up a cluster or distributed environment for your software system. This usually requires a load balancer, a middleware component in the standard three-tier client-server architectural model.
Vertical Scaling, also referred to as "scale-up", is an attempt to increase the capacity of a single machine by adding more processing power (CPU), more memory (RAM), or more storage.
The load balancer is responsible for distributing user requests (load) among the various back-end systems/nodes in the cluster. Each of these back-end machines runs a copy of your software and is therefore capable of servicing requests.
Another common responsibility is the "health check", where the load balancer uses a ping-echo protocol or exchanges heartbeat messages with all the servers to ensure they are up and running.
- Round Robin, also called "Next in Loop".
- Weighted Round Robin, similar to Round Robin, but some servers get a larger share of the overall traffic.
- Random.
- Source IP hash: connections are distributed to back-end servers based on the source IP address. If a web node fails and is taken out of service, the distribution changes; as long as all servers are running, a given client IP address will always go to the same web server.
- Least connections: the load balancer monitors the number of open connections for each server and sends requests to the least busy server.
- Least traffic: the load balancer monitors the bitrate from each server and sends requests to the server with the least outgoing traffic.
- Least latency: the load balancer makes a quick HTTP OPTIONS request to the back-end servers and sends the request to the first server to answer. (A small sketch of two of these strategies follows below.)
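A minimal Python sketch of two of these strategies, Round Robin and Least connections; the server names ("app-1", "app-2", ...) and the in-memory connection counts are hypothetical stand-ins for real back-end state.

```python
import itertools

class RoundRobinBalancer:
    """Cycle through the back-ends in order ("next in loop")."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Pick the back-end with the fewest open connections."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1   # the caller decrements when the connection closes
        return server

rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(5)])    # app-1, app-2, app-3, app-1, app-2

lc = LeastConnectionsBalancer(["app-1", "app-2"])
print(lc.pick(), lc.pick(), lc.pick())  # requests go to whichever back-end is least busy
```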
The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response without guarantee that it contains the most recent write.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
A database shard is a horizontal partition of data in a database; each individual partition is referred to as a shard. Each shard is held on a separate database server instance, to spread the load. Some data within a database remains present in all shards, while other data appears only in a single shard. Each shard (or server) acts as the single source for this subset of data.
Partitioning is a general term used to describe the act of breaking up your logical data elements into multiple entities for the purpose of performance, availability, or maintainability.
Sharding is the equivalent of "horizontal partitioning".
"Vertical partitioning" is the act of splitting up the data stored in one entity into multiple entities for space and performance reasons.
Eventual consistency guarantees that the data on each node of the database becomes consistent eventually; the time it takes for the nodes to converge may or may not be defined.
With strong consistency, data is propagated to all replicas as soon as a write request arrives at one of them. While the replicas are being updated, responses to subsequent read/write requests are delayed, because all replicas are busy keeping each other consistent.
A SQL database is a better choice for any business that has a pre-defined structure and set schemas. Applications that involve multi-row transactions, such as accounting, warehousing, and payment systems, benefit from a SQL database.
A NoSQL database is a good choice for businesses with rapid growth or with data that has no clear schema definition. If you cannot define a schema for your database, or if your schema keeps changing, as with mobile apps, real-time analytics, or content management systems, NoSQL is the better choice.
Transport Layer Security (TLS) is a cryptographic protocol that provides communications security over a computer network. The TLS protocol aims primarily to provide privacy and data integrity between two communicating applications, ensuring a private connection and maintaining message integrity.
Throttling is the process of controlling the usage of APIs by consumers during a given period. You can define throttling at the application level and at the API level; at the API level the throttling limit is cumulative.
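A small sketch of one common throttling approach, a token bucket, assuming an in-memory limiter per consumer; the rate and capacity values are illustrative only.

```python
import time

class TokenBucket:
    """Simple token-bucket throttle: allow roughly `rate` requests per second,
    with bursts of up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request should be rejected (e.g. HTTP 429)

bucket = TokenBucket(rate=5, capacity=10)   # ~5 requests/second per consumer
print([bucket.allow() for _ in range(12)])  # called in quick succession: first 10 pass, the rest are throttled
```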
Hard real-time systems require that every task meets its deadline. Such systems are rare and are used in fields like medicine and defense.
Soft real-time systems, also known as firm real-time systems, tolerate some missed deadlines; occasional misses are considered normal, although too many misses are not tolerated.
NAT traversal, also known as UDP encapsulation, allows traffic to reach its specified destination when a device does not have a public address. This is usually the case when your ISP is doing NAT, or when the external interface of your firewall is connected to a device that has NAT enabled.
HLS stands for HTTP Live Streaming. HLS is a media streaming protocol for delivering visual and audio media to viewers over the internet.
Its adaptive bitrate video delivery is a combination of server and client software that detects a client's bandwidth capacity and adjusts the quality of the video stream between multiple bitrates and/or resolutions.
Perform security tests in CI/CD: CI/CD processes and tools are great places to include security tools and security unit-test cases. Generally, developers are amenable to fixing flagged vulnerabilities on merges but more resistant to addressing large security coding problems just prior to shipping a product/service.
Understand your software supply chain: include automation in the CI/CD process to create a list of third-party components. OWASP Dependency-Check can help you identify your components, their versions, and the known vulnerabilities associated with each library version.
Upgrade your libraries: To benefit from vulnerability remediation in your software supply chain, you must upgrade your third-party libraries.
Use popular third-party libraries: Use only well-maintained third-party libraries. Libraries that are not well-maintained will impact your security agility and will leave you vulnerable longer to well-known issues.
Design for easy upgrades: Establish development standards for code compliance. For example, you may be compiling with Java 11, but you can mandate code compliance to Java 9. The benefit of separating your build compliance from software code compliance is that you have more flexibility to downgrade or upgrade.
Avoid poor configuration: avoid hardcoded or clear-text passwords in configuration. Use information from sites like Mozilla's "Security/Server Side TLS" page to generate configurations and provide cipher suite recommendations.
Protect against MITM attacks: Employ encryption to defend against man-in-the-middle (MitM) attacks.
Protect against replay attacks: You can use a cryptographically secure nonce to defend against a replay attack.
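A minimal sketch of nonce-based replay protection, assuming the server can track which nonces have already been consumed (a real deployment would persist them with an expiry window); the `NonceRegistry` class is a hypothetical helper.

```python
import secrets

class NonceRegistry:
    """Track nonces that have already been used so a captured request
    cannot be replayed. A real system would persist these with an expiry."""
    def __init__(self):
        self._seen = set()

    def issue(self) -> str:
        return secrets.token_urlsafe(32)  # cryptographically secure random value

    def consume(self, nonce: str) -> bool:
        if nonce in self._seen:
            return False          # replayed request: reject
        self._seen.add(nonce)
        return True               # first use: accept

registry = NonceRegistry()
nonce = registry.issue()
print(registry.consume(nonce))  # True  (original request)
print(registry.consume(nonce))  # False (the replay is rejected)
```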
Apply security controls on the server: Designers often apply validation in JavaScript front-ends for web apps. It's acceptable to include validation on the client for performance reasons, saving a round trip to the server; however, client-side security can't be applied in lieu of server-side security. All security controls must be enforced on the server, since attackers can bypass browser security controls by calling your web service interfaces directly.
Protect against credential leakage: Ensure all servers communicate over encrypted connections. Don't store credentials in the clear; store password hashes salted with a per-user salt.
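A brief sketch of per-user salted password hashing using Python's standard-library PBKDF2 helper; the iteration count and function names are illustrative assumptions, not a prescription.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Hash a password with a random per-user salt using PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest   # store both; the salt is not secret

def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison

salt, stored = hash_password("s3cret!")
print(verify_password("s3cret!", salt, stored))   # True
print(verify_password("wrong", salt, stored))     # False
```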
In July 2017, Equifax suffered a breach that disclosed nearly 150 million customer records. The exploit was due to a known vulnerability in the Apache Struts 2 library; failure to patch quickly placed Equifax and its customers at risk.
From an operational perspective, patching can destabilize production systems and lead to outages; from a security perspective, however, "patch often" is the motto. Security patches remediate known vulnerabilities that otherwise provide attackers an easy means to exploit products and services.
- Cross-site scripting (XSS),
- SQL injection,
- Command injection,
- Insecure redirects,
- Insecure file upload/download,
- and Buffer overflow.
- Insufficient authentication,
- Insufficient authorization,
- Parameter tampering,
- and Cross-Site request forgery (CSRF).
- Insecure cryptographic algorithm,
- Insecure password management,
- Insecure session management,
- and information exposure.
Cross-site scripting (XSS) occurs when malicious code is included in an HTML response and alters the way the page is rendered. The malicious data is interpreted as script and executed in the client's browser.
There are 2 types of XSS.
Reflected: data from the incoming HTTP request is returned in the outgoing HTML response. This type of XSS targets a specific user by exploiting a defect in the application that causes it to return the malicious data back to that user.
Persisted: malicious data stored on the server is included in outgoing HTTP responses; this type usually targets one or more users.
XSS can result in unavailability, defacement, unauthorized access, session hijacking, identity theft, account harvesting, or full compromise of the system.
To mitigate XSS risk:
- Use appropriate encoding on data added to HTTP responses (see the sketch after this list).
- Utilize HTTP security headers, such as Content Security Policy, and use safe APIs such as textContent instead of innerHTML.
- Consider using client-side templating libraries.
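A tiny sketch of output encoding using Python's standard `html.escape`; the `render_comment` helper is hypothetical and simply shows untrusted data being encoded before it is placed in an HTML response.

```python
import html

def render_comment(user_comment: str) -> str:
    """Encode untrusted data before embedding it in an HTML response so the
    browser renders it as text rather than executing it as markup/script."""
    return "<p>" + html.escape(user_comment) + "</p>"

print(render_comment("<script>alert('xss')</script>"))
# <p>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```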
SQL injection is the highest application security concern because it's well known, easy to perform, and operates on the database server.
SQL injection occurs when:
- Malicious data is used to construct SQL statements via string concatenation, thus commingling executable code and data.
- Executable code and data are commingled; the database server parsing the query may interpret the "data" as executable code/query. This allows the data to alter the meaning/result of the SQL query.
- A single attack can leak an entire database, alter or destroy a database, or even lead to a compromise of the server where the database is deployed.
To mitigate the SQL injection risk:
- Use parameterized queries with bind variables to ensure that data added to the statement cannot alter the intention of the statement.
- Validate untrusted data, i.e. data submitted by end users.
SQL Injection example:
In the following query, $user_name and $password represent variables for user input, which are then used to construct a SQL statement.
SELECT * FROM BankAccounts WHERE username=$user_name AND password = $password;
A malicious user may provide input that bypasses the intended functionality of the query and actually grants unlimited access, simply by commenting out the "AND" portion of the SQL statement.
SELECT * FROM BankAccounts WHERE username='admin'--'AND password = 'password123';
Notice the comment indicator (--) after the username. The database will interpret the rest of the query as a comment, ignoring the password verification. The effective statement that gets executed is:
SELECT * FROM BankAccounts WHERE username='admin'
The attacker who submitted that login attempt would be granted access to resources owned by the "admin" account.
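For contrast, a small sketch of the parameterized-query mitigation using Python's built-in `sqlite3` module; the table and values mirror the example above and are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BankAccounts (username TEXT, password TEXT)")
conn.execute("INSERT INTO BankAccounts VALUES ('admin', 'password123')")

user_name = "admin'--"     # malicious input from the login form
password = "anything"

# The placeholders keep the input as data; it can never alter the query itself.
rows = conn.execute(
    "SELECT * FROM BankAccounts WHERE username = ? AND password = ?",
    (user_name, password),
).fetchall()
print(rows)   # [] -- the injection attempt simply fails to match any row
```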
Command injection attacks exploit application functionality that makes system calls or commands using untrusted data.
Attacks become possible when an application passes unsafe user-supplied data such as forms, cookies, and HTTP headers to the system as part of a shell command. This type of security lapse occurs due to poor security architecture. While input validation can help prevent successful attacks, the failure to keep data isolated from code is the source of risk.
When attacks occur on an application server, this may compromise the server or result in data exposure.
To mitigate command injection risk:
- Do not pass untrusted data to system calls or commands.
- Validate untrusted data against a whitelist and encode data to protect against problematic characters (see the sketch below).
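A short sketch of both mitigations, whitelist validation plus passing the input as a discrete argument rather than a shell string, assuming a Unix-like `ping` command is available; the `ping_host` helper is hypothetical.

```python
import re
import subprocess

def ping_host(hostname: str) -> str:
    """Run a system command with untrusted input safely.

    The input is validated against a whitelist pattern and passed as a
    separate argument (no shell), so it cannot be interpreted as extra
    commands such as "example.com; rm -rf /".
    """
    if not re.fullmatch(r"[A-Za-z0-9.-]{1,253}", hostname):
        raise ValueError("invalid hostname")
    result = subprocess.run(
        ["ping", "-c", "1", hostname],   # argument list, not a shell string
        capture_output=True, text=True, check=False,
    )
    return result.stdout

print(ping_host("example.com")[:60])
```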
This type of injection defect occurs when untrusted data is used in redirects to faulty or malicious sites.
Redirects allow a web application to direct users to different pages within the same application or to an external site. An insecure redirect sends the user to an untrusted or malicious site. To prevent insecure redirects:
- Treat all data received from a client as untrusted.
- Define redirect URLs within the application using trusted, whitelisted data.
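A minimal sketch of the whitelist approach: redirect targets are looked up from a server-side map keyed by short names, so raw client-supplied URLs are never used. The keys and destinations are hypothetical.

```python
ALLOWED_REDIRECTS = {
    "home": "/dashboard",
    "help": "https://support.example.com/",   # hypothetical trusted destination
}

def safe_redirect(target_key: str) -> str:
    """Resolve a redirect from a whitelist instead of using a raw client URL."""
    try:
        return ALLOWED_REDIRECTS[target_key]
    except KeyError:
        return "/"   # fall back to a safe default rather than an attacker URL

print(safe_redirect("help"))                    # trusted destination
print(safe_redirect("https://evil.example"))    # unknown target falls back to "/"
```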
Uploading/downloading files in an insecure manner is a broad type of risk that covers path manipulation, data caching, file handling, malware and anti-virus, access control, and bandwidth concerns.
Path manipulation is a major concern. For this type of injection defect, untrusted data is used to construct the path to the resource. For example:
- Path manipulation for a download may allow unauthorized access to a resource.
- Path manipulation for an upload may allow a file to be placed in an unauthorized location, covering up a legitimate resource.
- Path manipulation for an upload may allow an executable file to open up "back-doors".
Mitigation strategies:
- Do not use untrusted data in file paths or names; instead, use server-generated file paths and names.
- Restrict the application's access to its home directory and subdirectories, thereby leveraging the operating system's access controls.
- Normalize the path prior to validation and authorization checks when using untrusted data as part of a file path. Validate the path against a whitelist.
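A small sketch of the path normalization and containment check with Python's `pathlib`; the upload directory and helper name are assumptions for illustration.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/var/app/uploads")   # assumed application home directory

def resolve_upload(filename: str) -> Path:
    """Normalize the requested path and make sure it stays inside UPLOAD_ROOT."""
    candidate = (UPLOAD_ROOT / filename).resolve()
    if UPLOAD_ROOT.resolve() not in candidate.parents:
        raise PermissionError("path escapes the upload directory")
    return candidate

print(resolve_upload("report.pdf"))
try:
    resolve_upload("../../etc/passwd")   # classic traversal attempt is rejected
except PermissionError as err:
    print(err)
```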
Buffer overflow occurs when an application writes more data into an area of memory, called a buffer, than that buffer was intended to hold.
Buffers are created to contain a finite amount of data. When the data is longer than expected, data will overflow into one or more adjacent memory locations (buffers) replacing the original data. This results in:
- Erratic program behaviour.
- Data exposure to unauthorized parties.
- Processor tricked into running arbitrary code.
Mitigation strategies:
- Check the length of data and limit it to the expected size.
- Never assume that code will safely handle untrusted data.
- Use libraries explicitly created to perform string and other memory operations in a secure fashion.
Authentication is the act of proving one's identity. Authorization is the act of proving one's access privileges.
Think of authentication as logging into an account: for this to happen smoothly, the system has to verify your identity before letting you in. Authorization differs: although you are authenticated into the system, you may not be authorized to perform certain functions.
Parameter tampering, also known as insecure direct object reference, occurs when attackers manipulate parameters exchanged between client and server to gain unauthorized access to data.
Examples of parameter values frequently manipulated include:
- cookies.
- URL parameters.
- Drop-down lists, radio buttons, and checkboxes.
- Database primary keys stored in hidden fields.
Mitigation strategies:
- Perform resource entitlement checks on every data access request.
- Do not rely on client-provided information for authorization, other than the session ID. Map session IDs to primary keys and other fields as a server-side operation.
- Implement tokenization, where the database primary keys are indirectly referenced.
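A minimal sketch of tokenization / indirect object references: the client only ever sees opaque tokens, which the server maps back to real primary keys. The `ReferenceMap` class is a hypothetical in-memory stand-in for per-session server state.

```python
import secrets

class ReferenceMap:
    """Map opaque, per-session tokens to internal database keys so primary
    keys are never exposed to (or accepted from) the client."""
    def __init__(self):
        self._token_to_key = {}

    def tokenize(self, primary_key: int) -> str:
        token = secrets.token_urlsafe(16)
        self._token_to_key[token] = primary_key
        return token          # safe to embed in URLs, forms, or cookies

    def resolve(self, token: str) -> int:
        return self._token_to_key[token]   # KeyError => unknown or tampered token

refs = ReferenceMap()
token = refs.tokenize(42)     # the internal account id stays server-side
print(refs.resolve(token))    # 42
```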
Cross-site request forgery (CSRF) occurs when a malicious website, email, blog, instant message, or program causes a user's web browser to perform an unwanted action on a trusted site where the user is currently authenticated.
These attacks can make use of a target system's normal functions -- such as transferring funds or changing passwords -- using the target's browser without the knowledge of the target user.
Mitigation strategies:
- Do not rely solely on the presence of a valid sessionID or a cookie.
- Include a unique, single-use value in every response sent to the browser, and then validate that token when a request is submitted (sketched below).
- Require users to re-authenticate for high-risk transactions.
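A brief sketch of issuing and validating a per-session CSRF token with Python's standard library; the in-memory `session_tokens` map stands in for real server-side session storage.

```python
import hmac
import secrets

# One token per user session; a real app would keep this in server-side session state.
session_tokens = {}

def issue_csrf_token(session_id: str) -> str:
    token = secrets.token_urlsafe(32)
    session_tokens[session_id] = token
    return token                      # embedded as a hidden field in each form

def validate_csrf_token(session_id: str, submitted: str) -> bool:
    expected = session_tokens.get(session_id, "")
    # A stricter single-use scheme would also delete/rotate the token after success.
    return hmac.compare_digest(expected, submitted)

token = issue_csrf_token("session-abc")
print(validate_csrf_token("session-abc", token))      # True: genuine form post
print(validate_csrf_token("session-abc", "forged"))   # False: cross-site request rejected
```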
SACM is a primary information technology-business process that is foundational and required to mitigate system vulnerabilities and risk of cyberattacks against any organization.
It is a collection of processes that achieve operational control, systematic onboarding, validation, updates, maintenance, and disposal of technology assets as well as management of configuration items.
Digital accessibility is about making digital products and services accessible to those with disabilities. A website, application or document is accessible when a person with diverse abilities can use it to perform the task or access the service for which it is intended without reliance on the assistance of other people.
WAI-ARIA, the Accessible Rich Internet Applications Suite, defines a way to make Web content and the Web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with Ajax, HTML, JavaScript, and related technologies.
WAI-ARIA is a W3C initiative that defines a web standard.
VUI stands for Voice User Interface. It allows the user to interact with a system through voice or speech commands. Some of the platforms include Google Assistant, Amazon Alexa, Microsoft Cortana, Siri, and Samsung Bixby.
An Alexa Skill is essentially just an application for Alexa.
- Ease of Access.
- Speed & Efficiency.
A TXT record is a type of resource record in the Domain Name System (DNS) used to provide the ability to associate arbitrary text with a host or other names, such as human-readable information about a server, network, data center, or other accounting information.
Resident Set Size (RSS) is the amount of RAM your process is consuming.
The Twelve-Factor App methodology is a set of best practices for building software-as-a-service applications, designed to enable applications to be built with portability and resilience when deployed to the web.
| Factor | Description |
| --- | --- |
| Codebase | There should be exactly one codebase for a deployed service, with the codebase being used for many deployments. |
| Dependencies | All dependencies should be declared, with no implicit reliance on system tools or libraries. |
| Config | Configuration that varies between deployments should be stored in the environment. |
| Backing services | All backing services are treated as attached resources, attached and detached by the execution environment. |
| Build, release, run | The delivery pipeline should strictly consist of build, release, run. |
| Processes | Applications should be deployed as one or more stateless processes, with persisted data stored on a backing service. |
| Port binding | Self-contained services should make themselves available to other services by specified ports. |
| Concurrency | Concurrency is advocated by scaling individual processes. |
| Disposability | Fast startup and shutdown are advocated for a more robust and resilient system. |
| Dev/Prod parity | All environments should be as similar as possible. |
| Logs | Applications should produce logs as event streams and leave the execution environment to aggregate them. |
| Admin processes | Any needed admin tasks should be kept in source control and packaged with the application. |
Certificate pinning restricts which certificates are considered valid for a particular website, limiting risk. Instead of allowing any trusted certificate to be used, operators "pin" the certificate authority (CA) issuer(s), public keys, or even end-entity certificates of their choice. Clients connecting to that server will treat all other certificates as invalid and refuse to make an HTTPS connection.
In SSL authentication, the client is presented with the server's certificate, and the client matches the server's CA against its own list of trusted CAs. If the issuing CA is trusted, the client verifies that the certificate is authentic and has not been tampered with. This is also known as 1-way SSL authentication.
In mutual SSL authentication, by contrast, both client and server authenticate each other through digital certificates, so that both parties are assured of the other's identity. This is also known as 2-way SSL authentication.
OAuth (Open Authorization) is an open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites but without giving them the passwords.
A canary release is a software testing technique used to reduce the risk of introducing a new software version into production by gradually rolling out the change to a small subgroup of users, before rolling it out to the entire platform/infrastructure.
Domain-Driven Design is a way of looking at software from the top down.
When we develop software, our focus shouldn't be primarily on technology; it should be primarily on the business, or whatever activity we are trying to assist with the software: the domain.
Specifically, we approach that by trying to develop models of the domain and making our software conform to them.
Transport Layer Security (TLS) is an encryption protocol in wide use on the Internet. TLS, which was formerly called SSL, authenticates the server in a client-server connection and encrypts communications between client and server so that external parties cannot spy on the communications.
Mutual TLS, or mTLS for short, is a method for mutual authentication. mTLS ensures that the parties at each end of a network connection are who they claim to be by verifying that they both have the correct private key. The information within their respective TLS certificates provides additional verification. mTLS is often used in a Zero Trust security framework to verify users, devices, and servers within an organization. It can also help keep APIs secure.
RSocket is an open-source streaming message protocol with Reactive Extensions/Streams semantics, initially created by Netflix. The main difference between RSocket and a traditional TCP WebSocket is that RSocket is more flexible and adds reactive-streams semantics.
As per Wikipedia, Site reliability engineering is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems.
The Mutex is a locking mechanism that makes sure only one thread can acquire the Mutex at a time and enter the critical section. This thread only releases the Mutex when it exits the critical section.
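A minimal sketch of a mutex in Python using `threading.Lock`: without the lock the concurrent increments could interleave; with it, the final count is always correct.

```python
import threading

counter = 0
lock = threading.Lock()   # the mutex guarding the critical section

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:          # only one thread may hold the lock at a time
            counter += 1    # critical section; the lock is released on exit

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # always 400000 because the updates are serialized by the mutex
```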
Privilege escalation is a type of network attack used to gain unauthorized access to systems within a security perimeter.
Attackers start by finding weak points in an organization's defenses and gaining access to a system. In many cases, the first point of penetration will not grant attackers the level of access or data they need. They will then attempt privilege escalation to gain more permissions or obtain access to the additional, more sensitive systems.
There are two types of privilege escalation:
Horizontal privilege escalation: an attacker expands their privileges by taking over another account and misusing the legitimate privileges granted to that user.
Vertical privilege escalation: an attacker attempts to gain more permissions or access with an existing account they have compromised. For example, an attacker takes over a regular user account on a network and attempts to gain administrative permissions or root access. This requires more sophistication and may take the shape of an Advanced Persistent Threat.
SSRF stands for server-side request forgery; these attacks are designed to exploit how a server processes external information. The primary purpose of the attack is to gain access to sensitive information/data, either directly (by forcing the server to write data to an attacker-supplied URL) or indirectly (by allowing exploitation of a vulnerability that can be used to steal data).
Make sure you don't use algorithms that have known weaknesses, such as MD5, SHA-1, or the Data Encryption Standard (DES); instead, use cryptographically strong primitives provided by your programming language, such as the Advanced Encryption Standard (AES) (>=128-bit keys) and SHA-256 (>=256-bit digests).
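A short illustration: hashing with SHA-256 from Python's standard `hashlib` instead of MD5 or SHA-1. The commented-out AES-GCM lines assume the third-party `cryptography` package and are shown only as a pointer, not a verified dependency of this document.

```python
import hashlib

message = b"transfer $100 to account 42"

# SHA-256 (>= 256-bit digest) instead of weak algorithms such as MD5 or SHA-1.
print(hashlib.sha256(message).hexdigest())

# For encryption, prefer an authenticated AES mode with a >=128-bit key, e.g. AES-GCM.
# The sketch below assumes the third-party `cryptography` package is installed:
# import os
# from cryptography.hazmat.primitives.ciphers.aead import AESGCM
# key = AESGCM.generate_key(bit_length=256)
# ciphertext = AESGCM(key).encrypt(os.urandom(12), message, None)
```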
Latency is the time taken, typically measured in milliseconds, to deliver a single message or receive a response.
Throughput is the amount of data successfully transmitted through a system in a given amount of time. It is measured in bits per second.
Insecure design encompasses various risks that arise from ignoring design and architectural best practices, starting from the planning phase before actual implementation. A near-perfect implementation cannot prevent defects arising from an insecure design.
Threat modeling is a procedure for optimizing the applications, system, or business process security by identifying objectives and vulnerabilities, and then defining countermeasures to prevent or mitigate the effects of threats to the system.
There are many different threat modeling methodologies. Some of the most widely used are STRIDE, NIST 800-154, PASTA, and OCTAVE.
Microsoft developed the STRIDE methodology in the late 1990s as a way to standardize the identification of threats across their product line. It offers a mnemonic for identifying security threats in six categories:
- Spoofing: An intruder posing as another user, component, or other system feature that contains an identity in the modeled system.
- Tampering: The altering of data within a system to achieve a malicious goal.
- Repudiation: The ability of an intruder to deny that they performed some malicious activity, due to the absence of enough proof.
- Information Disclosure: Exposing protected data to a user that isn't authorized to see it.
- Denial of Service: An adversary uses illegitimate means to exhaust services needed to provide service to users.
- Elevation of Privilege: Allowing an intruder to execute commands and functions that they aren't allowed to.
Tokenization refers to a process by which a piece of sensitive data, such as a credit card number, is replaced by a surrogate value known as a token. It is the process of replacing sensitive data with unique identification symbols that retain all the essential information about the data without compromising its security.
In theoretical computer science, the PACELC theorem is an extension of the CAP theorem. It states that in the case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).
A Denial-of-Service (DoS) attack is an attack meant to shut down a machine or network, making it inaccessible to its intended users. DoS attacks accomplish this by flooding the target with traffic or sending it information that triggers a crash. In both instances, the DoS attack deprives legitimate users of the service or resource they expected.
A Finite State Machine, or FSM, is a computation model that can be used to simulate sequential logic, or, in other words, to represent and control execution flow. Finite State Machines can be used to model problems in many fields, including mathematics, artificial intelligence, games or linguistics.
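A minimal sketch of an FSM, the classic turnstile example, expressed as a transition table in Python; the states and events are illustrative.

```python
# States and transitions for a simple turnstile: locked/unlocked states, coin/push events.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def run(events, state="locked"):
    """Drive the finite state machine through a sequence of input events."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

print(run(["coin", "push"]))   # locked   (paid, walked through, turnstile relocks)
print(run(["push", "coin"]))   # unlocked (pushing while locked changes nothing)
```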
Secure network address translation (SecureNA or SNAT) is a network address translation (NAT) technique that enables private network security by providing a public Internet Protocol (IP) address to remote users/systems.
The GTM load balancer balances traffic for application servers across data centers. Global Traffic Manager is a load-balancing solution that operates at the DNS level, directing traffic across multiple geographically dispersed data centers. Its primary goal is to optimize application availability and performance for users worldwide.
Local Traffic Manager focuses on load balancing within a single data center or location. It operates at the application layer, intelligently distributing traffic across multiple servers to ensure efficient resource utilization and optimal performance.
The real power of the LTM is that it is a full proxy, allowing you to augment both client-side and server-side connections, all while making informed load-balancing decisions based on availability, performance, and persistence. As opposed to the GTM, traffic actually flows through the LTM to the servers it balances traffic to.