Technical Language Goes Human

23 Oct, 2012 Tips and Tricks

Soft reboot

A soft reboot is a graceful reboot of the server, issued via a console command. When the server is rebooted this way, the disk partitions are unmounted cleanly, so file system errors are less likely and the server will probably not need a File System Check during boot. The rebuild of the RAID array may also be skipped (though not always).

Hard reboot (power reset)

A hard reboot is a forced reboot of the server by means of a power reset (this requires a request to be submitted to the Datacenter). After a hard reboot the disk partitions are not unmounted cleanly, so a File System Check is mandatory. A rebuild of the RAID array is also required without fail.

File System Check (FSCK)

File System Check (FSCK) is a procedure that checks file system consistency. In most cases we run FSCK after an improper shutdown or reboot of the server. This system tool scans file systems for critical errors and fixes them.

Why does it take so long?

First of all, the time depends on the size of the HDD on the server, the number of disk partitions and the number of file system errors. To provide better fail safety and simplify maintenance, the disk is always partitioned so that the most essential partitions can be mounted and unmounted separately. Also, as our servers are cPanel-based, and this control panel has its own peculiarities of data processing and disk usage, we partition the disk in the following way:

/
/boot
/var
/usr
/tmp
/home

/home takes the most time to check, as it is the biggest and the most crucial partition (users’ folders with all their content are located there).

Another reason is that FSCK can hardly be run in the background. The partition needs to be unmounted so that the errors can be fixed properly. That is why we run FSCK before any services are started.

Why don’t you always provide clear updates on the FSCK status?

Unlike the RAID rebuild process, FSCK does not report its progress as a percentage, so the only things we can report are the partition currently being checked and the current FSCK phase. Furthermore, FSCK is a non-linear process, i.e. the duration of the check varies with the number of errors on each partition.

FSCK Phases are:

Phase 1: Check Blocks and Sizes
Phase 2: Check Path-Names
Phase 3: Check Connectivity
Phase 4: Check Reference Counts
Phase 5: Check Cylinder Groups
Phase 6: Salvage Cylinder Groups

Why do you sometimes perform the manual FSCK?

A manual FSCK is usually performed when the automatic FSCK fails, typically because file system errors were skipped or left unfixed and then caused the failure.

HDD

HDD (hard disk drive) is a device that stores the digitally encoded data on rapidly rotating platters with magnetic surfaces. HDDs record data by magnetizing ferromagnetic material directionally, to represent either a 0 or a 1 binary digit. They read the data back by detecting the magnetization of the material.

The HDD is crucially important for the server, as it holds the user data and the installed OS. Therefore, the majority of our servers have 3 HDDs for better fail safety: the primary one, the mirrored one (the HDD that synchronizes all data from the 1st HDD as part of a RAID array) and the backup HDD for the most exceptional cases.

RAID rebuild

RAID (redundant array of independent disks) is a technology that allows the server to achieve high levels of storage reliability. We use RAID1 (mirror). It provides fail safety and protects us from data loss (as the information from one HDD is fully copied to the 2nd one), and it improves performance (as files are read from 2 HDDs, a single hard disk does not get overloaded).

RAID rebuild is the process of synchronizing data between the 2 disks in the array. It is usually required after a reboot (especially a hard one), as the data on the physical disks becomes out of sync.

In order to save our customers’ time, we usually rebuild RAID in the background. This process is disk-intensive, so some services may slow down, because simultaneous read requests are all sent to the single primary HDD. We may also temporarily stop some services manually in order to speed up the rebuild, lessen the load on the disks and keep the critical services (Apache, PHP, MySQL) available without interruption.
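Because the rebuild runs in the background, its progress can be read while the server is up. The following is a minimal sketch, assuming Linux software RAID (md), which this article does not confirm is the setup in use. On such systems the kernel reports rebuild status in /proc/mdstat. The sample text is illustrative with made-up values, and `rebuild_progress` is a hypothetical helper name.

```python
import re

# Illustrative status text in the /proc/mdstat layout (values are made up).
SAMPLE = """\
md0 : active raid1 sdb1[1] sda1[0]
      292945152 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec
"""

# Matches the progress line of a running rebuild ("recovery") or resync.
PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")


def rebuild_progress(mdstat_text):
    """Return (percent_done, minutes_left), or None if no rebuild is running."""
    match = PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return float(match.group(2)), float(match.group(3))


print(rebuild_progress(SAMPLE))  # (12.6, 127.5)
```

On a real md-based server the same function could be fed the contents of /proc/mdstat instead of the sample string.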

Downtime

Downtime is the complete unavailability of the entire server and all its services. However, problems with one or several services, experienced by one or several accounts on the server, are sometimes mistaken for the server being down, when in fact only some of the protocols (HTTP, FTP, SMTP, IMAP, POP3) are inaccessible. It is therefore always advisable to do some quick checks first:

check the corresponding forum thread dedicated to your server for reports of recent outages;
check the availability of the server by pinging it from the command line (execute ping domain_name).
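A ping only shows whether the host answers at all, while the individual protocols mentioned above can be probed one by one. Here is a minimal sketch, assuming the services listen on their standard default ports (a given server may be configured differently). The names `port_is_open` and `check_services` are hypothetical helpers, not part of any existing tool.

```python
import socket

# Standard default ports for the protocols named above (an assumption;
# a particular server may use different ones).
DEFAULT_PORTS = {"FTP": 21, "SMTP": 25, "HTTP": 80, "POP3": 110, "IMAP": 143}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_services(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Map each protocol name to True (reachable) or False (not reachable)."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

Calling `check_services` with your own domain name returns a dictionary such as one where HTTP is reachable but FTP is not, which points to a single-service problem rather than full downtime.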

Kernel

The kernel is the central component of a computer operating system. As it contains the necessary drivers and modules, it can be considered the bridge between applications and the actual data processing done at the hardware level. The kernel is also responsible for managing system resources (communication between hardware and software components).

Kernel Panic

Kernel panic is an action taken by the operating system upon detecting an internal fatal error. It may occur either as a result of a hardware failure or due to a bug in the operating system. Depending on what caused the kernel panic, a corresponding recovery procedure is followed.

DDoS

DDoS (distributed denial-of-service attack) is an attempt to make a computer resource unavailable to its intended users. Although the means, motives and targets of a DDoS attack may vary, it generally consists of the concerted efforts of a person or group of people to prevent an Internet site or service from functioning efficiently or at all, temporarily or indefinitely. Perpetrators of DDoS attacks typically target sites or services hosted on high-profile web servers, such as those of banks and credit card payment gateways, and even root nameservers.

The most common type of DDoS attack experienced by hosting providers is a targeted ICMP flood sent to the server, the rack or the router.

The DDoS attacks we have suffered were usually aimed at some of our clients’ websites. As those are hosted on shared servers, the attacks usually led to temporary unavailability of the entire server. However, some types of attack may affect a group of servers even if only a single server is targeted; this usually happens to servers that are close to the target server(s) in the network topology.

During a DDoS attack we take measures such as blocking IPs that establish too many connections, so some innocent site visitors may have their IPs temporarily blocked.
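The idea of blocking by connection count can be illustrated with a short sketch. This is not our actual filtering setup: the function name `ips_over_limit`, the limit value and the sample addresses (taken from the reserved documentation ranges) are all assumptions made for the example.

```python
from collections import Counter


def ips_over_limit(connection_ips, limit):
    """Return IPs with more than `limit` open connections, busiest first."""
    counts = Counter(connection_ips)
    return [ip for ip, count in counts.most_common() if count > limit]


# Made-up sample: one entry per open connection.
sample = ["203.0.113.5"] * 5 + ["198.51.100.7"] * 2 + ["192.0.2.44"]
print(ips_over_limit(sample, limit=3))  # ['203.0.113.5']
```

A rule this simple cannot tell an attacker from a legitimate visitor who happens to open many connections at once, which is how innocent IPs end up blocked for a while.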

We have recently improved our techniques for fighting DDoS attacks. We are also carrying out multiple preventive measures, such as proactive network monitoring and hardware/software filtering.

Software vulnerability

A software vulnerability is a security gap or a bug which hackers may use to gain control over an account or the entire server in order to send spam, spread viruses or steal important information. Though we monitor all our servers closely and regularly perform security audits, we can hardly control all our clients’ software. That is why we rely on our customers’ conscientiousness, as it is their responsibility to update their software on time.

Aside from vulnerabilities in software and its modules, there is another side of the coin: some software assigns non-standard permissions to the files and folders it uses, making them vulnerable as well. That is why it is always advisable to check those files (especially .htaccess) and set correct permissions on them (644 for files and 755 for folders).
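The permission check described above can be automated. Below is a minimal sketch that walks a directory tree, reports entries whose mode differs from 644 (files) or 755 (folders), and optionally corrects them. The function names are hypothetical, symbolic links are skipped, and the top-level folder itself is not examined. Some scripts may legitimately need other modes, so review the report before fixing anything.

```python
import os
import stat

FILE_MODE = 0o644  # rw-r--r--
DIR_MODE = 0o755   # rwxr-xr-x


def wrong_permissions(root):
    """Yield (path, current_mode, expected_mode) for entries that differ."""
    for dirpath, dirnames, filenames in os.walk(root):
        entries = [(n, DIR_MODE) for n in dirnames] + [(n, FILE_MODE) for n in filenames]
        for name, expected in entries:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symbolic links alone
            current = stat.S_IMODE(os.lstat(path).st_mode)
            if current != expected:
                yield path, current, expected


def fix_permissions(root):
    """Set the expected mode on every mismatching entry; return changed paths."""
    changed = []
    # Collect the full report first, then change modes.
    for path, _current, expected in list(wrong_permissions(root)):
        os.chmod(path, expected)
        changed.append(path)
    return changed
```

Iterating over `wrong_permissions` on a site's document root gives a read-only report; `fix_permissions` applies the changes.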

Subnet

A subnetwork, or subnet, is a logically visible, distinctly addressed part of a single Internet Protocol network. The process of subnetting is the division of a computer network into groups of computers that have a common, designated IP address routing prefix.

Subnetting breaks a network into smaller realms that may use existing address space more efficiently, and, when physically separated, may prevent excessive rates of Ethernet packet collision in a larger network.

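In hosting jargon, the four octets of an IPv4 address are commonly labelled A, B, C and D, and addresses that share the first three octets are said to belong to the same C-class subnet:

75 . 127 . 76 . 120
A     B    C     D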

Sometimes our customers wish to have dedicated IPs from different C-class subnets (e.g. 75.127.76.120 and 75.127.92.58). Unfortunately, this is not always possible on a shared server, as the addresses assigned to it all belong to the same C-class subnet.
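Whether two addresses fall into the same C-class subnet can be checked with a few lines of code. This is a minimal sketch using Python's standard `ipaddress` module; it treats a C-class subnet as a /24 network (the first three octets), and `same_c_class` is a hypothetical helper name.

```python
import ipaddress


def same_c_class(ip_a, ip_b):
    """Return True if both IPv4 addresses fall in the same /24 network."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


print(same_c_class("75.127.76.120", "75.127.92.58"))  # False: third octet differs
print(same_c_class("75.127.76.120", "75.127.76.7"))   # True
```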

Routers are used to interchange traffic between subnetworks and constitute logical or physical borders between the subnets. They manage traffic between subnets based on the high-order bit sequence (routing prefix) of the addresses.

Router

A router is a networking device whose software and hardware are usually tailored to the tasks of routing and forwarding information. Routers connect two or more logical subnets, letting the servers in those subnets communicate with each other and connect to the Internet at the same time.

Background process

A background process is a process that runs on the server alongside the major server services. In fact, most of the server’s initial services (‘parents’) run in the background while idle. Once a service receives a request, it starts a ‘child’ process, which remains active while the request is being executed.

Due to the resource intensity of such processes, we clearly state in our Acceptable Use Policy that running stand-alone, unattached server-side processes/daemons is strictly prohibited.

There are also maintenance background processes, such as backups, RAID rebuild and FSCK, which we start when necessary. Though they may temporarily slow down the overall performance of the server, they save a lot of time compared with running the same processes while no vital services (http, mysql, php) are started.

Server rack

A server rack is a standardized frame or enclosure for mounting multiple equipment modules. Physically, it allows several servers to be placed in one location behind one router; logically, it eases monitoring and network administration, as the servers in the rack are grouped by a common factor (network topology).