System Administrator Responsibilities
Sysadmins are responsible for a wide range of duties, but these are the most essential.
System administrators are critical to the reliable and successful operation of a company and its network operations center and data center. A sysadmin must be proficient with the system's underlying platform (e.g., Windows, Linux) as well as networking, backup, data recovery, IT security, database operations, middleware basics, load balancing, and more.
A sysadmin's tasks are not limited to server maintenance and repairs; they also include any function that supports a smoothly running production environment with minimal (or no) complaints from customers and end users. Although the list of sysadmin responsibilities is endless, some are more complex than others. If you work as a sysadmin (or hope to one day), make sure you are ready to follow these best practices.
Duties of a system administrator
The duties of a system administrator are wide-ranging and vary from one organization to another. Sysadmins are usually charged with installing, supporting, and maintaining servers or other computer systems, and with planning for and responding to service outages and other problems. Other duties may include scripting or light programming and project management for systems-related projects.
The system administrator is responsible for the following:
- ✔User administration (setting up and maintaining accounts)
- ✔Maintain the system
- ✔Verify that peripherals are working properly
- ✔Quickly arrange repairs in the event of hardware failure
- ✔Monitor system performance
- ✔Create file systems
- ✔Install software
- ✔Create a backup and recovery policy
- ✔Monitor network communication
- ✔Update the system as soon as new versions of the OS and application software are released
- ✔Implement the policies for the use of the computer system and network
- ✔Set up security policies for users. A sysadmin must have a strong grasp of computer security (e.g., firewalls and intrusion detection systems)
- ✔Documentation in the form of an internal wiki
- ✔Password and identity management
Cloud computing and Sysadmin
Cloud computing, at its simplest, is a large number of computers connected through the Internet or a WAN. It is now a standard part of the technology landscape, and a sysadmin must learn:
1. Automation software such as Ansible, Puppet, Chef, etc.
2. Cloud infrastructure such as AWS, OpenStack, etc.
3. Network services in cloud such as Content delivery networks (Akamai, CloudFront etc.) and DNS servers.
4. Source control
5. Designing best practices for backups, and whole infrastructure.
System administrator account?
The root account has full (unrestricted) access, so it can do anything with the system. For example, root can remove critical system files. Once a file has been removed, there is no way to recover it except from tape or disk-based backups.
Many system administration tasks can be automated using Perl, Python, or shell scripts. For example:
Creating new users.
Resetting user passwords.
Locking and unlocking user accounts.
Monitoring server security.
Monitoring special services, etc.
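As an illustration of the account tasks listed above, here is a minimal Python sketch that builds the commands for creating, locking, and unlocking users and for forcing a password change. It assumes a Linux host with the standard `useradd`, `usermod`, and `chage` utilities; the function names and the username pattern are hypothetical choices made for this example. By default it only prints what it would run.

```python
import re
import subprocess

# Conservative pattern for login names. This is an assumption made for
# the example, not a rule taken from any particular distribution.
_USERNAME_RE = re.compile(r"^[a-z_][a-z0-9_-]{0,31}$")


def _checked(username):
    """Reject names that could be misread as options or shell syntax."""
    if not _USERNAME_RE.match(username):
        raise ValueError(f"invalid username: {username!r}")
    return username


def create_user_cmd(username, shell="/bin/bash"):
    # -m creates the home directory, -s sets the login shell.
    return ["useradd", "-m", "-s", shell, _checked(username)]


def lock_user_cmd(username):
    # -L locks the account's password.
    return ["usermod", "-L", _checked(username)]


def unlock_user_cmd(username):
    # -U unlocks the account's password.
    return ["usermod", "-U", _checked(username)]


def force_password_change_cmd(username):
    # Setting the last-change date to 0 forces a new password at next login.
    return ["chage", "-d", "0", _checked(username)]


def run(cmd, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it (needs root)."""
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode
```

Calling `run(create_user_cmd("alice"))` prints the command without touching the system; passing `dry_run=False` executes it and requires root privileges.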
Most important skill for a system administrator
Problem solving, period. This can sometimes bring all sorts of constraints and stress. When a workstation or server goes down, you are called on to solve the problem. You should be able to diagnose it quickly and correctly: figure out what is wrong and how best to fix it in a short amount of time.
System administrators are not...
Cookie-cutter software engineers.
It is not usually within your duties to design new application software.
However, you must understand how software behaves in order to deploy it and troubleshoot problems, and you should generally be comfortable with several languages used for scripting or automating routine tasks, such as shell, awk, Perl, and Python.
Documentation is how sysadmins keep records of assets, including hardware and software types, counts, and licenses. Should there be any issues in the production environment, documentation helps identify the hardware, virtual machine, appliance, software, etc., that may be involved.
Maintain lists of all your physical and virtual servers with the following details:
OS: Linux or Windows, hypervisor with versions
RAM: DIMM slots in physical servers
CPU: Logical and virtual CPUs
HDD: Type and size of hard disks
External storage (SAN/NAS): Make and model of storage with management IP address and interface IP address
Open ports: Ports opened at the server end for incoming traffic
IP address: Management and interface IP address with VLANs
Engineering appliances: e.g., Exalogic, PureApp, etc.
Configured applications: e.g., Oracle WebLogic, IBM WebSphere Application Server, Apache Tomcat, Red Hat JBoss, etc.
Third-party software: Any software not shipped with the installed OS
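One way to keep such an inventory in a machine-readable form is sketched below in Python. The `ServerRecord` class, its field names, and the sample values are hypothetical; they simply mirror the details listed above and would need adapting to a real environment.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServerRecord:
    """One inventory entry; fields follow the checklist above."""
    hostname: str
    os: str                       # e.g. OS or hypervisor name with version
    ram_gb: int
    logical_cpus: int
    disks: list = field(default_factory=list)           # type and size
    external_storage: list = field(default_factory=list)
    open_ports: list = field(default_factory=list)      # incoming ports
    management_ip: str = ""
    interface_ips: dict = field(default_factory=dict)   # IP -> VLAN id
    applications: list = field(default_factory=list)
    third_party_software: list = field(default_factory=list)


def to_json(records):
    """Serialize the inventory so it can be stored in a wiki or repository."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def servers_with_port(records, port):
    """Return hostnames that accept incoming traffic on the given port."""
    return [r.hostname for r in records if port in r.open_ports]


# Sample data using documentation-only addresses (192.0.2.0/24).
inventory = [
    ServerRecord("web01", "Linux", 32, 8, disks=["SSD 500GB"],
                 open_ports=[22, 443], management_ip="192.0.2.10",
                 interface_ips={"192.0.2.11": 100},
                 applications=["Apache Tomcat"]),
    ServerRecord("db01", "Linux", 64, 16, disks=["SSD 2TB"],
                 open_ports=[22, 3306], management_ip="192.0.2.20"),
]
```

With the inventory held as data, questions such as "which servers expose port 443?" become a one-line query: `servers_with_port(inventory, 443)`.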
Maintain license counts and details for physical servers and virtual servers (VMs), including licenses for Windows, subscriptions for Linux OS, and the license limit of hypervisor host.
Server health check-up
Running processes: Check for processes that are consuming more resources than expected, and take action to fine-tune the applications (with the help of the application team).
CPU utilization: Continuously monitor the CPU utilization of critical processes such as "java", "http", and "mysql" to ensure they are not consuming more CPU than expected. If they are, coordinate with the application team to investigate and tune at the application level. In parallel, review OS parameters such as ulimits.
Memory utilization: Check memory utilization and clear the cache, if required.
Zombie processes: Check for processes that have terminated but whose PID still exists in the process table because the parent has not collected their exit status. A zombie uses no CPU or memory beyond its process-table entry, but a large number of them can exhaust the available PIDs. A zombie cannot be killed directly; find its parent process and have it reap the child (or restart the parent).
Load average: If you're having performance issues, check the load average and tune the server for performance.
Disk/SAN/NAS utilization: Check the I/O reports for externally attached storage to track and check the speed of read/write operations. If you find any issues, coordinate with the storage and network teams immediately to correct them.
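Parts of this check-up can be scripted. The Python sketch below parses the output of `ps -eo pid,ppid,stat,comm` to list zombie processes together with their parent PIDs (a zombie goes away once its parent reaps it) and normalizes the load average by the number of cores. The function names are hypothetical, and the sketch assumes a Linux system with the procps `ps` command.

```python
import os
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,ppid,stat,comm` output into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        pid, ppid, stat, comm = line.split(None, 3)
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "comm": comm.strip()})
    return rows


def find_zombies(rows):
    """Zombie processes have a state code that starts with 'Z'."""
    return [r for r in rows if r["stat"].startswith("Z")]


def load_per_core(load_1min, cores):
    """Load average divided by core count; values near or above 1.0
    suggest the CPUs are saturated."""
    return load_1min / max(cores, 1)


def snapshot():
    """Collect a live reading (Unix only)."""
    ps = subprocess.run(["ps", "-eo", "pid,ppid,stat,comm"],
                        capture_output=True, text=True, check=True)
    load_1min = os.getloadavg()[0]
    return {
        "zombies": find_zombies(parse_ps(ps.stdout)),
        "load_per_core": load_per_core(load_1min, os.cpu_count() or 1),
    }
```

Keeping the parsing separate from the `ps` call makes the logic easy to test against saved output before running it on a production server.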
Backup and disaster recovery planning
Communicate with the backup team and provide them with the data and client priorities for backup. The recommended backup schedule for production servers is:
Incremental backups: Daily, Monday to Friday
Full backups: Saturday and Sunday
Disaster recovery drills: Perform mock restoration drills with the backup team, preferably once a month or at least quarterly, to ensure the data can be restored in case of an issue.
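The schedule above can be expressed as a small Python function that decides which kind of backup is due on a given date. This is only a sketch of the stated Monday-to-Friday and weekend split; the function names are hypothetical, and the actual backup job would be provided by the backup tooling in use.

```python
from datetime import date, timedelta


def backup_type(day):
    """Return 'full' on Saturday and Sunday, 'incremental' on weekdays."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return "full" if day.weekday() >= 5 else "incremental"


def week_plan(start):
    """List (ISO date, backup type) pairs for seven days from `start`."""
    return [((start + timedelta(days=i)).isoformat(),
             backup_type(start + timedelta(days=i)))
            for i in range(7)]
```

For example, `week_plan(date(2024, 1, 1))` starts on a Monday and yields five incremental backups followed by two full ones.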
Operating system patches for known vulnerabilities must be implemented promptly. Patches come in many types and severity levels.
When a patch is released, check the bug or vulnerability details to see how it applies to your system (e.g., does the vulnerability affect the hardware in your system?), and take any necessary actions to apply the patches when required. Make sure to cross-verify applications' compatibility with patches or upgrades.
Before going live with any application, check its compatibility with your hardware and operating system, and make sure to do load testing (with the support of the application team).
Set a BIOS password: This prevents users from altering BIOS settings.
Set a GRUB password: This stops users from altering the GRUB bootloader.
Deny root access: Rejecting direct root logins reduces the probability of intrusions.
Sudo users: Create sudo users and assign them limited privileges to invoke commands.
TCP wrappers: Use TCP wrappers to restrict which hosts can reach the server's network services. Apply a rule for the SSH daemon to allow only trusted hosts to access the server, and deny all others. Apply similar rules for other services, such as FTP and SFTP (SSH File Transfer Protocol).
Firewalld/iptables: Configure firewalld or iptables rules for incoming traffic to the server. Specify the port, source IP, and destination IP, and decide whether to allow, reject, or deny traffic (including ICMP requests) for the public zone and the private zone.
Antivirus: Install antivirus software and update virus definitions regularly.
Secure and audit logs: Review the logs regularly and whenever an incident requires it.
Rotate the logs: Keep logs for a limited period of time (for example, 7 days) to preserve sufficient disk space for uninterrupted operation.
Set a BIOS password: This prevents users from altering BIOS settings.
Antivirus: Install antivirus software and update virus definitions regularly.
Configure firewall rules: Prevent unauthorized parties from accessing your systems.
Deny administrator login: Limit users' ability to make changes that could increase your systems' vulnerabilities.
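The log-rotation item above (keeping logs for a limited period, such as 7 days) can be sketched in Python as follows. This is a simplified illustration with hypothetical function names and an assumed `*.log` naming convention; on Linux, a dedicated tool such as logrotate is the more usual choice.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_logs(log_dir, max_age_days=7, now=None):
    """Return *.log files whose last modification is older than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(p for p in Path(log_dir).glob("*.log")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def purge_logs(log_dir, max_age_days=7, dry_run=True):
    """Delete expired logs; in dry-run mode only report what would go."""
    expired = expired_logs(log_dir, max_age_days)
    for path in expired:
        if not dry_run:
            path.unlink()
    return expired
```

Running `purge_logs` with the default `dry_run=True` first is a cheap way to confirm that only the intended files match before anything is deleted.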
Use a syslog server
Configure a syslog server in the environment to keep records of system and application logs. In the event of an intrusion or issue, the sysadmin can then check previous and real-time logs to diagnose and resolve the problem.
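On the client side, scripts can forward their own messages to the central syslog server. The Python sketch below uses the standard library's `logging.handlers.SysLogHandler`, which sends over UDP by default; the helper name, the logger name, and the server address in the usage note are assumptions made for this example.

```python
import logging
import logging.handlers


def make_syslog_logger(name, host, port=514):
    """Return a logger that forwards records to a remote syslog server.

    SysLogHandler uses UDP unless a different socket type is requested,
    so no connection is established when the handler is created.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A script would then call, for instance, `log = make_syslog_logger("healthcheck", "192.0.2.50")` followed by `log.info("disk check passed")`, where `192.0.2.50` stands in for the real syslog server address.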
Many sysadmin tasks (such as server health check-ups, resource utilization checks, backup triggers, and file and log transfers) must be done at specific times. The sysadmin must therefore write scripts or use external tools and configure them as cron jobs so that the tasks run automatically at the proper time.
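To make the cron step concrete, the Python sketch below generates crontab lines in the standard five-field format (minute, hour, day of month, month, day of week, then the command). The helper name and the script paths are hypothetical placeholders; the day-of-week values follow the cron convention in which 0 is Sunday and 1 to 5 are Monday to Friday.

```python
def cron_entry(minute, hour, day_of_week, command):
    """Build a crontab line that runs `command` at hour:minute on the
    given days of the week, every day of month and every month."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return f"{minute} {hour} * * {day_of_week} {command}"


# Hypothetical script paths, shown only to illustrate the format.
SCHEDULE = [
    cron_entry(0, 1, "1-5", "/usr/local/bin/backup.sh incremental"),
    cron_entry(0, 1, "6,0", "/usr/local/bin/backup.sh full"),
    cron_entry(30, 6, "*", "/usr/local/bin/health_check.py"),
]
```

Printing `"\n".join(SCHEDULE)` produces text that can be reviewed and then installed with `crontab`.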
Install and configure live monitoring tools like Nagios, HP, etc., to monitor your IT infrastructure and issue alerts about potential problems.
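Monitoring tools in the Nagios family run small check programs and read their exit codes: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The Python sketch below shows a disk-usage check written to that convention; the thresholds, function names, and message wording are assumptions chosen for this example.

```python
import shutil

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a plugin state."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, message) for the filesystem containing `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - no capacity reported for {path}"
    percent = usage.used / usage.total * 100
    state = classify(percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {percent:.1f}% used on {path}"


def main():
    """Entry point: print one status line and return the exit code."""
    code, message = check_disk()
    print(message)
    return code
```

When installed as a plugin, the script would end with `sys.exit(main())` so that the monitoring server receives the state through the exit code.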
While these are the most important tasks a sysadmin is responsible for, there is much more to the role than the tasks on this list.
For example, a sysadmin must coordinate with multiple teams to resolve issues, communicate with and update customers, manage time rigorously, negotiate with the audit team, prepare weekly, monthly, and quarterly reports, monitor servers and services using appropriate tools, manage the hardware console, and respond to triggered alarms.
A sysadmin is often the single point of contact (SPOC) in a data center or network operations center for issues related to web hosting, application and server failures, and other critical IT operations.