Head of NOC Engineer

Date: 16 November
Employer: Frag Lab
City: Kyiv (remote)
Qualifications: Degree in CS and/or equivalent experience.
Experience in an I.T. and/or NOC role.
Experience with AWS EKS, Kubernetes, Docker, PostgreSQL, and the ELK stack.
Thorough understanding of the architectural principles, operational needs, and challenges of high-load, session-based Game-as-a-Service platforms.
Must understand the principles of TCP/IP-based networks.
Ability to work independently and possess superior skills in troubleshooting and issue resolution.
Strong sense of urgency with a passion for accuracy and timeliness.
Ability to work calmly in high-pressure situations and manage multiple ongoing projects.
Excellent written and verbal communication skills and problem-solving skills.
Thorough understanding of monitoring and reporting tools.
Proven experience in a fast-paced, time-sensitive environment.
Self-motivated in learning and enhancing technical skills to increase job effectiveness.
As part of the 24/7 network surveillance team, this position requires participating in 9- and/or 12-hour rotational shifts as necessary.
Responsibilities: Provide technical guidance to NOC Engineers while supporting 24/7 rotational shifts.
Check application and system health to support NOC Engineers.
Day to day administration of Windows/Linux servers, including related applications.
Administer monitoring services in AWS, such as the K8s cluster, Elastic, Prometheus, Kafka, Kibana, Grafana, and metrics for third-party services (ClickHouse, Redis, PostgreSQL).
Record and respond to system events in accordance with established procedures.
Correlate multiple monitoring system events and application status to ensure proper diagnosis.
Troubleshoot and resolve system related issues.
Determine the severity and urgency of an incident and take immediate action to restore service, escalating to Engineering staff as necessary.
Ensure established communications structure is followed during system impacting events.
Lead and direct troubleshooting efforts during incidents.
Ensure NOC Engineers can provide health status updates on production and development platforms.
Look for improvements and offer recommendations for existing processes and documentation.
Protect players’ experience by applying initiative and sound judgment while using established incident management tools.
Serve as an escalation point for NOC staff to support 24/7/365 coverage efforts.
Perform access control management activities and conduct patch/remediation efforts.
Lead and perform other duties as assigned by your supervisor.