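Job Details
Social insurance and housing fund
Year-end bonus
Stock options
Paid annual leave
Children's benefits
Flexible working
Good leadership
Career advancement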
Position Objective:
This position is responsible for building out the lead operation platform, including automating operations and defining standard procedures in the SRE area. The role requires familiarity with cloud platform architecture and containers (AKS or K8S), covering the design, deployment, and optimization of IT infrastructure, application components, and full-link business monitoring and analysis. The role also assists in the planning, design, and implementation of DevOps tools.
Roles and Responsibilities:
1, Responsible for service monitoring, incident response, and capacity planning of the cloud platform, ensuring platform stability and high availability.
2, Develop and promote deployment and operation standards for business systems, produce SRE-related standards and guidelines, and review and improve the architecture of business applications.
3, Develop and maintain an automated operation and maintenance platform, continuously promoting operation and delivery automation.
4, Explore and research various new technologies such as cloud services, containerization, and intelligent operation and maintenance.
5, Through continuous and comprehensive data operations (including availability indicators, historical incidents, resource utilization, etc.), identify weak points in the system and gaps in improvement projects.
6, Handle emergency response and incident management: step in to analyze and locate incidents as soon as they occur, accurately identify the relevant team for incident restoration, and organize and optimize follow-up actions.
Minimum Job Requirements:
1, Bachelor's degree or above, major in computer science, with at least 8 years of experience in operation, maintenance, or development.
2, Proficient in at least two development languages among Shell, Python, Go, and Java, with relevant platform development experience. Able to undertake platform design, project planning, and related tasks.
3, Proficient in the Linux operating system, able to dig into system issues and provide suggestions based on actual usage.
4, Experience with cloud (Azure, AWS, etc.) architecture solutions, including hands-on projects and integration cases involving cloud computing, cloud migration, the virtualization layer, and other backend infrastructure services.
5, Have a good understanding of network fundamentals and protocols such as TCP/IP, HTTP, HTTPS, etc.
6, Familiar with the use and maintenance of various cloud services and middleware, such as CDN, SLB, WAF, SQL Server, MySQL, Redis, OSS, etc.
7, Familiar with containerization technology, with extensive experience in using K8S.
8, Proficient in mainstream industry monitoring products such as Dynatrace, Zabbix, Prometheus, etc. Capable of designing, researching, and implementing strategies for anomaly detection, root cause analysis, fault self-healing, and alarm convergence. Able to identify and solve technical challenges such as system performance bottlenecks by monitoring the system.
9, Good learning and communication skills, a strong sense of responsibility, good team spirit and resilience under pressure, curiosity about new technologies, and a willingness to share.
10, Fluent English communication skills are essential.
Other Information
Language requirements: English, Cantonese
Industry requirements: All industries