An Incident Caused by Mounting a Service Account Token with a Projected Volume

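Recently an application team reported an incident: pipelines created by Argo were failing because Pods calling the kube API Server intermittently got 401 responses, meaning authentication had failed.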

The API Server logged the following errors:

E1212 18:37:53.063390       1 claims.go:126] unexpected validation error: *errors.errorString
E1212 18:37:53.063583       1 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, Token could not be validated.]"

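The errors confirm that these requests really did fail API Server authentication. The first line is logged from claims.go, in the service account token validator: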

func (v *validator) Validate(ctx context.Context, _ string, public *jwt.Claims, privateObj interface{}) (*apiserverserviceaccount.ServiceAccountInfo, error) {
	private, ok := privateObj.(*privateClaims)
	if !ok {
		klog.Errorf("jwt validator expected private claim of type *privateClaims but got: %T", privateObj)
		return nil, errors.New("Token could not be validated.")
	}
	nowTime := now()
	err := public.Validate(jwt.Expected{
		Time: nowTime,
	})
	switch {
	case err == nil:
	case err == jwt.ErrExpired:
		return nil, errors.New("Token has expired.")
	default:
		klog.Errorf("unexpected validation error: %T", err)
		return nil, errors.New("Token could not be validated.")
	}
	// ... (remaining checks omitted)
}

Let's set the API Server error aside for a moment and look at how the workload authenticates.

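Workloads usually access the API Server with a service account. In that case the application code only needs the in-cluster config to pass API Server authentication; if it needs particular permissions, you grant the corresponding RBAC permissions to that service account.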

One drawback of this approach is that once the token has been handed out, it stays valid forever, which makes it very hard to revoke.

Projected volumes offer a newer way to inject a service account token: https://kubernetes.io/docs/concepts/storage/projected-volumes/#serviceaccounttoken

Here is an example:

apiVersion: v1
kind: Pod
metadata:
  name: sa-token-test
spec:
  containers:
  - name: container-test
    image: busybox:1.28
    command: ["sleep", "3600"]
    volumeMounts:
    - name: token-vol
      mountPath: "/service-account"
      readOnly: true
  serviceAccountName: default
  volumes:
  - name: token-vol
    projected:
      sources:
      - serviceAccountToken:
          audience: api
          expirationSeconds: 3600
          path: token

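The kubelet obtains a temporary token for the service account and writes it to /service-account/token; the token is valid for 3600 seconds. You can also copy the token out and put it into the config that kubectl uses, and it authenticates against the API Server just as well: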

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: 「CA」
    server: https://kubernetes-apiserver.kube-system.svc.global.tcs.internal:6443
  name: global
contexts:
- context:
    cluster: global
    user: xxx
  name: xxx@global
current-context: xxx@global
kind: Config
preferences: {}
users:
- name: xxx
  user:
    token: 「token」

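Back to the incident. Since we suspected the token, we changed the Pod's command to sleep infinity, waited for the Pod to start, and copied the token into a kubeconfig. The token worked without any problem, and it had plenty of time left before expiring.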

That left us scratching our heads.

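Never mind; let's go back to the API Server error above. We run Kubernetes 1.22, where only ErrExpired gets a specific message. Every other error is logged with %T, so all we see is the uninformative type name *errors.errorString. The check itself is public.Validate, which in go-jose calls Claims.ValidateWithLeeway with a default leeway of one minute: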

func (c Claims) ValidateWithLeeway(e Expected, leeway time.Duration) error {
	if e.Issuer != "" && e.Issuer != c.Issuer {
		return ErrInvalidIssuer
	}

	if e.Subject != "" && e.Subject != c.Subject {
		return ErrInvalidSubject
	}

	if e.ID != "" && e.ID != c.ID {
		return ErrInvalidID
	}

	if len(e.Audience) != 0 {
		for _, v := range e.Audience {
			if !c.Audience.Contains(v) {
				return ErrInvalidAudience
			}
		}
	}

	if !e.Time.IsZero() && e.Time.Add(leeway).Before(c.NotBefore.Time()) {
		return ErrNotValidYet
	}

	if !e.Time.IsZero() && e.Time.Add(-leeway).After(c.Expiry.Time()) {
		return ErrExpired
	}

	return nil
}

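ValidateWithLeeway can return several errors, but since the token looked fine when we inspected it by hand, the prime suspect was ErrNotValidYet. So we checked the time on every node and, sure enough, the clocks were out of sync: one server was 2 minutes slow.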

So why does that break authentication?

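The reason is simple. A service account token carries iat and nbf (not before) claims stamped with the issuer's current time. When the token is validated against a clock that runs 2 minutes behind, the API Server sees a token that only becomes valid in the future; a 2-minute gap exceeds the 1-minute leeway, so validation returns ErrNotValidYet and authentication fails. Why did the token work when we pulled it out by hand? Because a person did it: by the time we had copied out the token and edited the kubeconfig, 2 minutes had already passed, and the token had become valid.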

By the way, the token is a JWT, so you can inspect it on jwt.io, which has an excellent visualizer.

Ref:

Projected Volumes