I rarely blog about purely technical errors, but this specific message from yarn is something I’ve seen a number of people struggling with. I’m going to explain a bit more about why it comes about, and how I solved it in my situation. This will not work for everyone, but it may give you a hint.

So, the failure mode I’m talking about looks like this:

$ yarn install
yarn install v1.21.1
info No lockfile found.
[1/4] Resolving packages...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...

This bit at the end is crucial: “There appears to be trouble”. Lots of people see this and experience the problem. Why? How is this so difficult to resolve?

Here’s the thing: this particular error message is what I tend to call “symptomatic”. I could liken it to a rash on the arm: there are lots of things that can cause such a problem, and the rash itself doesn’t tell you much. Sadly, while a doctor can look closely at a rash and get some clues to the specific reason, the text of our error message will reveal nothing further, no matter how closely we examine it.

Let’s take a look at some other people with this problem:

Interestingly, the last post at least offers some pointers. Here are some things that people have tried and apparently might work:

  • removing the yarn.lock file
  • setting a very long timeout with the --network-timeout 100000 option
  • looking at any proxy settings, e.g. with printenv | grep proxy
  • removing any npm proxy settings with npm config rm https-proxy
  • checking settings in ~/.yarnrc

Most worryingly, many people also experience these things:

  • npm appears to work while yarn doesn’t
  • the problem is sometimes there, and sometimes goes away
  • one of the tricks above seems to fix it, at least for a short period of time

The real problem?

Here’s the thing: none of the above things are necessarily wrong, they’re just likely not right. If you have a very slow network, setting a longer timeout may help. But for most it probably won’t. If you did somehow configure a proxy for a work setting and forgot to disable it, then surely removing that setting will help - but again, this won’t apply to many.
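If you want to rule out a forgotten proxy quickly, here is a minimal sketch. The function names are made up for illustration. It assumes the proxy, if there is one, shows up either in an environment variable or in the npm/yarn config. Note the case-insensitive match: a plain grep for "proxy" would miss HTTP_PROXY and HTTPS_PROXY.

```shell
#!/bin/sh
# List every environment variable whose NAME mentions "proxy", in any case.
# (A value spanning several lines could confuse this; fine for a quick check.)
list_proxy_env() {
  env | grep -i '^[^=]*proxy[^=]*=' || true
}

# Print a one-line verdict, followed by the variables if any were found.
report_proxy_env() {
  found=$(list_proxy_env)
  if [ -n "$found" ]; then
    printf 'proxy variables set:\n%s\n' "$found"
  else
    echo 'no proxy variables set'
  fi
}

# Usage (commented out so that sourcing this file runs nothing):
#   report_proxy_env
#   npm config get proxy;  npm config get https-proxy
#   yarn config get proxy; yarn config get https-proxy
```

If all of these come back empty, a proxy setting is probably not your problem and you can move on.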

Unfortunately this is a generic problem. Any potential network error could cause this, so really what we’re saying here is that there are a lot of potential diagnoses. It could also be a firewall causing problems, or some kind of transparent proxying on a corporate network. Indeed some of the error reports coincide with times when the npm service was offline.

A potential solution

There is a potential solution here which no-one else appears to have documented. But before I get to that, let’s talk about how you go about debugging a network problem.

Step one: obviously, make sure the connectivity works in general. Can I browse to my usual websites? Sometimes people won’t even bother checking this, but if various pages are failing randomly then you have a problem elsewhere.

Step two: check the DNS works. Sadly yarn doesn’t give great output unless we ask it to be verbose:

$ yarn install --verbose
yarn install v1.21.1
verbose 0.854 Checking for configuration file "/home/a/project/.npmrc".
[[...snip...]]
verbose 0.982 current time: 2020-02-05T21:03:08.762Z
info No lockfile found.
[1/4] Resolving packages...
verbose 1.198 Performing "GET" request to "https://registry.yarnpkg.com/axios".

I can see it’s trying to get axios, but I should check the DNS name resolves first:

$ host registry.yarnpkg.com
registry.yarnpkg.com is an alias for yarn.npmjs.org.
yarn.npmjs.org has address 104.16.19.35
yarn.npmjs.org has address 104.16.20.35
yarn.npmjs.org has address 104.16.17.35
[[...snip...]]

There are lots of potential addresses here. In fact, it’s quite possible some of these don’t work: so if you get the problem randomly but not very often, that could well just be “normal”!
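To work with that list of addresses rather than just eyeball it, a small filter helps. This is a sketch with hypothetical function names; it assumes the "has address" / "has IPv6 address" wording that host prints, as in the output above.

```shell
#!/bin/sh
# Print one IPv4 address per line from `host` output read on stdin.
host_ipv4_addresses() {
  awk '/ has address / { print $NF }'
}

# Same idea for the IPv6 records.
host_ipv6_addresses() {
  awk '/ has IPv6 address / { print $NF }'
}

# Usage (commented out so that sourcing this file runs nothing):
#   host registry.yarnpkg.com | host_ipv4_addresses
#   host registry.yarnpkg.com | host_ipv4_addresses | wc -l
```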

Step three: let’s check we can get to one of these IP addresses at least. We can use traceroute, tracepath or potentially even ping for this:

$ ping -c 2 104.16.19.35
PING 104.16.19.35 (104.16.19.35) 56(84) bytes of data.
64 bytes from 104.16.19.35: icmp_seq=1 ttl=56 time=31.7 ms
64 bytes from 104.16.19.35: icmp_seq=2 ttl=56 time=29.2 ms

Unfortunately, for me, tracepath appears to fail from most hosts. These checks are useful because if they work you know the connectivity is ok: but if they fail, it doesn’t necessarily mean there is a problem.
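Since any one of the addresses might be the bad one, it can be worth probing all of them rather than just the first. A rough sketch follows; probe_each is a made-up helper, and the probe command is whatever you pass in. As noted above, a FAIL here is only a hint, because some networks drop ping traffic while everything else works.

```shell
#!/bin/sh
# Read addresses on stdin and run the given probe command against each one.
# The probe's stdin is /dev/null so it cannot swallow the address list.
probe_each() {
  while IFS= read -r addr; do
    [ -n "$addr" ] || continue
    if "$@" "$addr" </dev/null >/dev/null 2>&1; then
      echo "ok   $addr"
    else
      echo "FAIL $addr"
    fi
  done
}

# Usage (commented out so that sourcing this file runs nothing):
#   host registry.yarnpkg.com | awk '/ has address / { print $NF }' \
#     | probe_each ping -c 2
```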

Step four: we can try to download a package directly with curl:

$ curl -o test.tgz -v https://registry.yarnpkg.com/vue-axios/-/vue-axios-2.1.4.tgz

If this step works, then you can be pretty confident that the problem lies with the npm/yarn configuration.
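If the download fails, curl's exit code narrows things down a fair bit. The sketch below maps a few of curl's documented exit codes to the step they point at. The function name and the hints are my own reading of those codes, not anything official, so treat them as a starting point.

```shell
#!/bin/sh
# Turn a curl exit code into a hint about where to look next.
explain_curl_exit() {
  case "$1" in
    0)     echo "transfer succeeded: suspect the npm/yarn configuration" ;;
    5)     echo "could not resolve the proxy: check proxy settings" ;;
    6)     echo "could not resolve the host: check DNS" ;;
    7)     echo "could not connect: check routing and firewalls" ;;
    28)    echo "timed out: slow or filtered network" ;;
    35|60) echo "TLS problem: possibly an intercepting proxy" ;;
    *)     echo "curl exit code $1: see the EXIT CODES section of man curl" ;;
  esac
}

# Usage (commented out so that sourcing this file runs nothing):
#   curl -sS -o test.tgz --max-time 60 \
#     https://registry.yarnpkg.com/vue-axios/-/vue-axios-2.1.4.tgz
#   explain_curl_exit $?
```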

Step five: the dreaded IPv6.

The yarn/npm records have entries for IPv6 addresses. Unfortunately, while there’s nothing wrong with this, it will cause issues on some networks that are misconfigured. It should be the case that you’d see problems before now - e.g. doing a route test - but some of those checks can fail in a “normal” network too.

I hate suggesting this, but it can be worth disabling IPv6 on a temporary basis. Sometimes that will kick things back into action. Obviously, if that is the problem, you then need to decide whether to turn it off permanently. It’s unlikely you’ll miss much.
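Before turning anything off, you can compare the two address families directly, because curl can be forced onto one or the other with -4 and -6. This is only a sketch: try_family and the CURL variable are names I made up (the variable is just there so the binary can be swapped out), and the 15-second limit is arbitrary.

```shell
#!/bin/sh
# Which curl binary to run; override CURL to substitute another one.
CURL="${CURL:-curl}"

# try_family 4|6 URL: fetch URL over one address family and report the result.
try_family() {
  if "$CURL" "-$1" -sS -o /dev/null --max-time 15 "$2"; then
    echo "IPv$1: ok"
  else
    echo "IPv$1: failed (exit $?)"
  fi
}

# Usage (commented out so that sourcing this file runs nothing):
#   try_family 4 https://registry.yarnpkg.com/axios
#   try_family 6 https://registry.yarnpkg.com/axios
# On Linux, one way to switch IPv6 off until the next reboot (needs root):
#   sysctl -w net.ipv6.conf.all.disable_ipv6=1
```

If IPv4 works and IPv6 fails, that points at IPv6 being the culprit. If both fail the same way, it is probably something else.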

What was wrong for me?

For me, step two fails: DNS doesn’t resolve.

This actually isn’t quite true: sometimes it works, sometimes it doesn’t. But there’s a consistent reason for it not working:

$ host registry.yarnpkg.com | head -n 2
registry.yarnpkg.com is an alias for yarn.npmjs.org.
yarn.npmjs.org has address 104.16.17.35
$ host -T registry.yarnpkg.com | head -n 2
;; connection timed out; no servers could be reached

Oh dear. One command works, the other doesn’t. The first command does DNS resolution the “usual” way: via a query sent via UDP. The second one does it in an odd way, via TCP. This query doesn’t work because my local DNS server (not the one used by yarn) is broken: either a firewall is blocking access, or it’s just misconfigured.

99% of the time, this misconfiguration doesn’t cause a problem. The default mechanism, UDP, works perfectly fine. However, DNS queries will switch to TCP when the response becomes too large for UDP. I’ve confirmed that’s happening here with strace:

connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("185.4.116.166")}, 16) = 0
sendmmsg(3, [{msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\250i\1\0\0\1\0\0\0\0\0\0\10registry\7yarnpkg\3co"..., iov_len=38}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_EOR|MSG_WAITALL|MSG_SYN|MSG_ERRQUEUE|MSG_NOSIGNAL|MSG_MORE|MSG_CMSG_CLOEXEC|0x1b7a0000}, msg_len=38}, {msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\237\372\1\0\0\1\0\0\0\0\0\0\10registry\7yarnpkg\3co"..., iov_len=38}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=38}], 2, MSG_NOSIGNAL) = 2
recvfrom(3, "\250i\201\200\0\1\0\r\0\6\0\5\10registry\7yarnpkg\3co"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("185.4.116.166")}, [28->16]) = 500
recvfrom(3, "\237\372\203\200\0\1\0\r\0\4\0\0\10registry\7yarnpkg\3co"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("185.4.116.166")}, [28->16]) = 506
close(3)                                = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("185.4.116.166")}, 16) = -1 EAGAIN (Resource temporarily unavailable)
close(3)

I’ve removed some lines that don’t matter. What you see here is a network connection being made, first by UDP, to port 53 on the DNS server for my network. It does the query and receives a response, but the response is only partial. So another connection is made, this time using TCP (SOCK_STREAM), to the same server and the same port. This connection fails.
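You don’t need strace to check for the same thing. A truncated UDP reply carries the "tc" flag, and dig prints the flags in its header. Here is a small sketch; the function name is hypothetical, and the +noedns option rests on my assumption that your system resolver sends plain queries without EDNS, which is the glibc default.

```shell
#!/bin/sh
# Succeed if the dig output on stdin shows the "tc" (truncated) flag.
dig_reply_truncated() {
  grep -Eq '^;; flags:[^;]* tc[ ;]'
}

# Usage (commented out so that sourcing this file runs nothing):
#   # UDP only, and do not retry over TCP when the reply is truncated:
#   dig +notcp +ignore +noedns registry.yarnpkg.com | dig_reply_truncated \
#     && echo "UDP reply is truncated, so resolvers will retry over TCP"
#   # Now force TCP; this times out if TCP to port 53 is blocked:
#   dig +tcp registry.yarnpkg.com
```

A truncated UDP reply combined with a failing TCP query is the same situation as in the trace above.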

As I said, this is quite a common problem, especially on hotel or coffee shop wifi. The network is fine, 99% of the time. But on odd occasions it fails.

A solution

So, I’ve outlined some steps to debug the connection. Your problem might be different to mine, so you need to work through some of those steps and see where you fail. Certainly, if you have a network problem, there’s not much you can do with npm/yarn.

However, if you have my problem, it’s not obvious how to resolve it. I can’t change the DNS server on the network I’m using. The problem is triggered by the npm/yarn DNS name having so many records: the query results are large, so the resolver has to retry them over TCP, and that is the part that fails.

I think the easiest thing to do is to stick in a manual /etc/hosts entry for now. I do this on Linux with a simple one-liner:

$ D=registry.yarnpkg.com sh -c 'echo `dig +short $D | tail -1` $D' | sudo tee -a /etc/hosts

If you’re on Windows or something, you’ll need to figure out the alternative.

This isn’t perfect, because that IP address we’re putting in there won’t work forever. However, it was enough to get me out of my jam.
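A slightly more defensive take on the same idea is sketched below, with made-up function names. It only emits a hosts line when dig actually returned an IPv4 address, so a failed lookup appends nothing, and it includes a helper for taking the entry out again once you’re back on a sane network.

```shell
#!/bin/sh
# Build an /etc/hosts line for name $1 from `dig +short` output on stdin.
# CNAME lines are skipped; the last IPv4 address wins, as with `tail -1`.
# Prints nothing and returns non-zero if no IPv4 address was seen.
hosts_line() {
  awk -v name="$1" '
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { ip = $0 }
    END { if (ip == "") exit 1; print ip, name }
  '
}

# Delete every line ending in name $1 from hosts file $2, keeping a .bak copy.
# The dots in the name are treated as regex wildcards, which is loose but
# good enough for a file this small.
remove_hosts_entry() {
  sed -i.bak "/[[:space:]]$1\$/d" "$2"
}

# Usage (commented out so that sourcing this file runs nothing):
#   dig +short registry.yarnpkg.com | hosts_line registry.yarnpkg.com \
#     | sudo tee -a /etc/hosts
#   # Later, from a root shell (sudo cannot call a shell function):
#   remove_hosts_entry registry.yarnpkg.com /etc/hosts
```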