I am trying to download this file with Ruby:
https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt
Here is a sample program, showing the issue (the comment shows what I get):
    require 'uri'
    require 'net/http'
    require 'pp'

    url = 'https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt'
    uri = URI.parse(url)
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      http.start
      p http.get(uri.path)
      http.finish
    rescue => ex
      puts "error: #{ex.message} (#{ex.class})"
      # error: Failed to open TCP connection to evs.nci.nih.gov:443 (Blocking operation timed out!) (IO::TimeoutError)
    end
I tried to narrow down where the problem lies, and found that it comes from this:
    begin
      p TCPSocket.new("evs.nci.nih.gov", 443, connect_timeout: 90)
    rescue => ex
      puts "error: #{ex.message} (#{ex.class})"
      # error: Blocking operation timed out! (IO::TimeoutError)
    end
I tried to get more information like this:
    pp Addrinfo.getaddrinfo("evs.nci.nih.gov", 443)
    # [#<Addrinfo: [2600:1f18:2102:3600:b4de:3448:7a8c:7ecd]:443 (evs.nci.nih.gov)>,
    #  #<Addrinfo: [2600:1f18:2102:3601:4147:d992:65b4:4ac1]:443 (evs.nci.nih.gov)>,
    #  #<Addrinfo: 52.1.83.88:443 (evs.nci.nih.gov)>,
    #  #<Addrinfo: 3.224.99.192:443 (evs.nci.nih.gov)>]
Now this works:
    p TCPSocket.new("3.224.99.192", 443, connect_timeout: 90)
    # #<TCPSocket:fd 3, AF_INET, 192.168.1.25, 2433>
Ok, then let's try that:
    url = 'https://3.224.99.192/ftp1/CDISC/ReadMe.txt'
    uri = URI.parse(url)
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      http.start
      p http.get(uri.path)
      http.finish
    rescue => ex
      puts "error: #{ex.message} (#{ex.class})"
      # error: hostname "3.224.99.192" does not match the server certificate (OpenSSL::SSL::SSLError)
    end
Yeah, that one I can understand: the certificate is issued for the hostname, not for the IP address.
I tried curl:
    curl -o "ReadMe.txt" --trace-ascii "curl.log" "https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt"
It works like a charm. Here is the log:
    == Info: Trying [2600:1f18:2102:3601:4147:d992:65b4:4ac1]:443...
    == Info: Trying 3.224.99.192:443...
    == Info: Connected to evs.nci.nih.gov (3.224.99.192) port 443
    == Info: schannel: disabled automatic use of client certificate
    == Info: ALPN: curl offers http/1.1
    == Info: ALPN: server accepted http/1.1
    == Info: using HTTP/1.1
    => Send header, 99 bytes (0x63)
    0000: GET /ftp1/CDISC/ReadMe.txt HTTP/1.1
    0025: Host: evs.nci.nih.gov
    003c: User-Agent: curl/8.4.0
    0054: Accept: */*
    0061: 
    <= Recv header, 17 bytes (0x11)
    0000: HTTP/1.1 200 OK
    <= Recv header, 37 bytes (0x25)
    0000: Date: Sat, 20 Apr 2024 16:27:02 GMT
    <= Recv header, 26 bytes (0x1a)
    0000: Content-Type: text/plain
    <= Recv header, 22 bytes (0x16)
    0000: Content-Length: 2157
    <= Recv header, 24 bytes (0x18)
    0000: Connection: keep-alive
    <= Recv header, 30 bytes (0x1e)
    0000: Server: Apache/2.4.54 (Unix)
    <= Recv header, 46 bytes (0x2e)
    0000: Strict-Transport-Security: max-age=31536000;
    <= Recv header, 46 bytes (0x2e)
    0000: Last-Modified: Fri, 29 Mar 2024 04:31:09 GMT
    <= Recv header, 27 bytes (0x1b)
    0000: ETag: "86d-614c51be679a1"
    <= Recv header, 22 bytes (0x16)
    0000: Accept-Ranges: bytes
    <= Recv header, 2 bytes (0x2)
    0000: 
    <= Recv data, 2157 bytes (0x86d)
    0000: March 29, 2024.
    (...)
    086b: SC
    == Info: Connection #0 to host evs.nci.nih.gov left intact
Any idea what goes wrong with Ruby?
I found the reason: the issue is in Net::HTTP#connect, where you find this code:
    s = Timeout.timeout(@open_timeout, Net::OpenTimeout) {
      begin
        TCPSocket.open(conn_addr, conn_port, @local_host, @local_port)
      rescue => e
        raise e, "Failed to open TCP connection to " +
          "#{conn_addr}:#{conn_port} (#{e.message})"
      end
    }
Timeout.timeout is notoriously flawed, and this code just does not work: even if you specify a timeout of 1 second, TCPSocket.open waits 21 seconds on my machine before failing with Blocking operation timed out! (IO::TimeoutError).
If you patch the code to use Socket.tcp instead, it works:
    begin
      s = Socket.tcp(conn_addr, conn_port, @local_host, @local_port, connect_timeout: @open_timeout)
    rescue => e
      raise e, "Failed to open TCP connection to #{conn_addr}:#{conn_port} (#{e.message})"
    end
Now if you look into the implementation of Socket.tcp, you will see that it loops over the addresses returned by Addrinfo.getaddrinfo, trying to connect to them one by one until it finds an address that works. Since in my case the IPv6 addresses are returned first, it has to fail twice before connecting over IPv4.
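To make that fallback behaviour concrete, here is a simplified sketch of such a sequential loop. It is not the actual Socket.tcp source: connect_first_reachable, DEFAULT_CONNECTOR and the connector parameter are names I made up for illustration, and the connector is injectable only so the loop can be exercised without touching the network.

```ruby
require 'socket'

# Default way to connect: Addrinfo#connect with a per-address timeout.
DEFAULT_CONNECTOR = ->(addrinfo, timeout) { addrinfo.connect(timeout: timeout) }

# Hypothetical helper: try each address in order and return the first
# connection that succeeds. Every failed attempt costs up to `timeout`
# seconds, which is why unreachable IPv6 addresses listed first slow
# everything down.
def connect_first_reachable(addrinfos, timeout: 5, connector: DEFAULT_CONNECTOR)
  last_error = nil
  addrinfos.each do |addrinfo|
    begin
      return connector.call(addrinfo, timeout)
    rescue SystemCallError, IOError => e
      # Covers Errno::* failures and IO::TimeoutError (an IOError subclass).
      last_error = e
    end
  end
  # Nothing worked: re-raise the last failure, or complain about an empty list.
  raise(last_error || ArgumentError.new('no addresses to try'))
end
```

With the default connector you would call it as connect_first_reachable(Addrinfo.getaddrinfo("evs.nci.nih.gov", 443, nil, :STREAM), timeout: 3). On a machine like mine that would burn up to 3 seconds on each IPv6 address before reaching an IPv4 one.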
The best solution for me is therefore to specify an IPv4 address explicitly:
    url = 'https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt'
    uri = URI.parse(url)
    address = Addrinfo.getaddrinfo(uri.host, nil)
      .sort_by { |a| a.ipv4? ? 0 : 1 }
      .first.ip_address
    Net::HTTP.start(uri.host, uri.port, use_ssl: true, ipaddr: address) do |http|
      # ...
    end
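The same idea can be wrapped in a small reusable helper. This is only a sketch under my own assumptions: preferred_ip, fetch_preferring_ipv4 and the resolver parameter are hypothetical names, and it falls back to whatever address comes first when no IPv4 address is returned. It uses find instead of sort_by, since only the first IPv4 entry is needed.

```ruby
require 'socket'
require 'net/http'
require 'uri'

# Hypothetical helper: return the first IPv4 address for `host`, or the
# first address of any family if the host has no IPv4 address at all.
# `resolver` is injectable so the selection logic can be tested offline.
def preferred_ip(host, resolver: ->(h) { Addrinfo.getaddrinfo(h, nil, nil, :STREAM) })
  addrs = resolver.call(host)
  chosen = addrs.find(&:ipv4?) || addrs.first
  chosen&.ip_address
end

# Hypothetical helper: GET a URL while connecting to the preferred IP.
# The hostname is still passed to Net::HTTP, so certificate verification
# is done against the hostname, not the IP address.
def fetch_preferring_ipv4(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  ipaddr: preferred_ip(uri.host)) do |http|
    http.get(uri.request_uri)
  end
end
```

Usage would be fetch_preferring_ipv4('https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt').body. I have only reasoned about this against my own setup, so treat it as a starting point.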
Hope this will help someone.
Cheers