I am trying to download this file with Ruby:
https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt
Here is a sample program showing the issue (the comment shows what I get):
```ruby
require 'uri'
require 'net/http'
require 'pp'

url = 'https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt'
uri = URI.parse(url)
begin
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.start
  p http.get(uri.path)
  http.finish
rescue => ex
  puts "error: #{ex.message} (#{ex.class})"
  # error: Failed to open TCP connection to evs.nci.nih.gov:443 (Blocking operation timed out!) (IO::TimeoutError)
end
```

I tried to narrow down where the problem lies, and found that it comes from this call:
```ruby
begin
  p TCPSocket.new("evs.nci.nih.gov", 443, connect_timeout: 90)
rescue => ex
  puts "error: #{ex.message} (#{ex.class})"
  # error: Blocking operation timed out! (IO::TimeoutError)
end
```

I tried to get more information like this:
```ruby
pp Addrinfo.getaddrinfo("evs.nci.nih.gov", 443)
# [#<Addrinfo: [2600:1f18:2102:3600:b4de:3448:7a8c:7ecd]:443 (evs.nci.nih.gov)>,
#  #<Addrinfo: [2600:1f18:2102:3601:4147:d992:65b4:4ac1]:443 (evs.nci.nih.gov)>,
#  #<Addrinfo: 52.1.83.88:443 (evs.nci.nih.gov)>,
#  #<Addrinfo: 3.224.99.192:443 (evs.nci.nih.gov)>]
```

Connecting directly to one of the IPv4 addresses works:
```ruby
p TCPSocket.new("3.224.99.192", 443, connect_timeout: 90)
# #<TCPSocket:fd 3, AF_INET, 192.168.1.25, 2433>
```

OK, then let's try that with Net::HTTP:
```ruby
url = 'https://3.224.99.192/ftp1/CDISC/ReadMe.txt'
uri = URI.parse(url)
begin
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.start
  p http.get(uri.path)
  http.finish
rescue => ex
  puts "error: #{ex.message} (#{ex.class})"
  # error: hostname "3.224.99.192" does not match the server certificate (OpenSSL::SSL::SSLError)
end
```

Fair enough: the certificate is issued for the host name, not for the bare IP address.
I tried curl:
```shell
curl -o "ReadMe.txt" --trace-ascii "curl.log" "https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt"
```

It works like a charm. Here is the log:
```
== Info: Trying [2600:1f18:2102:3601:4147:d992:65b4:4ac1]:443...
== Info: Trying 3.224.99.192:443...
== Info: Connected to evs.nci.nih.gov (3.224.99.192) port 443
== Info: schannel: disabled automatic use of client certificate
== Info: ALPN: curl offers http/1.1
== Info: ALPN: server accepted http/1.1
== Info: using HTTP/1.1
=> Send header, 99 bytes (0x63)
0000: GET /ftp1/CDISC/ReadMe.txt HTTP/1.1
0025: Host: evs.nci.nih.gov
003c: User-Agent: curl/8.4.0
0054: Accept: */*
0061: 
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 37 bytes (0x25)
0000: Date: Sat, 20 Apr 2024 16:27:02 GMT
<= Recv header, 26 bytes (0x1a)
0000: Content-Type: text/plain
<= Recv header, 22 bytes (0x16)
0000: Content-Length: 2157
<= Recv header, 24 bytes (0x18)
0000: Connection: keep-alive
<= Recv header, 30 bytes (0x1e)
0000: Server: Apache/2.4.54 (Unix)
<= Recv header, 46 bytes (0x2e)
0000: Strict-Transport-Security: max-age=31536000;
<= Recv header, 46 bytes (0x2e)
0000: Last-Modified: Fri, 29 Mar 2024 04:31:09 GMT
<= Recv header, 27 bytes (0x1b)
0000: ETag: "86d-614c51be679a1"
<= Recv header, 22 bytes (0x16)
0000: Accept-Ranges: bytes
<= Recv header, 2 bytes (0x2)
0000: 
<= Recv data, 2157 bytes (0x86d)
0000: March 29, 2024.
(...)
086b: SC
== Info: Connection #0 to host evs.nci.nih.gov left intact
```

Note that curl also tries an IPv6 address first and then falls back to 3.224.99.192. Any idea what goes wrong with Ruby?
I found the cause: the issue is in Net::HTTP#connect, which contains this code:
```ruby
s = Timeout.timeout(@open_timeout, Net::OpenTimeout) {
  begin
    TCPSocket.open(conn_addr, conn_port, @local_host, @local_port)
  rescue => e
    raise e, "Failed to open TCP connection to " +
      "#{conn_addr}:#{conn_port} (#{e.message})"
  end
}
```

Timeout.timeout is notoriously unreliable, and here it simply does not work: even with a timeout of 1 second, TCPSocket.open blocks for 21 seconds on my machine before failing with "Blocking operation timed out! (IO::TimeoutError)".
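To check whether a given timeout is actually honored, you can time the connection attempt yourself. The sketch below is a hypothetical helper (measure_connect is not part of Net::HTTP or the socket library); it assumes Ruby 3.x, runs a block that opens a socket, and returns either :connected or the exception class together with the elapsed time measured on a monotonic clock.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not part of Net::HTTP): run a block that opens a
# socket, close that socket, and report how long the attempt took.
# Returns [:connected, seconds] on success or [ExceptionClass, seconds].
def measure_connect
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  outcome =
    begin
      yield.close
      :connected
    rescue StandardError => e
      # Timeout::Error, IO::TimeoutError and Errno::* all land here.
      e.class
    end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [outcome, elapsed.round(2)]
end

# Possible usage, comparing the two approaches against the same host:
#   host, port = 'evs.nci.nih.gov', 443
#   p measure_connect { Timeout.timeout(1) { TCPSocket.open(host, port) } }
#   p measure_connect { Socket.tcp(host, port, connect_timeout: 1) }
```

If the elapsed time for the Timeout.timeout variant is far above 1 second, the block was not interrupted in time, which matches the behavior described above.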
If you patch the code to use Socket.tcp instead, it works:
```ruby
begin
  s = Socket.tcp(conn_addr, conn_port, @local_host, @local_port,
                 connect_timeout: @open_timeout)
rescue => e
  raise e, "Failed to open TCP connection to #{conn_addr}:#{conn_port} (#{e.message})"
end
```

Now, if you look at the implementation of Socket.tcp, you will see that it loops over the addresses returned by Addrinfo.getaddrinfo, trying to connect to each one in turn until one succeeds. Since in my case the IPv6 addresses are returned first, it has to fail twice before connecting.
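The sequential fallback described above can be sketched as follows. This is a simplified illustration, not Ruby's actual Socket.tcp source: connect_first is a hypothetical name, it takes an already resolved list of Addrinfo objects, and it relies on Addrinfo#connect(timeout:) for the per-address timeout.

```ruby
require 'socket'

# Hypothetical helper illustrating the loop described above: try each
# candidate address in order and return the first socket that connects.
# If every attempt fails, re-raise the last error. (Socket.tcp itself also
# resolves the host name and can bind a local address; this sketch does not.)
def connect_first(addrinfos, timeout:)
  last_error = nil
  addrinfos.each do |ai|
    begin
      return ai.connect(timeout: timeout)
    rescue SystemCallError, IOError => e
      # Refused, unreachable, or timed out: remember it and try the next one.
      last_error = e
    end
  end
  raise last_error || SocketError.new('no addresses to try')
end

# Possible usage with the host from the question:
#   candidates = Addrinfo.getaddrinfo('evs.nci.nih.gov', 443, nil, :STREAM)
#   sock = connect_first(candidates, timeout: 5)
```

With the address order shown earlier, each unreachable IPv6 entry costs up to one full timeout before an IPv4 address is tried, which is why putting IPv4 first helps here.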
The best solution for me is therefore to specify an IPv4 address explicitly:
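```ruby
require 'uri'
require 'net/http'

url = 'https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt'
uri = URI.parse(url)
# Resolve the host and prefer an IPv4 address (fall back to IPv6 otherwise)
address = Addrinfo.getaddrinfo(uri.host, nil)
                  .sort_by { |a| a.ipv4? ? 0 : 1 }
                  .first.ip_address
# Connect to that address, while uri.host is still used for the TLS checks
Net::HTTP.start(uri.host, uri.port, use_ssl: true, ipaddr: address) do |http|
  # ...
end
```

Hope this helps someone.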
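A variation on the same idea, sketched under the assumption that you never want IPv6 for this host: ask the resolver for IPv4 results only by passing Socket::AF_INET to Addrinfo.getaddrinfo, instead of fetching every family and sorting. Both helper names (ipv4_address_for, http_for) are hypothetical, and http_for only builds the Net::HTTP object without starting it.

```ruby
require 'socket'
require 'uri'
require 'net/http'

# Hypothetical helper: ask the resolver for IPv4 (AF_INET) stream addresses
# only, instead of resolving every family and sorting IPv4 first.
# Raises SocketError if the host has no IPv4 address.
def ipv4_address_for(host)
  Addrinfo.getaddrinfo(host, nil, Socket::AF_INET, :STREAM).first.ip_address
end

# Hypothetical helper: build (but do not start) a Net::HTTP client that
# connects to the IPv4 address while keeping uri.host as the logical host.
def http_for(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.ipaddr = ipv4_address_for(uri.host)
  http
end

# Possible usage:
#   uri = URI('https://evs.nci.nih.gov/ftp1/CDISC/ReadMe.txt')
#   http_for(uri).start { |http| p http.get(uri.path) }
```

The trade-off compared with sorting is that this version fails outright on an IPv6-only host, whereas the sort_by approach still falls back to IPv6.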
Cheers