Investigating TLS blocking in India
Simone Basso (OONI), Gurshabad Grover and Kushagra Singh (Centre for Internet and Society, India)
2020-07-08
This report investigates Transport Layer Security
(TLS)-based
blocking in India. Previous
research
by the Centre for Internet &
Society, India (CIS) has already
exposed TLS blocking based on the value of the SNI
field.
OONI has also implemented and started
testing
SNI-based TLS blocking measurements.
Recently, the Magma Project
documented
cases where CIS India and OONI’s methodologies could be improved. They
specifically found that blocking sometimes appears to depend not only on
the value of the SNI field but also on the address of the web server
being used. These findings were later confirmed by OONI measurements in
Spain
and Iran through
the use of an extended measurement methodology.
We were therefore curious to see whether such an extended methodology
would discover further cases of TLS blocking in India. To answer this
research question we ran experiments on the networks of three popular
Indian Internet Service Providers (ISPs) (ACT
Fibernet, Bharti
Airtel, and Reliance
Jio) which account for over
70% of the internet subscribers in
India.
We recorded SNI-based blocking on both Bharti Airtel and Reliance Jio.
We also discovered that Reliance Jio blocks TLS traffic not just based
on the SNI value, but also on the web server involved with the TLS
handshake. Moreover, we noticed that ACT Fibernet’s DNS resolver directs
users towards servers owned by ACT Fibernet itself. Such servers caused
the TLS handshake to fail, but the root cause of censorship was the DNS.
We also document that one of the endpoints we tested,
collegehumor.com:443
, does not allow establishing TCP connection from
several vantage points and control measurements. Yet, in Reliance Jio,
we see cases where the connections to such endpoints complete
successfully and a timeout occurs during the TLS handshake. We believe
this is caused by some kind of proxy that terminates the TCP connection
and performs the TLS handshake.
Index
TLS Blocking Measurements
Web Censorship in India
Aladdin: our Experimental Implementation
Description of the Experiments
Results Analysis & Discussion
Conclusion & Future Work
TLS Blocking Measurements
Transport Layer Security
(TLS) is a
cryptographic protocol that provides communication with end-to-end
security with guarantees of confidentiality and authenticity, which is
popularly used for encrypting web traffic as done in
HTTPS. The Server
Name
Indication
(SNI), defined first in RFC
6066, is an extension
to TLS that facilitates multiplexing, i.e. the hosting of multiple HTTPS
websites on the same server. In other words, the SNI gives content
providers the opportunity to host a variety of websites under the same
IP address. For example, the 216.58.209.36
IP (belonging to Google)
allows accessing both www.google.com
and www.youtube.com
using HTTPS,
depending on the SNI being used. When a client wants to establish a
secure connection, it fills in the SNI with the hostname of the website
it wants to connect to.
Unfortunately, the SNI travels on the network in cleartext, even though
there are experimental
efforts to
work around this technical limitation. Thus, network operators can use
deep packet
inspection
to track the websites someone is visiting, and also to filter traffic
based on the SNI. The use of SNI-based blocking filtering in
state-directed web censorship is being increasingly
recorded.
In 2019, the use of SNI filtering was documented in
China
and South
Korea;
OONI reported on the SNI based filtering of Wikipedia in
Venezuela
and
China,
as well as of Facebook live-streaming in
Jordan;
and CIS has earlier
documented
the use of this technique by Reliance Jio, the most popular
ISP
in India.
Therefore, researchers in the internet freedom community have started
proposing and implementing techniques to measure SNI-based blocking. As
part of their research on the blocking practices of Indian ISPs, CIS
proposed
a methodology to detect SNI blocking. Around the same time, researchers
at Jigsaw
proposed
improvements in detecting domain blocking, which also included SNI
blocking measurements. OONI later
implemented
and successfully tested on the
field SNI
blocking measurements based on Jigsaw’s methodology.
At their core, these methodologies detect SNI-based blocking by
connecting to an unrelated host that is not blocked (e.g. example.com
),
and checking whether it is possible to successfully complete a TLS
handshake even if the SNI is filled with a hostname that is potentially
blocked (e.g. pornhub.com
in India). The rationale of this technique is
to measure whether there is a specific filtering rule in the network
blocking of a given SNI.
A recent report published by the Magma
project,
however, shows that there are other ways of blocking TLS that are not
detected by this measurement methodology. In particular, they showed
that TLS connections to
www.womenonweb.org were being
blocked, but the SNI blocking measurement methodology did not detect it.
TLS blocking, in fact, only occurred when the SNI was equal to
www.womenonweb.org
and the IP address was the one of
www.womenonweb.org
.
Thus, OONI wrote an experimental
implementation
based on the new Go
engine that was
performing two experiments. The first experiment connected to
www.example.org
using the www.womenonweb.org
SNI. The second experiment,
instead, connected directly to the IP address used by
www.womenonweb.org
. The
results
confirmed the findings of the Magma project’s blog post, and sparked
additional curiosity on whether using the same methodology in other
contexts (e.g. India) could reveal more forms of blocking. A measurement
campaign run by OONI while we were researching this report
documented cases
of TLS blocking solely based on the endpoint being used for DNS over TLS
connections in Iran. Specifically, OONI found cases where the TLS
handshake with 1.1.1.1:853
was blocked regardless of the SNI.
Web Censorship in India
India has a decentralised model of web
censorship,
where state authorities order Internet Service Providers (ISPs) to block
certain websites for their users. State authorities draw these powers
from Section 69A and Section 79 of the Information Technology (IT) Act.
Since there are no technical specifications given by the government,
each ISP is at the liberty to adopt their own method of blocking
websites. A recent
study
of censorship in the Indian state of Manipur using OONI data concluded,
in this regard, that “website blocking within the country varies
primarily from ISP to ISP, rather than from region to region”.
Furthermore,
regulations
notified under Section 69A require ISPs to maintain confidentiality over
certain website blocking orders.
Recent research at
the Centre for Internet and Society (CIS) revealed how Indian ISPs are
using a variety of techniques, including DNS-based blocking, HTTP host
header inspection, and SNI-based filtering. In the absence of a publicly
available official list of blocked hostnames, CIS India compiled a list
of potentially blocked
websites
from (i) publicly-available government orders, (ii) court orders, and
(iii) user reports from various sources. They devised network tests to
identify the methods that different ISPs are using, and recorded how
India’s most popular ISP, Reliance Jio (which serves 50% of Indian
internet
subscribers),
is using SNI inspection for blocking websites. Out of the 4379 websites
that the authors tested for, they found Jio to be censoring 2951
websites via SNI inspection.
In addition to the
opaqueness
surrounding the lists of websites being blocked, CIS India also found
inconsistencies in the list of websites being blocked by each ISP.
Furthermore, only some of the ISPs explicitly relayed a censorship
notice to its users. Simply put, Indian internet users can have wildly
different experiences of web censorship depending on their ISP.
Aladdin: our Experimental Implementation
Aladdin is a
bash script that uses the new OONI Probe engine written in
Go. Given an input
domain (e.g. blocked.com
), Aladdin performs a series of experiments
loosely inspired by the domain-blocking measurement methodology
proposed
by Jigsaw. OONI wrote this script to collect data that could be useful
in better understanding how to evolve its Web
Connectivity
nettest.
In this section, we only describe the experiments that are relevant to
this report. This text describes the performed experiments at a
functional level; the actual implementation may be different, typically
for efficiency reasons. For further information, we encourage you to
read the script source
code and reach out
with questions and feedback on OONI’s Slack
channels.
Because Aladdin is based on the OONI
Engine, all
experiment results are submitted to the OONI collector and
automatically published as part
of OONI S3 buckets.
sni_check
The first experiment we discuss is called sni_check
. It is similar to
the OONI experiment called
sni_blocking,
except that sni_check does not check whether the helper website being
used (example.org
) is actually reachable. This is not an issue because
we know it was reachable while we were running these manual experiments.
The following diagram shows the interactions that occur when performing
this experiment with blocked.com
as its input.
Figure 1: description of the sni_check experiment.
We use example.org
as a test helper. The first step is to use Google’s
DNS over
HTTPS (DoH)
resolver, to map example.org
to its IP address. Once we know the IP
address, we connect to this address on port 443
and we initiate a TLS
handshake with blocked.com
as the SNI. What happens next determines the
result of the experiment.
If there is blocking, we expect the connection to just be closed
(eof_error
) or interrupted (connection_reset
). A timeout
(generic_timeout_error
) also in general implies that there is
interference. By repeating the experiment, we gain more confidence that
such an error is not just a temporary disruption.
If there is no interference, the handshake completes. Because the web
server for example.org
does not handle blocked.com
, the client code
should emit the ssl_invalid_hostname
error indicating that the server
returned a certificate that is not valid for the requested SNI. In such
a case, we can inspect the returned certificate to have further
confidence that we are indeed speaking with the legitimate server that
handles the example.org
domain.
dns_check
The second experiment is called dns_check
. It is conceptually similar
to the dns_consistency OONI
experiment.
The following diagram illustrates the dns_check
experiment.
Figure 2: description of the dns_check experiment.
We basically resolve the same domain (e.g. blocked.com
) using the system
resolver (i.e. the resolver configured on the system where OONI is
running) as well using a DNS over
HTTPS (DoH)
resolver that we trust. In this set of experiments, we used Google’s
DoH
resolver.
The objective of this experiment is to understand whether we can trust
the answer of the system resolver, by comparing its results to the DoH
resolver ones.
When a DNS resolver claims that a domain name does not exist, the
corresponding error is dns_nxdomain_error
. When a resolver returns
private addresses (e.g. 10.0.0.1
), the corresponding error is
dns_bogon_error
. If there are no errors, we expect this experiment to
return two lists: a list of IP addresses for the domain obtained using
the system resolver and a similar list obtained instead using the
trusted DoH resolver. As we will see in the following sections, we will
then use the IP addresses from both lists to perform further checks.
system_resolver_validation
The third experiment we discuss is called system_resolver_validation
.
This experiment is roughly a subset of the OONI’s Web Connectivity
experiment.
Figure 3: description of the system_resolver_validation experiment.
It is called system_resolver_validation
because we use the IP
addresses collected by the system resolver in the previous step to
access the target website using HTTPS, and verify that the IP address
indeed serves the target website. We connect on port 443
and, if we are
successful, we perform a TLS handshake using the target SNI. If the
handshake succeeds, we assume that the specific IP address we are using
is valid for the domain. This means that either we are speaking with the
legitimate web server or, in a less likely but still quite possible
scenario, with a proxy that is willing to let us through.
We consider the experiment successful if we are able to perform the HTTP
GET request fetching the home page of the domain without any TCP or TLS
errors. Failures during the TLS handshake, or later, are flagged as
likely interference. All the failures described previously may occur
during the handshake. It is worth noting that, in this context,
ssl_invalid_hostname
is an error, because we should be able to
establish a TLS connection with the domain, given that we are attempting
to speak to a web server serving such a domain.
doh_resolver_validation
The fourth experiment we discuss is called doh_resolver_validation
.
This experiment is basically the same as the previous one, except that
here we are using the results returned by the DoH resolver as opposed to
the results returned by the system resolver. This experiment gives us an
opportunity to run our test with a valid IP for the domain, which is
useful for cases wherein the system resolver returns an error, or a list
of IPs not related to the domain. This experiment can therefore help us
measure SNI-based blocking when the test network is also blocking
websites using DNS poisoning or injections.
psiphon_check
The fifth experiment we discuss is called psiphon_check
. This
experiment uses the Psiphon
network to fetch
the input domain over HTTPS. It consists of the following two steps.
The first step establishes an encrypted tunnel to one of the thousands
of geographically distributed proxy servers managed by Psiphon, Inc. The
technology used to establish such a tunnel depends on the censorship
techniques implemented in the country in which the experiment is run.
Psiphon, in fact, is optimised to select the censorship evasion
technique that provides the best performance, choosing among techniques
such as obfuscated
protocols
and domain
fronting.
Once the encrypted tunnel with the remote proxy server is established,
Psiphon exposes it using a SOCKS5
proxy
listening on a local port.
The second step performs an HTTPS measurement of the target domain using
the encrypted tunnel via the SOCKS5 proxy. Psiphon’s implementation of
the SOCKS5 protocol is such that when Aladdin requests Psiphon to
connect to a specific domain name on port 443
, it will also rely on
Psiphon for the domain name resolution. For this reason, we do not need
to worry about DNS tampering in this experiment. In turn, Psiphon will
ask the remote proxy server to establish a TCP connection to the
specified domain and port. If the connection is successful, Aladdin will
then perform the TLS handshake and issue a GET request for the homepage.
Otherwise, the SOCKS5 server returns a byte indicating the error that
occurred. Because the set of error codes specified by SOCKS5 is rather
limited, the same error code may actually map to a variety of error
conditions. In our experience, two most frequent errors we have seen in
this context are 0x01
(“general failure”) and 0x05
(“connection
refused”).
Figure 3: description of the psiphon_check experiment.
In the context of this report, we will use the results of the Psiphon
experiment to attempt to access the same domain from another vantage
point. This will give us further confidence of whether errors in
connecting to a website could be caused by interference by the local ISP
or, instead, by the website not currently being reachable.
To learn more about Psiphon, we encourage you to watch the
presentation on
Psiphon from
the 2020 edition of the Internet Measurement
Village.
Description of the Experiments
We ran the Aladdin script from three different ISPs in India: ACT
Fibernet (AS24309), Bharti
Airtel (AS45609), and
Reliance Jio (AS55836). We
attempted to measure four domains for TLS blocking: facebook.com
and
google.com
(both accessible in India via all ISPs); and collegehumor.com
and pornhub.com
(both usually blocked by Indian ISPs). We ran
experiments on May 11st, 12nd, 14th, and 19th, 2020 using
github.com/bassosimone/aladdin@5471390.
We also ran follow-up experiments on June 22nd and 23rd, 2020.
According to the latest Telecom Regulatory Authority of India’s
report
(Table 1.30), the three ISPs we tested together constitute 74.5% of the
internet subscribers in India. All the measurements were made in
Bengaluru to preclude any potential regional variations. The tests for
Reliance Jio and Bharti Airtel were run via mobile internet connections.
As ACT Fibernet does not provide a retail mobile connection, we used a
fixed internet connection to run tests for their network. As far as this
report is concerned, we assume that ISPs do not alter their behaviour
based on the type of connection (mobile or fixed).
Results Analysis & Discussion
This section describes the results of all the experiments we performed.
We fetched measurements from OONI’s
S3.
For brevity, we are going to include only the results for
collegehumor.com
and pornhub.com
, since facebook.com
and google.com
were
not blocked in any of the experiments that we ran.
sni_check
The following table shows the sni_check experiment results. In this
experiment we performed a TLS handshake with the web server serving
example.org
using the SNI indicated in the table, to detect cases
where the presence of this SNI was sufficient to trigger blocking. The
script to generate the table is published as a GitHub
gist.
ISP | SNI | Failure | Count |
---|
ACT Fibernet | collegehumor.com | ssl_invalid_hostname | 2 |
ACT Fibernet | pornhub.com | ssl_invalid_hostname | 2 |
Bharti Airtel | collegehumor.com | eof_error | 3 |
Bharti Airtel | pornhub.com | eof_error | 2 |
Bharti Airtel | pornhub.com | ssl_invalid_hostname | 1 |
Reliance Jio | collegehumor.com | ssl_invalid_hostname | 1 |
Reliance Jio | pornhub.com | ssl_invalid_hostname | 1 |
Table 1. Results of connecting to example.org’s IP address when
using specific SNIs.
As mentioned above, the eof_error
result indicates that the connection
was closed during the TLS handshake, likely because some middlebox
rejected the provided SNI. The following JSON snippet shows a
measurement
for collegehumor.com from Bharti Airtel:
{
"test_keys": {
"network_events": [
{
"failure": null,
"operation": "connect",
"address": "93.184.216.34:443", // (1)
"t": 0.3199562,
"proto": "tcp"
},
{
"failure": null,
"operation": "tls_handshake_start", // (2)
"t": 0.3199999
},
{
"failure": null,
"operation": "write", // (3)
"num_bytes": 286,
"t": 0.3213545
},
{
"failure": "eof_error", // (4)
"operation": "read",
"t": 0.382416
},
{
"failure": "eof_error",
"operation": "tls_handshake_done",
"t": 0.383602
}
]
},
"resolver_asn": "AS9498",
"probe_cc": "IN",
"probe_network_name": "Bharti Airtel Ltd. AS for GPRS Service",
"input": "tlshandshake://93.184.216.34:443",
"probe_asn": "AS45609",
"annotations": {
"step": "sni_blocking",
"session": "38c221ed-5fc6-4897-984d-b612bc43dd24"
},
"resolver_network_name": "BHARTI Airtel Ltd.",
"measurement_start_time": "2020-05-19 08:05:07"
}
Here we basically (1) connect to example.org
’s IP address, (2) start the
TLS handshake, (3) write the ClientHello, and (4) observe that the
connection is closed.
It is also interesting to note that in one specific case we could
complete the TLS handshake with example.org
’s IP address with the
pornhub.com
SNI. We also observed the same pattern in the follow-up
measurements for collegehumor.com
(see
#1,
#2,
#3,
#4,
#5,
and
#6)
and for pornhub.com
(see
#1,
#2,
#3,
#4,
and
#5)
performed on June 22nd and 23rd, 2020. Typically there is blocking, but
three times we succeed to complete the TLS handshake.
Instead, the ssl_invalid_hostname
result hints that there was no
blocking. To be sure about this, we fetched the returned certificate
from the
measurement,
ensured it was
unique,
and computed its
fingerprint.
In all cases, we received the following X.509
certificate:
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
0f:d0:78:dd:48:f1:a2:bd:4d:0f:2b:a9:6b:60:38:fe
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, O=DigiCert Inc, CN=DigiCert SHA2 Secure Server CA
Validity
Not Before: Nov 28 00:00:00 2018 GMT
Not After : Dec 2 12:00:00 2020 GMT
Subject: C=US, ST=California, [...] CN=www.example.org
[...]
SHA1 Fingerprint=7B:B6:98:38:69:70:36:3D:29:19:CC:57:72:84:69:84:FF:D4:A8:89
This certificate is indeed the one used by example.org
. We can therefore
conclude that, in all these cases, we were able to speak to the test
helper without interference.
Establishing a baseline
In the following sections, we are going to comment on the experiments
where we attempt to connect to the target websites using HTTPS. Before
diving into that, let us check whether we could access such websites
using Psiphon. This check will give
us an opportunity to establish whether the websites were reachable when
we performed the measurement. In fact, the usage of the Psiphon
circumvention tool allowed us to access the websites we wanted to test
using alternative routes over encrypted and obfuscated tunnels to
Psiphon managed proxies.
The following table shows the results of the psiphon_check experiment,
which was computed using a script published at
GitHub.
ISP | Domain | Bootstrap Time | Failure |
---|
ACT Fibernet | pornhub.com | 6.5 | null |
ACT Fibernet | collegehumor.com | 6.6 | general SOCKS server failure |
ACT Fibernet | pornhub.com | 5.7 | null |
ACT Fibernet | collegehumor.com | 5.3 | SOCKS: connection refused |
Reliance Jio | pornhub.com | 7.2 | null |
Reliance Jio | collegehumor.com | 6.8 | SOCKS: connection refused |
Bharti Airtel | pornhub.com | 6.3 | null |
Bharti Airtel | collegehumor.com | 6.1 | SOCKS: connection refused |
Bharti Airtel | pornhub.com | 6.5 | http: unexpected EOF reading trailer |
Bharti Airtel | collegehumor.com | 6.7 | SOCKS: connection refused |
Bharti Airtel | pornhub.com | 8.1 | null |
Bharti Airtel | collegehumor.com | 6.6 | SOCKS: connection refused |
Table 2. Results of fetching the homepage of specific domains using
Psiphon.
The Bootstrap Time column indicates the number of seconds it took
Psiphon to establish an encrypted tunnel with a proxy server. Because
the bootstrap time is defined (and not null
) for each measurement, it
means
that it was always possible to establish a tunnel. The OONI Engine
implementation, in fact, does not set the bootstrap time unless the
tunnel has been successfully established.
Regarding the Failure column, we notice that pornhub.com
is available
on and off, and sometimes there are HTTP protocol errors. Yet, there are
definitely cases in which this domain is reachable via Psiphon. On the
contrary, collegehumor.com
is consistently not reachable.
We could explain this consistent failure with either the website not
being reachable over HTTPS or with censorship experienced by the Psiphon
proxy we were using.
As a follow-up, we ran subsequent measurements targeting
collegehumor.com
from Bharti
Airtel,
Vodafone
(Italy),
and Google
Cloud
(europe-west4 zone) on June 22nd and 23rd, 2020 using the OONI Probe
Engine, OONI Probe for iOS, and netcat
. We attempted to connect to
collegehumor.com using HTTPS with and without using the Psiphon. When
using Psiphon, we obtained the same
errors
reported in the table above. When connecting directly we noticed a
timeout
when trying to establish a TCP connection on port 443
.
While investigating further, we also found an “ancient”
measurement
using OONI Probe 1.2 that timed out when attempting to connect to
collegehumor.com
on port 443
from the Telx ISP in the United States in
2015. However, we also
found
that records of an X.509 certificate for this website exist in the
Certificate Transparency
Log.
We concluded that collegehumor.com
was quite likely not reachable on
port 443
when we measured it, which is why we will not flag as
censorship any failure in connecting to it on port 443
that we may
encounter in the following sections. At the same time, it is interesting
to note that Bharti Airtel was censoring the collegehumor.com
SNI (as we
have seen in the previous section) even though it was not a properly
working HTTPS website during our measurements.
dns_check
The following table shows the domain names resolved using the system
resolver. For each target domain in the Domain column, we performed a
DNS resolution using the resolver configured in the operating system.
The ISP owning the resolver is indicated in the Resolver column. The
table has been computed using a script published at
GitHub.
We then manually annotated IP addresses with their autonomous
system
number (ASN) and network name.
ISP | Resolver | Domain | Failure | IP | IP Network | Count |
---|
ACT Fibernet | ACT Fibernet | collegehumor.com | null | 202.83.21.15 | ACT Fibernet | 1 |
ACT Fibernet | ACT Fibernet | collegehumor.com | null | 49.205.75.6 | ACT Fibernet | 1 |
ACT Fibernet | ACT Fibernet | pornhub.com | null | 202.83.21.15 | ACT Fibernet | 1 |
ACT Fibernet | ACT Fibernet | pornhub.com | null | 49.205.75.6 | ACT Fibernet | 1 |
Bharti Airtel | Bharti Airtel | collegehumor.com | dns_nxdomain_error | | | 3 |
Bharti Airtel | Bharti Airtel | pornhub.com | dns_nxdomain_error | | | 3 |
Reliance Jio | Reliance Jio | collegehumor.com | null | 52.8.26.172 | AMAZON | 1 |
Reliance Jio | Reliance Jio | collegehumor.com | null | 54.193.47.52 | AMAZON | 1 |
Reliance Jio | Reliance Jio | pornhub.com | null | 66.254.114.41 | SWIFTMILL | 1 |
Table 3. Results of resolving specific domains using the system’s
default resolver.
Let us compare the above table with a similar table, where we show the
results obtained using Google public DNS over
HTTPS (DoH)
resolver and a similar analysis
script.
(No errors are included into the table because no DoH query ever
failed.)
Table 4. Results of resolving specific domains using Google’s DoH
resolver.
By comparing the two tables, we conclude that ACT Fibernet and Bharti
Airtel resolvers are lying to us. Reliance Jio’s resolver instead
returns answers that are consistent with Google’s DoH resolver, which we
assume to not be lying.
We say that ACT Fibernet’s resolver is lying because it claims that the
two domains we are testing are hosted by ACT Fibernet itself and it also
claims that they share the same IP addresses. We say that Bharti
Airtel’s resolver is lying because it claims that the two tested domains
do not exist (dns_nxdomain_error
), but in fact they do. This is
consistent with the findings in CIS’ recent
study, which found
ACT Fibernet and Bharti Airtel to be tampering with DNS responses in
this precise way.
While Bharti Airtel’s resolver’s answer prevents us from accessing these
websites, we cannot exclude that ACT Fibernet’s answer is just directing
us to some cache. To investigate this hypothesis, we will need to
attempt to use such IPs and see what happens.
system_resolver_validation
The following table shows the results of the
system_resolver_validation experiment
. This means that we used the IP
addresses previously resolved using the system resolver to connect to
the specific websites to which they belong, according to such a
resolver. The script used to generate the table is published on
GitHub.
ISP | SNI | Failure | Count |
---|
ACT Fibernet | collegehumor.com | eof_error | 2 |
ACT Fibernet | pornhub.com | eof_error | 2 |
Reliance Jio | collegehumor.com | generic_timeout_error | 1 |
Reliance Jio | pornhub.com | eof_error | 1 |
Table 5. Results of using the IP address for a domain returned by
the system’s default resolver to fetch the homepage of such a domain
using the HTTPS protocol.
Of course, the table does not include Bharti Airtel entries, because
Bharti Airtel’s resolver told us that the domains we were looking for do
not exist.
The following snippet shows the recorded
measurement
for pornhub.com from ACT Fibernet:
{
"test_keys": {
"dns_cache": [
"www.pornhub.com 49.205.75.6" // (1)
],
"network_events": [
{
"failure": null,
"operation": "connect",
"address": "49.205.75.6:443", // (2)
"t": 0.002680552,
"proto": "tcp"
},
{
"failure": null,
"operation": "tls_handshake_start", // (3)
"t": 0.00271161
},
{
"failure": null,
"operation": "write", // (4)
"num_bytes": 285,
"t": 0.003049326
},
{
"failure": "eof_error", // (5)
"operation": "read",
"t": 0.005205896
}
]
},
"input": "https://www.pornhub.com/",
"probe_asn": "AS24309",
"annotations": {
"step": "system_resolver_validation",
"session": "c44b5e74-ecb4-4310-b5d8-f9d6b27a1599"
}
}
We see that (1) we use the DNS cache to force the previously discovered
IP, (2) we connect successfully to such IP on port 443, (3) we start the
TLS handshake and then (4) write the ClientHello message, and (5) after
that the connection is closed. Because we know that the IP address we
are using is suspicious, given that it does not belong to the correct
ASN for pornhub.com
, we assume that this is either blocking or a
misconfigured cache.
The same could of course be said for collegehumor.com
failures
on ACT Fibernet. Because the IP address is suspicious, it may either be
blocking or a misconfigured cache.
The following snippet shows a collegehumor.com
measurement
inside Reliance Jio:
{
"test_keys": {
"dns_cache": [
"collegehumor.com 52.8.26.172 54.193.47.52" // (1)
],
"network_events": [
{
"failure": null,
"operation": "connect",
"address": "52.8.26.172:443", // (2)
"t": 0.0405582,
"proto": "tcp"
},
{
"failure": null,
"operation": "tls_handshake_start", // (3)
"t": 0.0406031
},
{
"failure": null,
"operation": "write", // (4)
"num_bytes": 286,
"t": 0.0418597
},
{
"failure": "generic_timeout_error", // (5)
"operation": "tls_handshake_done",
"t": 10.0432773
}
]
},
"probe_cc": "IN",
"probe_network_name": "Reliance Jio Infocomm Limited",
"test_runtime": 10.0446836,
"input": "https://collegehumor.com/",
"probe_asn": "AS55836",
"annotations": {
"step": "system_resolver_validation",
"session": "40504ce1-94e0-4d6d-a4e5-6ede4272e385"
}
}
The sequence of events we see here is basically the same as above. The
two main differences are that (1) the IP addresses we are using seem
legitimate and (5) the TLS handshake fails with a timeout rather than
with the connection being closed.
What is particularly interesting here, though, is that we should not
have been able to connect to collegehumor.com:443/tcp
, according to the
results of the psiphon_check
experiment. This fact seems to indicate
the presence of a some kind of proxy that terminates our TCP/IP
connection and then forwards TLS bytes to the remote server. Because we
know that the remote server is misconfigured on port 443
, it is
reasonable to assume that the timeout error we see is caused by such a
misconfiguration rather than by censorship.
We consider the successful connection to collegehumor.com:443
an anomaly
because the psiphon_check
experiment run around a minute afterwards
failed to connect. Also, the same measurement was later performed using
the result from doh_resolver_check
which yielded the same result.
Subsequent measurements run between May 23rd and June 22nd 2020
confirmed this anomaly pattern (see
#1,
#2,
#3),
but we also saw cases where the measurement failed with a TCP connect
timeout (see
#1,
#2,
#3),
consistently with what we observed from other ISPs. We will further
investigate this behavior as part of our future work.
The failure we see for pornhub.com
, instead, is more suspicious. The
following is a relevant snippet of the JSON
measurement
archived at OONI’s S3:
{
"test_keys": {
"dns_cache": [
"pornhub.com 66.254.114.41"
],
"network_events": [
{
"failure": null,
"operation": "connect",
"address": "66.254.114.41:443",
"t": 0.0369444,
"proto": "tcp"
},
{
"failure": null,
"operation": "tls_handshake_start",
"t": 0.0369909
},
{
"failure": null,
"operation": "write",
"num_bytes": 281,
"t": 0.0380669
},
{
"failure": "eof_error",
"operation": "read",
"t": 0.2627324
},
{
"failure": "eof_error",
"operation": "tls_handshake_done",
"t": 0.2640703
}
]
},
"probe_cc": "IN",
"probe_network_name": "Reliance Jio Infocomm Limited",
"input": "https://pornhub.com/",
"probe_asn": "AS55836",
"annotations": {
"step": "system_resolver_validation",
"session": "285e6728-8c92-4034-8c0d-5c62034a71bd"
},
}
We see the same events as before, except that the TLS handshake fails
with eof_error
. This failure is similar to the one Magma
noticed
in Spain and OONI
noticed in Iran.
Also in this case, blocking only happens when the IP address is
consistent with the SNI. We have in fact seen previously that the
pornhub.com
SNI could be successfully used with example.org
’s IP. (We
also saw this blocking pattern for all subsequent measurements run
between June 22nd and June 23rd, 2020: see
#1,
#2,
#3,
#4,
#5,
and
#6).
Regarding ACT Fibernet and Bharti Airtel, we need to check what happens
when using the IP addresses returned by the DoH resolver before drawing
any conclusion concerning TLS blocking. We will do that in the following
subsection.
doh_resolver_validation
The following table shows the results of the doh_resolver_validation
experiment. This means we used the IP addresses returned by Google’s DoH
resolver to connect to the websites to which they belong to, according
to this resolver. We assume that Google’s DoH resolver is not returning
false answers. This experiment therefore gives us another chance to
verify whether there are additional forms of blocking beyond the system
resolver returning errors or wrong entries. The script used to generate
the table is available on
GitHub.
ISP | SNI | Failure | Count |
---|
ACT Fibernet | collegehumor.com | generic_timeout_error | 2 |
ACT Fibernet | pornhub.com | null | 2 |
Bharti Airtel | collegehumor.com | generic_timeout_error | 3 |
Bharti Airtel | pornhub.com | eof_error | 3 |
Reliance Jio | collegehumor.com | generic_timeout_error | 1 |
Reliance Jio | pornhub.com | eof_error | 1 |
Table 6. Results of using the IP address for a domain returned by
Google’s DoH resolver to fetch the homepage of such a domain using the
HTTPS protocol.
In ACT Fibernet collegehumor.com
is not accessible and pornhub.com
is
reachable. The failure for collegehumor.com
matches the following
pattern (the full
measurement
can of course be accessed on OONI
Explorer):
{
"test_keys": {
"dns_cache": [
"collegehumor.com 52.8.26.172 54.193.47.52" // (1)
],
"network_events": [
{
"failure": "generic_timeout_error",
"operation": "connect",
"address": "52.8.26.172:443", // (2)
"t": 30.001980387,
"proto": "tcp"
},
{
"failure": "generic_timeout_error",
"operation": "connect",
"address": "54.193.47.52:443", // (3)
"t": 60.002742623,
"proto": "tcp"
}
]
},
"probe_asn": "AS24309",
"annotations": {
"step": "doh_resolver_validation",
"session": "1ada324e-736e-45ab-8127-d1066b23c5f5"
}
}
Here we see that (1) we are using the DNS cache to force the correct IPs
and we timeout when we attempt to connect to both IP addresses for the
domain (2, 3). This is consistent with our previous observation that
collegehumor.com:443
is misconfigured and attempting to reach it fails
with a connection timeout when using
Psiphon, as well as when connecting
from vantage points in which it should be censored (e.g. Vodafone Italy
and Google Cloud).
In Bharti Airtel collegehumor.com:443
is also failing with a connect
timeout as expected, as shown by the following
measurement:
{
"test_keys": {
"dns_cache": [
"collegehumor.com 52.8.26.172 54.193.47.52"
],
"network_events": [
{
"failure": "generic_timeout_error",
"operation": "connect",
"address": "52.8.26.172:443",
"t": 30.0091558,
"proto": "tcp"
},
{
"failure": "generic_timeout_error",
"operation": "connect",
"address": "54.193.47.52:443",
"t": 60.014621,
"proto": "tcp"
}
]
},
"probe_asn": "AS45609",
"annotations": {
"step": "doh_resolver_validation",
"session": "38c221ed-5fc6-4897-984d-b612bc43dd24"
}
}
What we see here is again consistent with the results of the
psiphon_check
experiment. The collegehumor.com:443/tcp
endpoint, in
fact, is not working correctly and consistently fails with a timeout
when attempting to connect to it from several vantage points.
Pornhub.com
is blocked by Bharti Airtel during the TLS handshake. This
is not surprising, since we have seen above that Bharti Airtel blocks
any handshake towards any host as long as the SNI contains pornhub.com
.
The pattern that we see is roughly the same as what we previously saw
when discussing sni_check
measurements for Bharti Airtel. Yet, in one
specific instance, we were able to perform a handshake for pornhub.com
,
only to be redirected to www.pornhub.com
, for which the TLS handshake
failed with eof_error
. The following is the relevant snippet of the
measurement:
{
"test_keys": {
"dns_cache": [
"pornhub.com 66.254.114.41"
],
"requests": [{
"failure": "eof_error",
"request": {
"headers_list": [[
"Referer",
"https://pornhub.com/"
], [
"Host",
"www.pornhub.org"
]
],
"url": "https://www.pornhub.org/",
"method": "GET"
}
}, {
"failure": null,
"request": {
"headers_list": [[
"Host",
"pornhub.com"
]
],
"url": "https://pornhub.com/",
"method": "GET"
},
"response": {
"headers_list": [[
"Location",
"https://www.pornhub.org/"
]],
"code": 302
}
}],
"tls_handshakes": [
{
"tls_version": "TLSv1.2",
"no_tls_verify": false,
"server_name": "pornhub.com",
"peer_certificates": [
{
"data": "...",
"format": "base64"
},
{
"data": "...",
"format": "base64"
}
],
"cipher_suite": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"failure": null, // (1)
"negotiated_protocol": "",
"t": 0.4989724
},
{
"tls_version": "",
"no_tls_verify": false,
"server_name": "www.pornhub.org",
"peer_certificates": null,
"cipher_suite": "",
"failure": "eof_error", // (2)
"negotiated_protocol": "",
"t": 0.956141
}
]
}
"probe_asn": "AS45609",
"annotations": {
"step": "doh_resolver_validation",
"session": "f0cfc2f4-ea96-4e99-bf35-8a283db9d3e9"
}
}
The tls_handshakes
field of the above measurement shows, in particular,
that the eof_error
occurs indeed when attempting to perform a TLS
handshake for www.pornhub.com
. In fact, we see that the first handshake
is successful (1) and the second fails (2).
Regarding Reliance Jio, we see exactly the same results of the
system_resolver_validation
experiment, since Reliance Jio’s resolver
returned IP addresses consistent with Google’s DoH resolver’s answer.
This result, therefore, confirms previous findings.
Conclusion & Future Work
We investigated
TLS
blocking in India. The research question was to understand whether there
were cases of TLS blocking caused not only by the value of the Server
Name
Indication
(SNI) field in the ClientHello
TLS message, but also by the destination IP address. That is, cases in
which the SNI blocking methodology we previously developed was not
sufficient.
We measured four domains (facebook.com
, google.com
, collegehumor.com
,
and pornhub.com
) in three popular Indian ISPs: ACT
Fibernet (fixed line), Bharti
Airtel, and Reliance
Jio (mobile). For each domain,
we performed a series of experiments to answer the research question
using a bash
script driving
the new OONI measurement engine in
Go.
Neither facebook.com
nor google.com
were blocked in any of the measured
ISPs.
The following table recaps our findings regarding the blocking of
collegehumor.com
and pornhub.com
on the ACT Fibernet, Bharti Airtel, and
Reliance Jio networks. We call “SNI blocking” the case where we observed
that a specific SNI was blocked when connecting to a control server (the
one managing example.org). “SNI+IP blocking” instead covers the cases
where we observed TLS blocking only when connecting to the correct IP
for the domain.
ISP | Domain | SNI blocking | DNS lying | SNI+IP blocking |
---|
ACT Fibernet | collegehumor.com | | ✔️ | |
ACT Fibernet | pornhub.com | | ✔️ | |
Bharti Airtel | collegehumor.com | ✔️ | ✔️ | |
Bharti Airtel | pornhub.com | ✔️ | ✔️ | |
Reliance Jio | collegehumor.com | | | ❓ |
Reliance Jio | pornhub.com | | | ✔️ |
Table 7. Summary of the measured blocking techniques for each
measured ISP.
ACT Fibernet does not implement TLS blocking. ACT Fibernet’s resolver is
configured to lie to users and redirect them to specific servers that
cause TLS handshakes to fail. While we see TLS failures, the root cause
of censorship is ACT Fibernet’s resolver answer. We observed this
behaviour for both collegehumor.com
and pornhub.com
.
Bharti Airtel seems to be blocking collegehumor.com
and pornhub.com
by
inspecting the SNI field. Bharti Airtel’s DNS resolver also claims that
collegehumor.com
and pornhub.com
are not existing domain names. While we
could not connect to collegehumor.com:443/tcp
, we are confident that
this is not censorship, but the endpoint not behaving correctly. We
have, in fact, observed deterministic TCP connection timeout failures
for this domain across the measurement period, as well as during
subsequent follow-up measurements run from Vodafone Italy, Google Cloud,
and using the Psiphon censorship-evasion network.
Reliance Jio does not block pornhub.com
solely based on the SNIs. We can
successfully use this SNIs when connecting to example.org
. Yet, when
connecting to the legitimate web server for the domain, TLS handshakes
are actually blocked. This kind of blocking where both the SNI value and
the destination IP address value matter is similar to the one that
Magma
and
OONI
previously observed in Spain for www.womenonweb.org
.
Surprisingly, we can sometimes connect to the collegehumor.com:443
endpoint from Reliance Jio, even though all other measurements, as well
as control measurements, suggest that the endpoint is misbehaving: any
connection attempt to it fails to establish a TCP connection and times
out. We conclude that Reliance Jio implements some sort of proxying that
terminates the TCP connection and then performs the TLS handshake and
forwards the bytes back to the client. Therefore, we cannot conclude
whether the timeout we see during the TLS handshake to
collegehumor.com:443
is censorship, or just the proxy failing at
establishing the connection, as it ought to be, and reporting an error
back to us.
In the future, we would like to better characterise TLS blocking in
India. Previous measurements by CIS
India,
for example, indicate that Reliance Jio was blocking specific SNIs
regardless of the destination address being used. For destination
addresses where they did not notice SNI censorship, they hypothesise
that a middlebox was not present on that particular network path. The
experiments included in this report, yet, open up the additional
possibility that Reliance Jio’s blocking is intentionally depending also
on the destination IP address. It would be interesting to try and
understand whether it is possible to perform TLS handshakes for other
domains with one of the blocked websites. For example, we could check
what happens when connecting to the IP address of collegehumor.com
with
the SNI being, e.g., example.org
. It would also be interesting to
determine whether the blocking depends on the TLS version and other TLS
options.