Let’s say we have a packet capture file (.pcap) and we want to get as much information out of it as possible. One option is Wireshark and its command-line counterpart, tshark. With the latter we can filter and format the output using tools such as sed, grep, and awk.
Extracting host names with tshark
Since we are dealing mostly with HTTP traffic, we may be interested in the sites that have been visited. To obtain this information we can extract the http.host field from the capture, count how often each value occurs, and sort by that count. Because head prints 10 lines by default, the result is the top 10 sites.
tshark -T fields -e http.host -r tor.pcap > dns.txt
cat dns.txt | sort | uniq -c | sort -nr | head
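One detail worth handling: tshark prints one line per packet, so packets without an http.host value show up as empty lines and get counted too. The sketch below wraps the counting step in a small helper that drops blank lines first. The function name top_values and its optional limit argument are made up for this example, and the printf input merely stands in for real tshark output.

```shell
# Count how often each non-empty line occurs on stdin and print the
# most frequent values first. top_values and its limit argument are
# illustrative names, not part of tshark.
top_values() {
    limit="${1:-10}"
    # awk 'NF' keeps only lines that contain at least one field,
    # which drops the blank lines produced by packets with no value.
    awk 'NF' | sort | uniq -c | sort -nr | head -n "$limit"
}

# Sample input standing in for: tshark -T fields -e http.host -r tor.pcap
printf 'example.com\n\nexample.org\nexample.com\n\n' | top_values 5
```

With a real capture, the same helper could be fed directly from tshark, for example by piping the tshark command above into top_values instead of writing dns.txt first.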
User agents
tshark -Y 'http contains "User-Agent:"' -T fields -e http.user_agent -r tor2b.pcap | sort | uniq -c | sort -nr | less
The -Y option lets us define display filters, in the same way we would in Wireshark. Older tshark releases used -R for this, but current versions expect -Y for single-pass filtering. You can find a list of useful display filters here.
Email address
Email addresses are another interesting piece of data. We can extract them by running a regular expression over the raw text payloads.
tshark -r tor.pcap -Y "data-text-lines" -T fields -e text > alldata.txt
grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' alldata.txt | sort | uniq
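The grep step can be tried on its own, without a capture file. The sketch below is a variation on the expression above, not a replacement for it: it assumes we want to accept top-level domains longer than four letters (hence {2,} instead of {2,4}) and to treat addresses that differ only in case as duplicates. The function name extract_emails is hypothetical, and the sample text is invented.

```shell
# Print the unique email-like strings found on stdin, lowercased.
# extract_emails is an illustrative name. The {2,} quantifier and the
# lowercasing are assumptions layered on top of the original regexp.
extract_emails() {
    grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b' \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u
}

# Sample text standing in for the contents of alldata.txt.
printf 'Contact Alice@Example.com or bob@test.museum, alice@example.com\n' \
    | extract_emails
```

Lowercasing is a convenience for deduplication only. If the exact spelling seen on the wire matters, the tr step can simply be dropped.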
Requested URLs
We can also get a list of all the requested URLs (via the GET method):
tshark -r http-traffic.pcap -T fields -e http.host -e http.request.uri -Y 'http.request.method == "GET"' | sort | uniq | less
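With -T fields, tshark separates the two columns with a tab by default, so the host and the URI can be glued back into a full URL with awk. This is only a sketch: the function name build_urls is made up, the http:// scheme is an assumption that fits a plain HTTP capture, and the printf input stands in for real tshark output.

```shell
# Read "host<TAB>uri" lines on stdin and print the unique full URLs.
# build_urls is an illustrative name; the http:// prefix is assumed.
build_urls() {
    # Skip lines that lack either column or have an empty host.
    awk -F '\t' 'NF == 2 && $1 != "" { print "http://" $1 $2 }' | sort -u
}

# Sample input standing in for the output of the tshark command above.
printf 'example.com\t/index.html\nexample.com\t/index.html\nexample.org\t/a?b=1\n' \
    | build_urls
```

Piping the tshark command above into build_urls, in place of sort | uniq, would give one line per distinct URL.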
Don’t forget to take a look at the official documentation.