- A Beginner’s Guide to Converting PCAP to CSV File Using Python
- Step 1: Install Required Libraries
- Step 2: Load PCAP File
- Step 3: Convert PCAP to Pandas DataFrame
- Step 4: Save DataFrame to CSV
- Conclusion
- How to Convert PCAP files to CSV for Network Traffic Analysis
- Collecting PCAP files
- Breaking down PCAP files
- Install tshark
- Convert PCAP to CSV
- References
- Converting PCAPs to Other Formats
- PCAPs to LAS
- PCAPs to PCD
- PCAPs to PLY
A Beginner’s Guide to Converting PCAP to CSV File Using Python
If you work with network traffic data, you have probably come across PCAP files. PCAP is a file format that stores captured network packets. Because it is a binary format, it is often useful to convert a capture to a more readable, tabular format such as CSV. Fortunately, Python offers libraries that make this conversion easy.
Step 1: Install Required Libraries
Before we can begin the conversion process, we need to install two Python libraries:
- scapy : This library is used for parsing and analyzing network packets.
- pandas : This library is used for handling data in a tabular format.
To install these libraries, open your terminal/command prompt and enter the following commands:
    pip install scapy
    pip install pandas
Step 2: Load PCAP File
Once we have installed the required libraries, we can load the PCAP file using scapy. Here is example code for loading a PCAP file:

    from scapy.all import rdpcap

    # Read every packet of the capture into memory
    packets = rdpcap('example.pcap')
In the above code, we import rdpcap from scapy.all and use it to load our PCAP file named example.pcap .
Step 3: Convert PCAP to Pandas DataFrame
Next, we need to convert our packets object to a pandas DataFrame. Here is example code for doing this:

    import pandas as pd

    rows = []
    for packet in packets:
        # Skip packets without an IP layer (for example ARP)
        if not packet.haslayer('IP'):
            continue
        ip = packet['IP']
        rows.append({'Source': ip.src, 'Destination': ip.dst,
                     'Protocol': ip.proto, 'Length': len(packet)})

    # Build the DataFrame once; DataFrame.append was removed in pandas 2.0
    df = pd.DataFrame(rows, columns=['Source', 'Destination', 'Protocol', 'Length'])

In the above code, we loop through each packet in our packets object, skip packets that have no IP layer, and collect the source and destination IP addresses, the protocol number, and the packet length in one dictionary per packet. We then build a DataFrame with four columns (Source, Destination, Protocol, and Length) from the collected rows. Note that Protocol holds the numeric IP protocol, for example 6 for TCP and 17 for UDP.
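Because the Protocol column holds IP protocol numbers rather than names, a small lookup can make the CSV easier to read. The sketch below is only an illustration: the helper name `protocol_name` and the short table are assumptions made for this example, and the table covers just a few well-known IANA protocol numbers.

```python
# Hypothetical helper (not part of scapy or pandas): map a few well-known
# IANA IP protocol numbers to readable names.
PROTOCOL_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP", 47: "GRE", 89: "OSPF"}


def protocol_name(number):
    """Return a readable name for an IP protocol number, or the number as text."""
    return PROTOCOL_NAMES.get(number, str(number))


print(protocol_name(6))    # TCP
print(protocol_name(17))   # UDP
print(protocol_name(253))  # 253 (not in the table, so returned as text)
```

With a DataFrame like the one built in this step, the helper could be applied with `df['Protocol'] = df['Protocol'].map(protocol_name)`.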
Step 4: Save DataFrame to CSV
Finally, we need to save our DataFrame to a CSV file. Here is example code for doing this:
df.to_csv('example.csv', index=False)
In the above code, we use the to_csv method of our DataFrame to save it to a CSV file named example.csv .
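To see roughly what the libraries do behind the scenes, the four steps can also be sketched with only the Python standard library. This is a simplified illustration, not a replacement for scapy: it assumes a classic pcap file (not pcapng, and not the nanosecond-timestamp variant), an Ethernet link type, and untagged IPv4 frames. The function names `pcap_to_rows` and `write_csv` are made up for this example.

```python
import csv
import socket
import struct


def pcap_to_rows(data):
    """Yield (source, destination, protocol, length) for IPv4-over-Ethernet packets."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # capture written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # capture written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    # The last field of the 24-byte global header is the link type (1 = Ethernet)
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")
    offset = 24
    while offset + 16 <= len(data):
        # Per-packet header: timestamp (2 fields), captured length, original length
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        # 14-byte Ethernet header; EtherType 0x0800 means IPv4
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:34]  # fixed part of the IPv4 header
        yield (socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20]),
               ip[9], len(frame))


def write_csv(rows, out):
    """Write the rows to an open text file with the same four column names."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Destination", "Protocol", "Length"])
    writer.writerows(rows)


# Usage (file names are placeholders):
# with open("example.pcap", "rb") as src, open("example.csv", "w", newline="") as dst:
#     write_csv(pcap_to_rows(src.read()), dst)
```

The sketch skips anything it does not understand (VLAN-tagged frames, IPv6, ARP), which is one reason a full parser such as scapy is the more practical choice for real captures.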
Conclusion
In conclusion, converting PCAP files to CSV files using Python is relatively simple with the help of libraries like scapy and pandas . The resulting CSV file can be easily analyzed using various data analysis tools or used for further processing.
How to Convert PCAP files to CSV for Network Traffic Analysis
Researchers working in computer networking or cyber security often need to analyze network traffic. To do so, they typically use the Wireshark packet analyzer or a similar traffic analysis tool to capture and analyze packets.
However, if you want to perform data analysis, cleaning, modeling, or feature analysis and classification on the network traffic, you might want to convert the PCAP files into CSV files.
In this post, we will see how to convert PCAP files into CSV files easily.
Collecting PCAP files
First, we need a PCAP file. A PCAP file usually contains a large amount of network traffic. Many PCAP files are available online, or you can capture network traffic yourself using the Wireshark packet analyzer.
Breaking down PCAP files
If the PCAP file is very large, you can split it into smaller parts using the following command, where -C 10 starts a new output file roughly every 10 million bytes:
$ tcpdump -r old_file -w new_files -C 10
python packet2df.py -file thursday-100M.pcap -o thursday-100M.csv
Install tshark
You need to install tshark, the command-line version of Wireshark:
- Ubuntu: sudo apt install -y tshark
- macOS: brew install tshark
You can find other options on the official page.
On macOS, if you have already installed Wireshark but the tshark command is not found (check using $ which tshark ), you might need to create a symbolic link:
$ ln -s /Applications/Wireshark.app/Contents/MacOS/tshark /usr/local/bin/tshark
Convert PCAP to CSV
Because we are not capturing live traffic, we need to use -r to read a PCAP file.
The other options we need are -T fields to get the data in field format, -E header=y to print the header, -E separator=, to separate the values with commas, -E quote=d to wrap values in double quotes, -E occurrence=f to keep only the first occurrence when a field appears more than once in a packet (use a instead of f to keep all occurrences), and, most importantly, -e for each required field.
In one of my projects, I used the following fields to create a CSV from a PCAP file.
    $ tshark -r input.pcap -T fields -E header=y -E separator=, -E quote=d -E occurrence=f \
        -e ip.version -e ip.hdr_len -e ip.tos -e ip.id -e ip.flags -e ip.flags.rb -e ip.flags.df \
        -e ip.flags.mf -e ip.frag_offset -e ip.ttl -e ip.proto -e ip.checksum -e ip.src -e ip.dst \
        -e ip.len -e ip.dsfield -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack -e tcp.len \
        -e tcp.hdr_len -e tcp.flags -e tcp.flags.fin -e tcp.flags.syn -e tcp.flags.reset \
        -e tcp.flags.push -e tcp.flags.ack -e tcp.flags.urg -e tcp.flags.cwr -e tcp.window_size \
        -e tcp.checksum -e tcp.urgent_pointer -e tcp.options.mss_val > output.csv
There are many other fields available; which ones you choose depends on your needs.
For example, this article did the following for a different purpose.
    tshark -r thursday-100M.pcap -T fields -E header=y -E separator=, -E quote=d -E occurrence=f -e ip.src -e ip.dst -e ip.len -e ip.flags.df -e ip.flags.mf \
        -e ip.fragment -e ip.fragment.count -e ip.fragments -e ip.ttl -e ip.proto -e tcp.window_size -e tcp.ack -e tcp.seq -e tcp.len -e tcp.stream -e tcp.urgent_pointer \
        -e tcp.flags -e tcp.analysis.ack_rtt -e tcp.segments -e tcp.reassembled.length -e ssl.handshake -e ssl.record -e ssl.record.content_type -e ssl.handshake.cert_url.url_len \
        -e ssl.handshake.certificate_length -e ssl.handshake.cert_type -e ssl.handshake.cert_type.type -e ssl.handshake.cert_type.types -e ssl.handshake.cert_type.types_len \
        -e ssl.handshake.cert_types -e ssl.handshake.cert_types_count -e dtls.handshake.extension.len -e dtls.handshake.extension.type -e dtls.handshake.session_id \
        -e dtls.handshake.session_id_length -e dtls.handshake.session_ticket_length -e dtls.handshake.sig_hash_alg_len -e dtls.handshake.sig_len -e dtls.handshake.version \
        -e dtls.heartbeat_message.padding -e dtls.heartbeat_message.payload_length -e dtls.heartbeat_message.payload_length.invalid -e dtls.record.content_type -e dtls.record.content_type \
        -e dtls.record.length -e dtls.record.sequence_number -e dtls.record.version -e dtls.change_cipher_spec -e dtls.fragment.count -e dtls.handshake.cert_type.types_len \
        -e dtls.handshake.certificate_length -e dtls.handshake.certificates_length -e dtls.handshake.cipher_suites_length -e dtls.handshake.comp_methods_length -e dtls.handshake.exponent_len \
        -e dtls.handshake.extension.len -e dtls.handshake.extensions_alpn_str -e dtls.handshake.extensions_alpn_str_len -e dtls.handshake.extensions_key_share_client_length \
        -e http.request -e udp.port -e frame.time_relative -e frame.time_delta -e tcp.time_relative -e tcp.time_delta > thursday-100M.csv
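If you want to filter the data, for example to keep only the packets that open each TCP conversation, you can apply a display filter with the -Y option. Note that tcp.flags.syn eq 1 also matches SYN-ACK replies; append and tcp.flags.ack eq 0 to the filter if you want only the initial SYN of each conversation.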
    $ tshark -r input.pcap -Y "tcp.flags.syn eq 1" -T fields -E header=y -E separator=, -E quote=d -E occurrence=f \
        -e ip.version -e ip.hdr_len -e ip.tos -e ip.id -e ip.flags -e ip.flags.rb -e ip.flags.df \
        -e ip.flags.mf -e ip.frag_offset -e ip.ttl -e ip.proto -e ip.checksum -e ip.src -e ip.dst \
        -e ip.len -e ip.dsfield -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack -e tcp.len \
        -e tcp.hdr_len -e tcp.flags -e tcp.flags.fin -e tcp.flags.syn -e tcp.flags.reset \
        -e tcp.flags.push -e tcp.flags.ack -e tcp.flags.urg -e tcp.flags.cwr -e tcp.window_size \
        -e tcp.checksum -e tcp.urgent_pointer -e tcp.options.mss_val > output.csv
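When the field list changes often, it can be easier to assemble the tshark arguments in a script than to edit a long command by hand. The following is a minimal sketch that uses only the options described in this section (-r, -Y, -T fields, -E, -e). The function name `build_tshark_command` and the short field list are illustrative assumptions, not part of tshark.

```python
import shlex


def build_tshark_command(pcap_path, fields, display_filter=None):
    """Return the tshark argument list that exports the given fields as CSV."""
    cmd = ["tshark", "-r", pcap_path]
    if display_filter:
        # Optional display filter, e.g. "tcp.flags.syn eq 1"
        cmd += ["-Y", display_filter]
    cmd += ["-T", "fields",
            "-E", "header=y", "-E", "separator=,",
            "-E", "quote=d", "-E", "occurrence=f"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


cmd = build_tshark_command(
    "input.pcap",
    ["ip.src", "ip.dst", "tcp.srcport", "tcp.dstport"],
    display_filter="tcp.flags.syn eq 1",
)
# Print a shell-quoted version of the command for inspection
print(shlex.join(cmd))

# To actually run it (requires tshark on the PATH), something like:
# import subprocess
# with open("output.csv", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```

Passing the arguments as a list avoids shell quoting problems with filters that contain spaces.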
Last trick: a large CSV file can also be split into smaller files from the command line, for example into parts containing 5000 records each.
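The same kind of split can be sketched in Python with the standard csv module. This is an illustrative alternative rather than the command-line method referred to above; the function name `split_csv` and the `_partNNN` naming scheme are assumptions made for this example. Unlike a plain line-based split, it repeats the header row in every part.

```python
import csv
from pathlib import Path


def split_csv(path, records_per_file=5000):
    """Split a CSV into numbered parts, repeating the header row in each part."""
    path = Path(path)
    parts = []
    with path.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty input, nothing to split
        out = None
        writer = None
        for i, row in enumerate(reader):
            if i % records_per_file == 0:
                # Start a new part every records_per_file data rows
                if out is not None:
                    out.close()
                part = path.with_name(f"{path.stem}_part{len(parts):03d}{path.suffix}")
                out = part.open("w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(part)
            writer.writerow(row)
        if out is not None:
            out.close()
    return parts


# Usage (file name is a placeholder):
# for part in split_csv("output.csv", records_per_file=5000):
#     print(part)
```

Values are rewritten with the csv module's default quoting, so fields that tshark wrapped in double quotes may appear unquoted in the parts; the data itself is unchanged.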
If you are interested in learning about OS fingerprinting from PCAP files, you can read this post: Passive Operating System Fingerprinting by Analyzing PCAP files
References
Updated: February 5, 2021
Converting PCAPs to Other Formats
Sometimes we want to get a point cloud ( XYZ + other fields) as a CSV file for further analysis with other tools.
To convert the first 5 scans of our sample data from a pcap file, you can try:
$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-csv --scan-num 5
PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-csv --scan-num 5
The source code of the example is shown below:

    from itertools import islice

    # NOTE: this is an excerpt; client, np, metadata, source, start_index,
    # num_scans, dual, field_names, field_fmts, output_names and
    # get_fields_info are assumed to be defined by the surrounding example code.

    # precompute xyzlut to save computation in a loop
    xyzlut = client.XYZLut(metadata)

    # create an iterator of LidarScans from pcap and bound it if num is specified
    scans = iter(client.Scans(source))
    # if num_scans is None
    scans = islice(scans, start_index, start_index + num_scans)

    row_layer = np.fromfunction(lambda i, j: i,
                                (metadata.format.pixels_per_column,
                                 metadata.format.columns_per_frame), dtype=int)
    column_layer = np.fromfunction(lambda i, j: j,
                                   (metadata.format.pixels_per_column,
                                    metadata.format.columns_per_frame), dtype=int)
    column_layer_staggered = client.destagger(metadata, column_layer,
                                              inverse=True)

    idx = None
    for idx, scan in enumerate(scans):

        # initialize the field names for csv header
        if not field_names or not field_fmts:
            field_names, field_fmts = get_fields_info(scan)

        # copy per-column timestamps and measurement_ids for each beam
        timestamps = np.tile(scan.timestamp, (scan.h, 1))
        measurement_ids = np.tile(scan.measurement_id, (scan.h, 1))

        # grab channel data
        fields_values = [scan.field(ch) for ch in scan.fields]

        frame = np.dstack((timestamps, row_layer, column_layer_staggered,
                           measurement_ids, *fields_values))

        # output points in "image" vs. staggered order
        frame = client.destagger(metadata, frame)

        # destagger XYZ separately since it has a different type
        xyz = xyzlut(scan.field(client.ChanField.RANGE))
        xyz_destaggered = client.destagger(metadata, xyz)

        if dual:
            xyz2 = xyzlut(scan.field(client.ChanField.RANGE2))
            xyz2_destaggered = client.destagger(metadata, xyz2)

            # get all data as one H x W x num fields int64 array for savetxt()
            frame = np.dstack(tuple(map(lambda x: x.astype(object),
                                        (frame, xyz_destaggered, xyz2_destaggered))))

        else:
            # get all data as one H x W x num fields int64 array for savetxt()
            frame = np.dstack(tuple(map(lambda x: x.astype(object),
                                        (frame, xyz_destaggered))))

        # write csv out to file
        csv_path = output_names[idx]
        print(f'write frame index #{idx + start_index}, to file: {csv_path}')

        header = '\n'.join([f'frame num: {idx}', field_names])

        np.savetxt(csv_path,
                   frame.reshape(-1, frame.shape[2]),
                   fmt=field_fmts,
                   delimiter=',',
                   header=header)
Because we stored the scan as structured 2D images, we can easily recover it by loading it back into a numpy.ndarray and continuing to use it as a 2D image.
    import numpy as np

    # read array from CSV
    frame = np.loadtxt('my_frame_00000.csv', delimiter=',')

    # convert back to "fat" 2D image [H x W x num_fields] shape
    frame = frame.reshape((128, -1, frame.shape[1]))
We used 128 when restoring the 2D image from the CSV file because that is the number of channels in our OS-1-128.pcap sample data recording.
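The save-and-restore round trip can be checked on a small synthetic array without any sensor data. The sketch below uses toy dimensions (H = 4 rather than 128), made-up column names, and a temporary file; all of these are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

# Toy "fat" image: H rows x W columns x F fields (the sample recording has H = 128)
H, W, F = 4, 6, 3
frame = np.arange(H * W * F, dtype=np.int64).reshape(H, W, F)

# Flatten to one CSV row per pixel, as the example above does before savetxt()
path = os.path.join(tempfile.mkdtemp(), "toy_frame.csv")
np.savetxt(path, frame.reshape(-1, frame.shape[2]),
           fmt="%d", delimiter=",", header="field_a,field_b,field_c")

# loadtxt skips the header because savetxt prefixes it with "# "
loaded = np.loadtxt(path, delimiter=",")

# Restore the H x W x F layout; -1 lets NumPy infer W from the row count
restored = loaded.reshape((H, -1, loaded.shape[1]))

print(restored.shape)                   # (4, 6, 3)
print(np.array_equal(restored, frame))  # True
```

The reshape only recovers the original layout because rows were written in row-major order and H is known in advance, which is why the number of channels has to be supplied when loading.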
PCAPs to LAS
To convert the first 5 scans of our sample data from a pcap file to LAS , you can try:
$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-las --scan-num 5
PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-las --scan-num 5
Check out the examples.pcap.pcap_to_las() documentation for the example source code.
PCAPs to PCD
To convert the first 5 scans of our sample data from a pcap file to PCD , you can try:
$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-pcd --scan-num 5
PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-pcd --scan-num 5
Check out the examples.pcap.pcap_to_pcd() documentation for the example source code.
PCAPs to PLY
Here we will reuse the PCAP to PCD function that uses Open3d and take advantage of the extensive Open3d File IO, which gives us an easy way to save the loaded point cloud to PLY . Alternatives are available via the plyfile library.
To convert the first 5 scans of our sample data from a pcap file to PLY , you can try:
$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-ply --scan-num 5
PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-ply --scan-num 5
Check out the examples.pcap.pcap_to_ply() documentation for the example source code.
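For a sense of what ends up in a PLY file, the ASCII variant of the format is simple enough to write by hand. The sketch below is a dependency-free illustration and is not how the SDK example, Open3d, or plyfile do it; the function name `write_ascii_ply` is hypothetical, and only XYZ coordinates are stored.

```python
def write_ascii_ply(path, points):
    """Write an iterable of (x, y, z) tuples as an ASCII PLY point cloud."""
    points = list(points)
    with open(path, "w", newline="\n") as out:
        # Header: format line, vertex count, one property per coordinate
        out.write("ply\n")
        out.write("format ascii 1.0\n")
        out.write(f"element vertex {len(points)}\n")
        out.write("property float x\n")
        out.write("property float y\n")
        out.write("property float z\n")
        out.write("end_header\n")
        # Body: one vertex per line, coordinates separated by spaces
        for x, y, z in points:
            out.write(f"{x} {y} {z}\n")


# Usage (file name and coordinates are placeholders):
# write_ascii_ply("cloud.ply", [(0.0, 0.0, 0.0), (1.0, 2.0, 3.5)])
```

For large scans a binary PLY written by a dedicated library will be much smaller and faster to load; the ASCII form is mainly useful for inspection and small tests.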
© Copyright 2022, Ouster, Inc..