Python pcap to csv

Contents

A Beginner’s Guide to Converting PCAP to CSV File Using Python
Step 1: Install Required Libraries
Step 2: Load PCAP File
Step 3: Convert PCAP to Pandas DataFrame
Step 4: Save DataFrame to CSV
Conclusion
How to Convert PCAP files to CSV for Network Traffic Analysis
Collecting PCAP files
Breaking down PCAP files
Install tshark
Convert PCAP to CSV
References
Converting PCAPs to Other Formats
PCAPs to LAS
PCAPs to PCD
PCAPs to PLY

A Beginner’s Guide to Converting PCAP to CSV File Using Python

If you work with network traffic data, you have probably come across PCAP files. PCAP is a file format for storing captured network traffic. Because it is a binary format, it is often useful to convert a PCAP file to something more readable, such as CSV. Python libraries make this conversion straightforward.
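To make the format a little more concrete, here is a minimal sketch that reads a classic libpcap file using only the standard library. It assumes the classic (non-pcapng) layout written in little-endian byte order with microsecond timestamps: a 24-byte global header followed by a 16-byte header in front of each packet. The function name read_pcap_records and the synthetic capture bytes are illustrative and not part of any library.

```python
import io
import struct

# Classic pcap layout: 24-byte global header, then a 16-byte header per packet.
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")
PCAP_RECORD_HEADER = struct.Struct("<IIII")


def read_pcap_records(stream):
    """Yield (timestamp_seconds, captured_bytes) from a classic pcap stream.

    Assumes a little-endian, microsecond-resolution file (magic 0xa1b2c3d4).
    """
    header = stream.read(PCAP_GLOBAL_HEADER.size)
    magic = PCAP_GLOBAL_HEADER.unpack(header)[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("not a little-endian classic pcap file")

    while True:
        record = stream.read(PCAP_RECORD_HEADER.size)
        if len(record) < PCAP_RECORD_HEADER.size:
            break  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD_HEADER.unpack(record)
        yield ts_sec + ts_usec / 1_000_000, stream.read(incl_len)


# Build a tiny synthetic capture in memory: one 4-byte "packet".
sample = (
    PCAP_GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    + PCAP_RECORD_HEADER.pack(10, 500000, 4, 4)
    + b"\x01\x02\x03\x04"
)
records = list(read_pcap_records(io.BytesIO(sample)))
```

In practice a library such as scapy handles these details (and other variants of the format), which is why the steps below rely on it.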

Step 1: Install Required Libraries

Before we can begin the conversion process, we need to install two Python libraries:

scapy : This library is used for parsing and analyzing network packets.
pandas : This library is used for handling data in a tabular format.

To install these libraries, open your terminal/command prompt and enter the following commands:

pip install scapy
pip install pandas

Step 2: Load PCAP File

Once the required libraries are installed, we can load the PCAP file using scapy . Here is example code for loading a PCAP file:

from scapy.all import rdpcap

packets = rdpcap('example.pcap')

In the above code, we import rdpcap from scapy.all and use it to load our PCAP file named example.pcap .

Step 3: Convert PCAP to Pandas DataFrame

Next, we need to convert our packets object to a Pandas DataFrame . Here is example code for doing this:

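import pandas as pd
from scapy.all import IP

rows = []
for packet in packets:
    # Skip packets without an IP layer (for example ARP frames)
    if IP not in packet:
        continue
    rows.append({
        'Source': packet[IP].src,
        'Destination': packet[IP].dst,
        'Protocol': packet[IP].proto,
        'Length': len(packet),
    })

# Build the DataFrame once at the end; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['Source', 'Destination', 'Protocol', 'Length'])

In the above code, we loop through each packet in our packets object, skip any packet that has no IP layer, and extract the source and destination IPs, the protocol number, and the packet length into a dictionary. We collect these dictionaries in a list and then build a DataFrame with four columns: Source , Destination , Protocol , and Length . Building the DataFrame once from a list is also much faster than appending to it row by row.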
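Note that the Protocol column produced in this step holds the numeric IP protocol field (for example 6 or 17), not a name. A small lookup, sketched below, can make the resulting CSV easier to read. The helper name protocol_name, the three-entry table, and the sample rows are illustrative assumptions; the numbers themselves are the standard assignments for ICMP, TCP, and UDP.

```python
# Standard IP protocol numbers for the most common protocols.
IP_PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def protocol_name(number):
    """Return a readable name, falling back to the number as a string."""
    return IP_PROTOCOL_NAMES.get(number, str(number))


# Hypothetical rows shaped like the ones extracted in Step 3.
rows = [
    {"Source": "10.0.0.1", "Destination": "10.0.0.2", "Protocol": 6, "Length": 60},
    {"Source": "10.0.0.2", "Destination": "10.0.0.9", "Protocol": 17, "Length": 90},
    {"Source": "10.0.0.3", "Destination": "10.0.0.4", "Protocol": 47, "Length": 120},
]
named = [protocol_name(row["Protocol"]) for row in rows]
```

With a DataFrame, the same mapping could be applied to the whole column, for example with df['Protocol'].map(protocol_name).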

Step 4: Save DataFrame to CSV

Finally, we need to save our DataFrame to a CSV file. Here is example code for doing this:

df.to_csv('example.csv', index=False)

In the above code, we use the to_csv method of our DataFrame to save it to a CSV file named example.csv .
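If pandas is not otherwise needed, the same four columns can be written with the standard-library csv module. The sketch below works under that assumption; the write_rows helper and the sample rows are hypothetical stand-ins for the values extracted in Step 3, and an in-memory buffer is used instead of example.csv so the snippet runs on its own.

```python
import csv
import io

COLUMNS = ["Source", "Destination", "Protocol", "Length"]


def write_rows(rows, stream):
    """Write dictionaries with the four columns to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows; in the tutorial these come from the packet loop.
rows = [
    {"Source": "10.0.0.1", "Destination": "10.0.0.2", "Protocol": 6, "Length": 60},
    {"Source": "10.0.0.2", "Destination": "10.0.0.1", "Protocol": 6, "Length": 54},
]

buffer = io.StringIO()
write_rows(rows, buffer)
lines = buffer.getvalue().splitlines()
```

To write a real file, open it with open('example.csv', 'w', newline='') and pass the file object in place of the buffer.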

Conclusion

In conclusion, converting PCAP files to CSV files using Python is relatively simple with the help of libraries like scapy and pandas . The resulting CSV file can be easily analyzed using various data analysis tools or used for further processing.


How to Convert PCAP files to CSV for Network Traffic Analysis

Researchers working on computer networks or cyber security often need to analyze network traffic. To do so, they use the Wireshark packet analyzer or a similar traffic analysis tool to capture and analyze packets.

However, if you want to perform data analysis, cleaning, modeling, or feature analysis and classification for the network traffic, you might want to convert the PCAP files into a CSV file.

In this post, we will see how to convert PCAP files into CSV files easily.

Collecting PCAP files

First, we need a PCAP file. A PCAP file usually contains a lot of network traffic. Many PCAP files are available online, or you can capture network traffic yourself using the Wireshark packet analyzer.

Breaking down PCAP files

If the PCAP file is very large, you can divide it into smaller parts using the following command, where -C 10 starts a new output file each time the current one grows past roughly 10 million bytes:

$ tcpdump -r old_file -w new_files -C 10

python packet2df.py -file thursday-100M.pcap -o thursday-100M.csv

Install tshark

You need to install tshark, the command-line version of Wireshark:

* Ubuntu: sudo apt install -y tshark
* macOS: brew install tshark

You can find other options on the official Wireshark page.

On macOS, if you have already installed Wireshark but the tshark command is not found (check with $ which tshark ), you might need to create a symbolic link:

$ ln -s /Applications/Wireshark.app/Contents/MacOS/tshark /usr/local/bin/tshark

Convert PCAP to CSV

Because we are not capturing live traffic, we need to use -r to read a PCAP file.

The other options we need are -T fields to get the data in field format, -E header=y to print the header, -E separator=, to separate the values with a comma, -E quote=d to wrap values in double quotes, -E occurrence=f to keep only the first occurrence when a field appears several times in a packet (use a instead of f to keep all of them), and, most importantly, -e for each field we want in the output.
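When the field list gets long, it can be easier to assemble the command in Python. The sketch below builds the argument list from the options described above and, optionally, runs it with subprocess. The helper names build_tshark_command and convert are hypothetical; only the tshark options already discussed in this section are used, and convert assumes tshark is installed and on the PATH.

```python
import subprocess


def build_tshark_command(pcap_path, fields):
    """Return the tshark argument list for a fields-to-CSV export."""
    command = [
        "tshark", "-r", pcap_path,
        "-T", "fields",
        "-E", "header=y",
        "-E", "separator=,",
        "-E", "quote=d",
        "-E", "occurrence=f",
    ]
    for field in fields:
        command += ["-e", field]  # one -e per requested field
    return command


def convert(pcap_path, csv_path, fields):
    """Run tshark and redirect its output to csv_path (requires tshark)."""
    with open(csv_path, "wb") as output:
        subprocess.run(build_tshark_command(pcap_path, fields),
                       stdout=output, check=True)


command = build_tshark_command("input.pcap", ["ip.src", "ip.dst", "ip.ttl"])
```

Passing the arguments as a list avoids shell quoting issues, and check=True raises an error if tshark exits with a failure status.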

In one of my projects, I used the following fields to create a CSV from a PCAP file.

$ tshark -r input.pcap -T fields -E header=y -E separator=, -E quote=d -E occurrence=f \
    -e ip.version -e ip.hdr_len -e ip.tos -e ip.id -e ip.flags -e ip.flags.rb -e ip.flags.df \
    -e ip.flags.mf -e ip.frag_offset -e ip.ttl -e ip.proto -e ip.checksum -e ip.src -e ip.dst \
    -e ip.len -e ip.dsfield -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack -e tcp.len \
    -e tcp.hdr_len -e tcp.flags -e tcp.flags.fin -e tcp.flags.syn -e tcp.flags.reset \
    -e tcp.flags.push -e tcp.flags.ack -e tcp.flags.urg -e tcp.flags.cwr -e tcp.window_size \
    -e tcp.checksum -e tcp.urgent_pointer -e tcp.options.mss_val > output.csv

Many other fields are available, and which ones you choose depends on your needs.

For example, another article used the following fields for a different purpose.

tshark -r thursday-100M.pcap -T fields -E header=y -E separator=, -E quote=d -E occurrence=f -e ip.src -e ip.dst -e ip.len -e ip.flags.df -e ip.flags.mf \
    -e ip.fragment -e ip.fragment.count -e ip.fragments -e ip.ttl -e ip.proto -e tcp.window_size -e tcp.ack -e tcp.seq -e tcp.len -e tcp.stream -e tcp.urgent_pointer \
    -e tcp.flags -e tcp.analysis.ack_rtt -e tcp.segments -e tcp.reassembled.length -e ssl.handshake -e ssl.record -e ssl.record.content_type -e ssl.handshake.cert_url.url_len \
    -e ssl.handshake.certificate_length -e ssl.handshake.cert_type -e ssl.handshake.cert_type.type -e ssl.handshake.cert_type.types -e ssl.handshake.cert_type.types_len \
    -e ssl.handshake.cert_types -e ssl.handshake.cert_types_count -e dtls.handshake.extension.len -e dtls.handshake.extension.type -e dtls.handshake.session_id \
    -e dtls.handshake.session_id_length -e dtls.handshake.session_ticket_length -e dtls.handshake.sig_hash_alg_len -e dtls.handshake.sig_len -e dtls.handshake.version \
    -e dtls.heartbeat_message.padding -e dtls.heartbeat_message.payload_length -e dtls.heartbeat_message.payload_length.invalid -e dtls.record.content_type -e dtls.record.content_type \
    -e dtls.record.length -e dtls.record.sequence_number -e dtls.record.version -e dtls.change_cipher_spec -e dtls.fragment.count -e dtls.handshake.cert_type.types_len \
    -e dtls.handshake.certificate_length -e dtls.handshake.certificates_length -e dtls.handshake.cipher_suites_length -e dtls.handshake.comp_methods_length -e dtls.handshake.exponent_len \
    -e dtls.handshake.extension.len -e dtls.handshake.extensions_alpn_str -e dtls.handshake.extensions_alpn_str_len -e dtls.handshake.extensions_key_share_client_length \
    -e http.request -e udp.port -e frame.time_relative -e frame.time_delta -e tcp.time_relative -e tcp.time_delta > thursday-100M.csv

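If you want to filter the data, for example to keep only the packets that open each network conversation, you can pass a display filter with the -Y option. Note that tcp.flags.syn eq 1 matches every packet with the SYN flag set, so SYN-ACK replies are included as well.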

$ tshark -r input.pcap -Y "tcp.flags.syn eq 1" -T fields -E header=y -E separator=, -E quote=d -E occurrence=f \
    -e ip.version -e ip.hdr_len -e ip.tos -e ip.id -e ip.flags -e ip.flags.rb -e ip.flags.df \
    -e ip.flags.mf -e ip.frag_offset -e ip.ttl -e ip.proto -e ip.checksum -e ip.src -e ip.dst \
    -e ip.len -e ip.dsfield -e tcp.srcport -e tcp.dstport -e tcp.seq -e tcp.ack -e tcp.len \
    -e tcp.hdr_len -e tcp.flags -e tcp.flags.fin -e tcp.flags.syn -e tcp.flags.reset \
    -e tcp.flags.push -e tcp.flags.ack -e tcp.flags.urg -e tcp.flags.cwr -e tcp.window_size \
    -e tcp.checksum -e tcp.urgent_pointer -e tcp.options.mss_val > output.csv
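Once output.csv exists, filtered or not, it can be loaded back in Python for analysis. One thing to watch for is that a field which does not apply to a packet (for example a TCP port on a UDP packet) is left empty, so numeric conversion should be defensive. The sketch below illustrates this with the standard csv module; the sample text is hand-written to resemble tshark output rather than taken from a real capture, and the to_int helper is hypothetical.

```python
import csv
import io


def to_int(value, default=None):
    """Convert a decimal field to int; empty strings become `default`."""
    return int(value) if value != "" else default


# Hand-written sample resembling the CSV produced above (not real capture data).
sample_csv = io.StringIO(
    'ip.src,ip.dst,ip.ttl,tcp.srcport\n'
    '"10.0.0.1","10.0.0.2","64","443"\n'
    '"10.0.0.3","10.0.0.4","128",""\n'
)

packets = []
for row in csv.DictReader(sample_csv):
    packets.append({
        "src": row["ip.src"],
        "dst": row["ip.dst"],
        "ttl": to_int(row["ip.ttl"]),
        "tcp_srcport": to_int(row["tcp.srcport"]),  # None when not a TCP packet
    })
```

Some fields may be printed in hexadecimal rather than decimal, so they would need separate handling before conversion.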

Last trick: if you want to split a large CSV file into smaller files using the command line, just do the following:

and it will divide the large file into smaller ones containing 5000 records in each.
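The command itself is not reproduced above. As one possible alternative in Python (not the original command), the sketch below splits CSV rows into chunks of a fixed number of records. The function name split_csv_rows is hypothetical, and repeating the header row in every chunk is a design choice of this sketch so that each piece can be loaded on its own.

```python
import csv
import io


def split_csv_rows(reader, chunk_size=5000):
    """Yield lists of rows, each starting with the header row."""
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [header] + chunk
            chunk = []
    if chunk:  # remaining rows that did not fill a whole chunk
        yield [header] + chunk


# Small in-memory example: 5 records split into chunks of 2.
source = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n")
chunks = list(split_csv_rows(csv.reader(source), chunk_size=2))
```

Each chunk can then be written to its own file with csv.writer, for example using a numbered file name per chunk.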

If you are interested in learning about OS fingerprinting from PCAP files, you can read this post: Passive Operating System Fingerprinting by Analyzing PCAP files

References

Updated: February 5, 2021


Converting PCAPs to Other Formats

Sometimes we want to get a point cloud (XYZ + other fields) as a CSV file for further analysis with other tools.

To convert the first 5 scans of our sample data from a pcap file, you can try:

$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-csv --scan-num 5

PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-csv --scan-num 5

The source code of the example is shown below:

from itertools import islice
# precompute xyzlut to save computation in a loop
xyzlut = client.XYZLut(metadata)

# create an iterator of LidarScans from pcap and bound it if num is specified
scans = iter(client.Scans(source))
# if num_scans is None
scans = islice(scans, start_index, start_index + num_scans)

row_layer = np.fromfunction(lambda i, j: i,
                            (metadata.format.pixels_per_column,
                             metadata.format.columns_per_frame), dtype=int)
column_layer = np.fromfunction(lambda i, j: j,
                               (metadata.format.pixels_per_column,
                                metadata.format.columns_per_frame), dtype=int)
column_layer_staggered = client.destagger(metadata, column_layer,
                                          inverse=True)

idx = None
for idx, scan in enumerate(scans):

    # initialize the field names for csv header
    if not field_names or not field_fmts:
        field_names, field_fmts = get_fields_info(scan)

    # copy per-column timestamps and measurement_ids for each beam
    timestamps = np.tile(scan.timestamp, (scan.h, 1))
    measurement_ids = np.tile(scan.measurement_id, (scan.h, 1))

    # grab channel data
    fields_values = [scan.field(ch) for ch in scan.fields]

    frame = np.dstack((timestamps, row_layer, column_layer_staggered,
                       measurement_ids, *fields_values))

    # output points in "image" vs. staggered order
    frame = client.destagger(metadata, frame)

    # destagger XYZ separately since it has a different type
    xyz = xyzlut(scan.field(client.ChanField.RANGE))
    xyz_destaggered = client.destagger(metadata, xyz)

    if dual:
        xyz2 = xyzlut(scan.field(client.ChanField.RANGE2))
        xyz2_destaggered = client.destagger(metadata, xyz2)

        # get all data as one H x W x num fields int64 array for savetxt()
        frame = np.dstack(tuple(map(lambda x: x.astype(object),
                          (frame, xyz_destaggered, xyz2_destaggered))))

    else:
        # get all data as one H x W x num fields int64 array for savetxt()
        frame = np.dstack(tuple(map(lambda x: x.astype(object),
                          (frame, xyz_destaggered))))

    # write csv out to file
    csv_path = output_names[idx]
    print(f'write frame index #{idx + start_index}, to file: {csv_path}')

    header = '\n'.join([f'frame num: {idx}', field_names])

    np.savetxt(csv_path,
               frame.reshape(-1, frame.shape[2]),
               fmt=field_fmts,
               delimiter=',',
               header=header)
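To see what the row and column layers in the example contribute, the following sketch reproduces just that part with plain NumPy on a tiny 2 x 3 grid. The grid size and the timestamp values are made-up stand-ins for pixels_per_column, columns_per_frame, and scan.timestamp; no Ouster SDK calls are used here.

```python
import numpy as np

height, width = 2, 3  # tiny stand-in for pixels_per_column x columns_per_frame

# Each cell of row_layer holds its row index; column_layer holds its column index.
row_layer = np.fromfunction(lambda i, j: i, (height, width), dtype=int)
column_layer = np.fromfunction(lambda i, j: j, (height, width), dtype=int)

# One timestamp per column, repeated for every row, in the same way as np.tile above.
column_timestamps = np.array([100, 200, 300])
timestamps = np.tile(column_timestamps, (height, 1))

# Stack the layers into an H x W x 3 array, then flatten to one CSV row per pixel.
frame = np.dstack((timestamps, row_layer, column_layer))
table = frame.reshape(-1, frame.shape[2])
```

Each row of table is then (timestamp, row, column) for one pixel, which is the shape np.savetxt needs to write one pixel per CSV line.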

Because we stored the scan as structured 2D images, we can easily recover it by loading it back into a numpy.ndarray and continuing to use it as a 2D image.

import numpy as np

# read array from CSV
frame = np.loadtxt('my_frame_00000.csv', delimiter=',')

# convert back to "fat" 2D image [H x W x num_fields] shape
frame = frame.reshape((128, -1, frame.shape[1]))

We used 128 when restoring the 2D image from the CSV file because that is the number of channels in our OS-1-128.pcap sample data recording.
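The role of the height value can be checked with a small round trip. The sketch below uses a made-up 4 x 6 image with 3 fields instead of real sensor data: flattening to one row per pixel and reshaping with the correct height restores the original array, while a height that does not divide the number of rows raises an error.

```python
import numpy as np

height, width, num_fields = 4, 6, 3  # stand-ins; the sample recording uses height 128
frame = np.arange(height * width * num_fields).reshape(height, width, num_fields)

# Flatten to one row per pixel, which is the layout written to the CSV file.
flat = frame.reshape(-1, frame.shape[2])

# Restore the H x W x num_fields image using the known height.
restored = flat.reshape((height, -1, flat.shape[1]))

# A height that does not divide the row count cannot be used.
try:
    flat.reshape((5, -1, flat.shape[1]))
    wrong_height_failed = False
except ValueError:
    wrong_height_failed = True
```

A wrong height that happens to divide the row count would not raise an error but would silently produce a differently shaped image, so the value should come from the sensor metadata.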

PCAPs to LAS

To convert the first 5 scans of our sample data from a pcap file to LAS , you can try:

$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-las --scan-num 5

PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-las --scan-num 5

Check out the examples.pcap.pcap_to_las() documentation for the example source code.

PCAPs to PCD

To convert the first 5 scans of our sample data from a pcap file to PCD , you can try:

$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-pcd --scan-num 5

PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-pcd --scan-num 5

Check out the examples.pcap.pcap_to_pcd() documentation for the example source code.

PCAPs to PLY

Here we reuse the PCAP to PCD function, which uses Open3D, and take advantage of Open3D's extensive file I/O, which gives us an easy way to save the loaded point cloud to PLY . Alternatives are available via the plyfile library.

To convert the first 5 scans of our sample data from a pcap file to PLY , you can try:

$ python3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-ply --scan-num 5

PS > py -3 -m ouster.sdk.examples.pcap $SAMPLE_DATA_PCAP_PATH $SAMPLE_DATA_JSON_PATH pcap-to-ply --scan-num 5

Check out the examples.pcap.pcap_to_ply() documentation for the example source code.
