How to convert CSV to XML using Python
Here I am going to create an example on how to convert CSV to XML. CSV means Comma Separated Value. So the sources of these values may be different. These values may be put into a file or in a string.
XML is an Extensible Markup Language that defines a set of rules for encoding document in a format that is both human and machine readable.
Here I am going to convert either CSV string or file to XML. The output can be written either in XML string or file.
Comma Separated Values (CSV) and the Extensible Markup Language (XML) are the most widely used formats for data, and conversion between these two formats needs often to be accomplished.
Especially to XML, because this format is very well supported by modern applications, and is very well suited for further data manipulation and customization.
Prerequisites
Convert CSV to XML
Now I will write the code in Python programming language to convert csv data to xml data.
Before I convert into XML data I need to read the CSV data. There are various ways to read CSV data either from file or csv string.
Here I am going to use simple way to read the csv data and produce the xml output. Here I am reading from csv file, you can also read from csv string. You can check link various ways to read CSV data either from file or csv string.
The output in displayed in console.
The sample csv file can be downloaded later from the source code.
import csv f = open('sample.csv') csv_f = csv.reader(f) data = [] for row in csv_f: data.append(row) f.close() #print (data[1:]) def convert_row(row): return """ %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s """ % (row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8], row[9], row[10], row[11], row[12], row[13], row[14], row[15], row[16]) print ('\n'.join([convert_row(row) for row in data[1:]]))
The above code will give you the following output:
119736 FL CLAY COUNTY 498960 498960 498960 498960 498960 792148.9 0 9979.2 0 0 30.102261 -81.711777 Residential Masonry 448094 FL CLAY COUNTY 1322376.3 1322376.3 1322376.3 1322376.3 1322376.3 1438163.57 0 0 0 0 30.063936 -81.707664 Residential Masonry 206893 FL CLAY COUNTY 190724.4 190724.4 190724.4 190724.4 190724.4 192476.78 0 0 0 0 30.089579 -81.700455 Residential Wood 333743 FL CLAY COUNTY 0 79520.76 0 0 79520.76 86854.48 0 0 0 0 30.063236 -81.707703 Residential Wood 172534 FL CLAY COUNTY 0 254281.5 0 254281.5 254281.5 246144.49 0 0 0 0 30.060614 -81.702675 Residential Wood 785275 FL CLAY COUNTY 0 515035.62 0 0 515035.62 884419.17 0 0 0 0 30.063236 -81.707703 Residential Masonry 995932 FL CLAY COUNTY 0 19260000 0 0 19260000 20610000 0 0 0 0 30.102226 -81.713882 Commercial Reinforced Concrete 223488 FL CLAY COUNTY 328500 328500 328500 328500 328500 348374.25 0 16425 0 0 30.102217 -81.707146 Residential Wood 433512 FL CLAY COUNTY 315000 315000 315000 315000 315000 265821.57 0 15750 0 0 30.118774 -81.704613 Residential Wood
To write into XML file instead of printing into console, you can use the following line of code. This will write into output.xml file with the same output as above.
with open('output.xml', 'w') as f: f.write('\n'.join([convert_row(row) for row in data[1:]]))
That’s all about how to convert CSV data to XML data using Python program.
Converting Data in CSV to XML in Python
Comma Separated Values (CSV) is a widely used format to store data. It successfully stores data in the form of rows and columns in an easily parseable format. As the name suggests, the rows in a CSV file are separated by a delimiter, usually the newline (‘\n’) character. Each value in a row is separated from the other using a comma (,). Opening a CSV file using Microsoft Excel portrays the data in a tabular format, but the CSV format can be understood by opening the CSV file using a text editor as well. There are libraries available in Python that can be used to parse CSV files, but the most common library is the Pandas library.
If you are unfamiliar with the pandas library, check out this tutorial on pandas.
In the above image, one can see that the first row is nothing but the headers or the titles of the columns. These titles are used in Python as a subscript to Pandas dataframe to call an entire column.
Extensible Markup Language or XML is a Markup Language used to represent data in a structured format. It uses tags in ‘<>’ characters, with opening and closing tags representing data. The tag consists of the key/type of the data included in the tag content. The content of the tag includes the values of the key/type in the tag. It would be more clarified as the article progresses. XML also provides styling options, to beautifully display the data. An XML file can be opened with supported browsers and can be edited with text editors or dedicated XML editors.
The main motto of this article is to write a program that reads data from the CSV file and converts it into structured XML format. The tags of the XML will be named after the column names in the CSV file. The tag content will consist of the row values in the CSV. The above CSV will be used to be converted to XML format. The given input (in tabular form) and the expected output (in XML form) are shown below:
Hyundai Aura 2019 GJ23BB8384 White 28000 Maruti Eeco 2001 GJ15AM7634 White 102013 Hyundai Alcazar 2021 GJ01KJ9845 Red 3401 Honda City 2007 GJ06RE4198 Black 45000 Mahindra Bolero 2006 GJ17OY8714 Black 129090
Expected XML Output after the Program has run
A Python function will be created which takes the input of the name and path of the CSV file and the name and path of the output XML file. The rest of the work will be done by the code. The driver program will call the function, and then the XML file will be opened in Browser to check the output. It should be noted that no style characteristics are being added to the XML structure here, and simply an XML Document Tree is being created.
Creating the Function
The function takes two inputs, the input file path and the output file path. But there needs to be an Error checker to make sure the inputs given to the function are accurate and not wrong. Therefore, the first thing the function does is to check whether the given input file name is a CSV file and the given output file name is an XML file. The following code demonstrates the same:
import pandas as pd def CSVtoXML(inputfile,outputfile): if not inputfile.lower().endswith('.csv'): print('Expected A CSV File') return 0 if not outputfile.lower().endswith('.xml'): print('Expected a XML file') return 0
The first thing to be done in the above code is to import the Pandas module. It would be highly necessary to read the CSV file as a dataframe. The check on the input file as CSV and the output file as XML is done using Python String’s endswith() function. The file always ends with its extension, in the case of CSV is ‘.csv’, and in the case of XML is ‘.xml’. The endswith() function checks the last characters of the string with the given input. If both of them match and are equal, it returns True or returns False.
Now, a try-except block needs to be established to ensure that the given CSV file as an input exists. If the File is not found, the program shall return a FileNotFoundError. On intercepting this error, the program shall return an Error message giving the input as an existing file. The following code does the same:
try: df=pd.read_csv(inputfile) except FileNotFoundError: print('CSV file not found') return 0
The above code snippet tries to read the given CSV file as a Pandas Dataframe object. If the file doesn’t exist, pandas will produce a FileNotFoundError.
Creating XML from Dataframe
Now that the CSV file has been successfully read as a Dataframe, the code manipulates the dataframe into a string format, which matches the actual XML required format. The following code demonstrates the same:
entireop='\n' att=df.columns rowop='' for j in range(len(df)): for i in range(len(att)): if i==0: rowop=rowop+f' title="">\n' elif i==len(att)-1: rowop=rowop+f'> >\n>\n' else: rowop=rowop+f'> >\n' entireop=entireop+rowop+' ' with open(outputfile,'w') as f: f.write(entireop) CSVtoXML('Car Details.csv','example.xml')
In the above code snippet, entireop variable contains the string of the entire XML output, which is required. The XML must always be declared with a tag describing the data portrayed by the XML Document Tree. Here, the tag is named . Also, a description of the data is given as oldcars, implying that the car details portrayed by the XML Document are old cars.
The columns feature of Dataframe gives a list containing all the column names of the Dataframe as an output. It is used to store each and every column name, and they can be later used as a subscript to the Dataframe to call an entire column. The att variable stores the column list. The remaining task is to run through all the rows in the Dataframe, get a separate input for each value in the columns of the row, create a tag using the column name, and put the tag content as the value taken from the row.
The for loop does the task described earlier. To accurately locate a particular value in the Dataframe, the exact column number and row number are required. For e.g., the car model ‘Alcazar’ is situated in the third column of the third row. So it is called a Dataframe[2][2]. The first subscript is column number, and the second subscript is row number. It should be noted that the Dataframe follows 0-indexing; that is, the first element in Dataframe is Dataframe[0][0]. An easier way to call the value at Dataframe[2][2] is by subscripting the column with its column name. It is same as Dataframe[‘model’][2]. As the ‘model’ value lies at the third position in the att list, it can be easily referenced by the Dataframe, using the att list.
That is done in the for loop provided above. The XML Document Tree also should be able to independently identify each and every data object uniquely. That is the reason that the first tag of every data object, in this case, carid, is provided with a title attribute containing the data value of carid. The rest of the tags are simply column names enclosed in <> as starting tags and >as ending tags. The tag content is put as the value in the column at the current row number.
After the entire XML tree has been created, it needs to be connected with the tag created earlier. Line 25 does the same as it joins entireop with the produced XML Document Tree and closes the collection tag. After this, the output file is opened in writing mode, and the XML Document Tree created in a String form is written to the file. The following output is produced after the program is run:
Hyundai Aura 2019 GJ23BB8384 White 28000 Maruti Eeco 2001 GJ15AM7634 White 102013 Hyundai Alcazar 2021 GJ01KJ9845 Red 3401 Honda City 2007 GJ06RE4198 Black 45000 Mahindra Bolero 2006 GJ17OY8714 Black 129090
The Final XML Output is Saved in example.xml file
This XML file can also be opened in Browser. It will produce an output like this:
Pandas Dataframe To XML Function()
Pandas also provides a function to the Dataframe which allows the user to directly convert the CSV file to an XML file. It provides with more XML-accurate structure and also the produced output can be easily parsed by the browser. The entire work can be done in two lines of code:
df=pd.read_csv('Car Details.csv') df.to_xml('example.xml')
It produces more or less the same kind of output. The output can be seen in the image below:
Entire Code for Reference
The defined function CSVtoXML() has been presented below for reference. It works on a time complexity of O(c*r) where c is the number of columns in the CSV file and r is the number of rows in the CSV file.
import pandas as pd def CSVtoXML(inputfile,outputfile): if not inputfile.lower().endswith('.csv'): print('Expected A CSV File') return 0 if not outputfile.lower().endswith('.xml'): print('Expected a XML file') return 0 try: df=pd.read_csv(inputfile) except FileNotFoundError: print('CSV file not found') return 0 entireop='\n' att=df.columns rowop='' for j in range(len(df)): for i in range(len(att)): if i==0: rowop=rowop+f' title="">\n' elif i==len(att)-1: rowop=rowop+f'> >\n>\n' else: rowop=rowop+f'> >\n' entireop=entireop+rowop+' ' with open(outputfile,'w') as f: f.write(entireop) CSVtoXML('Car Details.csv','example.xml')
Conclusion
XML is a Markup Language that produces data in a structured format for the user. It can also design and produce beautiful output on the data if style attributes are provided to the XML. CSV is a data storage format that uses delimiters to separate rows and column values from each other. It is comparatively easy to convert a CSV file to an XML file. Usually, the column names are considered tag names, and the values in the row are considered tag content. It uses simple string manipulation techniques to create the XML Document Root Tree and then save it to an XML file. Creating the function using nested for loops might lead to the function taking a lot of time. Using the to_xml() function will do the same work faster and in a more standard way.