简体   繁体   中英

How to convert a tab delimited .txt file to an xml or csv using Python

I have an issue where in our application while writing a python script we use the file data as string only and output as string only. and then that output string we use as a file eg xml or csv Below is the eg of an tab delimited.txt file format which will be coming from the customer into our application.H is an header which will come only once and D is line level data which can be repeatative. Below is the specification about what these fields are. How can I convert this data to an csv format or an xml format. I saw lot of examples on stackoverflow but all are about open the file and load the file. Which is not working in my case.

H   CustID   CustPO   OrderDate   UserID   EmergencyFlag
D   <null>   PartNumber   Description   Quantity
D   <null>   PartNumber   Description   Quantity
D   <null>   PartNumber   Description   Quantity

Expected Input:

H   1520982 wpg-A10026  2020-11-24  STG N
D       52892   LAMP RED OVAL STOP TAIL TURN    2   
D       AL919338    ECU MODULE 2S/1M M/H PLC SELECT 1

Expected CSV Output:

H,1520982,wpg-A10026,2020-11-24,STG,N
D,,52892,LAMP RED OVAL STOP TAIL TURN,2 
D,,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1

Expected XML Output:

<?xml version="1.0" encoding="utf-8"?>
<Record>
<Header>
<RecordType>H</RecordType>
<CustID>1520982</CustPO>
<CustPO>wpg-A10026</CustPO>
<OrderDate>2020-11-24</OrderDate>
<UserID>STG</UserID>
<EmergencyFlag>N</EmergencyFlag>
</Header>
<Details> 
<RecordType>D</RecordType>
 2nd field is null
<PartNumber>52892</PartNumber>
<Description>LAMP RED OVAL STOP TAIL TURN</Description>
<Quantity>2</Quantity>
</Details>
<Details> 
<RecordType>D</RecordType>
 2nd field is null
<PartNumber>AL919338</PartNumber>
<Description>ECU MODULE 2S/1M M/H PLC SELECT</Description>
<Quantity>1</Quantity>
</Details>
</Record>

The output I got but somewhat wrong. If you see the description field its one entire field but it was replaced with commas. But not the actual tab spaces. H 1520982 wpg-A10026 2020-11-24 STG ND 52892 LAMP,,RED,OVAL,STOP,TAIL,TURN 2
D AL919338 ECU,MODULE,2S/1M,M/H,PLC,SELECT 1.

So What I did i corrected your code as content[i] = filter(bool, content[i].split("\t")) so it can actually give the comma where there was a tab delimit. For now it seems bit working but my second field under Details D was null so i wanted it as two commas,, but that did not appear any hint for that? Thank you very much for your time. appreciated

The current output I got is this below With two issues still remaining after "D" there is a null value need to check if that is actually required to appear in the output. 2nd issue the Description field already had one comma, So need to check with customer if they can stop sending comma or other characters in description field. Or for better accuracy i need to convert this to an xml. But Still Thank you very much this is good to start with H,1520982,wpg-A10026,2020-11-24,STG,ND,52892,LAMP, RED OVAL STOP TAIL TURN,2, D,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1,

Normally,the csv module should be able to it. If not (you do not have consistent amount of spaces separating the values, you can manually split the lines:

content = "INPUTGOESHERE".split("\n")

for i in range(len(content)):
    content[i] = filter(bool, content[i].split(" ")) # split the lines at spaces and filter out empty strings

outstr = ""

for line in content:
    line = ",".join(line) # convert values list to a comma separated string for each line
    outstr += line + "\n"

print(outstr)

See the edit of this answer for how to convert CSV to XML.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM