Parsing values from CSV as strings into text file

I am attempting to create a txt file that lists the XML files in a directory along with the text of a given tag in each file, when that tag is present.

I am having trouble reading a CSV row in as a variable using the code below. I have tried to pull the required values several ways but keep running into a brick wall.

Here is the code:

import csv
import glob
import os
import zipfile
from xml.dom.minidom import parse

container = raw_input("Choose a filename for your container:")
epub = zipfile.ZipFile( container + ".zip", 'w')
xmlinput = glob.glob('./*.xml')
def xmldrop(dir):
  for r,d,f in os.walk(dir):
     for files in f:
        if files.endswith(".xml"):
            dom=parse(os.path.join(r, files))
            name = dom.getElementsByTagName('title')
            with open('catalog.csv', 'a') as f:
                f.write(files +  "," + name[0].firstChild.nodeValue  + "\n")
xmldrop("./")

line_number = 0
with open('catalog.csv', 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number+1][1]

list_tpl = '''
<Container>
<FileName>
%(FileName)s
</FileName>
</Container>'''
FileName = ""

for i, xml in enumerate(xmlinput):
    basename = os.path.basename(xml)
    FileName += ('<Fileid="%i" filename="%s"> <title>%s</title> </Fileid>' %
                 (i+1, basename, text))

epub.writestr('list.txt', list_tpl % {
  'FileName': FileName
})

I am able to successfully pull the information into a CSV file, as seen in this output:

file_1.xml,Intro
file_2.xml,Assessment
file_3.xml,Review
file_4.xml,Catalog

But the list.txt file that gets generated looks like this:

<Container>
<FileName>
<Fileid="1" filename="file_1.xml"> <title>Assessment</title> </Fileid>
<Fileid="2" filename="file_2.xml"> <title>Assessment</title> </Fileid>
<Fileid="3" filename="file_3.xml"> <title>Assessment</title> </Fileid>
<Fileid="4" filename="file_4.xml"> <title>Assessment</title> </Fileid>
</FileName>
</Container>

The desired output would be:

<Container>
<FileName>
<Fileid="1" filename="file_1.xml"> <title>Intro</title> </Fileid>
<Fileid="2" filename="file_2.xml"> <title>Assessment</title> </Fileid>
<Fileid="3" filename="file_3.xml"> <title>Review</title> </Fileid>
<Fileid="4" filename="file_4.xml"> <title>Catalog</title> </Fileid>
</FileName>
</Container>

Any assistance is greatly appreciated. I have been trying to pair the two up for over a week now with no success.

You aren't updating the text variable while you build the XML entries. You set it once with text = mycsv[line_number+1][1], which is the title in the second CSV row, and the for loop never assigns it again, so every entry comes out with Assessment.
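One way to apply this is to look the title up inside the loop, once per file. The following is a minimal sketch rather than a drop-in patch: it assumes Python 3 (the question's code is Python 2), assumes catalog.csv has exactly the two columns shown above (file name, title), and uses two hypothetical helper names, read_titles and build_entries, that do not appear in the original code.

```python
import csv
import os


def read_titles(csv_path):
    """Map each file name in the catalog to its title (columns: name, title)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


def build_entries(xml_paths, titles):
    """Build one <Fileid> line per XML file, each with its own title."""
    entries = []
    for i, path in enumerate(xml_paths, start=1):
        basename = os.path.basename(path)
        # The lookup happens inside the loop, so the title changes per file.
        title = titles.get(basename, "")
        entries.append(
            '<Fileid="%i" filename="%s"> <title>%s</title> </Fileid>'
            % (i, basename, title)
        )
    return "\n".join(entries)
```

The result of build_entries(xmlinput, read_titles('catalog.csv')) could then take the place of FileName in the list_tpl substitution. Keying the titles by file name instead of by row position also avoids relying on glob.glob and os.walk returning the files in the same order.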

 