Extra characters (.) appended to data from multiple files when writing to a .csv file with a python2 script

Question

I am trying to write a relatively simple script in python2 which crawls through multiple .out files in a directory and extracts some data. The data is then written to a .csv file along with an identifier.

My issue is that a seemingly random '.' or '..' is appended to the end of the data string.

Here is my code (I know this is horrible to look at, sorry in advance):

import os
import string
import time
import sys
import csv

input = raw_input
location = input('Set directory path: ')
os.makedirs(location+'/outputs/')
print "Created output directory."
print "Waiting for archiving to finish..."
forCall = "cd "+location+" && mv *.out outputs/"
os.system(forCall)
time.sleep(1)
print "Archived output files."

newLocation = location+"/outputs/"


def checker(filein, bondlength):
    o = open("results.csv", "a")
    with open(filein) as curFile:
        for line in curFile:
            if "SCF Done:" in line:
                var = line
                var = filter(lambda x: x.isdigit() or x == '-' or x == '.', var)
                var = var[1:-2] # slices the first '-' and two trailing '.'

                bondlength = ''.join(bondlength.split())
                bondlength = bondlength[:-4] # slices .out from 'bondlength.out'
                o.write(var+';'+bondlength+'\n')
    o.close()

for filename in os.listdir(newLocation):
    fileLocation = newLocation+filename
    checker(fileLocation, filename)

datacsv = csv.reader(open('results.csv'), delimiter=";")
sortedData = sorted(datacsv, key=lambda row: row[1], reverse=False)

with open('sortedData.csv', 'wb') as csv_file:
    wr = csv.writer(csv_file, delimiter=";")
    wr.writerows(sortedData)

The line in the .out file that I'm interested in looks like this:

SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles

Now I need to get the value of E(whatever computational method was used) for each .out file and append it to a .csv file with 2 columns: one for the energy and one for the bond length (multiplied by 10^3, but that doesn't really matter now), which is the name of the .out file (e.g. 1036.out).

Any help is greatly appreciated.

Answer 1

The problem is in your approach to extracting the data. Filtering out every character that is not a digit, dash or dot from your example line leaves --107.450926197..5 : the first dash comes from the HF-LYP part, the trailing 5 comes from the 5 cycles, and the two dots preceding it come from A.U. Slicing with [1:-2] then removes the first character and the last two, which gives -107.450926197. with one stray dot. When the cycle count has two digits (12 cycles, say), the same slice removes only those two digits and leaves both dots, which is why the result looks random.
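To make the failure mode concrete, here is a small sketch that reproduces the filtering and slicing on two sample lines. The helper name `filter_and_slice` and the 12-cycle line are made up for illustration; the logic mirrors the question's `filter(...)` call followed by `var[1:-2]`.

```python
def filter_and_slice(line):
    # Keep only digits, dashes and dots, as the question's filter() call does.
    kept = ''.join(ch for ch in line if ch.isdigit() or ch in '-.')
    # Then drop the first character and the last two, as var[1:-2] does.
    return kept, kept[1:-2]

five = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
twelve = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 12 cycles"

print(filter_and_slice(five))    # ('--107.450926197..5', '-107.450926197.')
print(filter_and_slice(twelve))  # ('--107.450926197..12', '-107.450926197..')
```

The number of leftover dots depends only on how many digits the cycle count has, so a fixed slice cannot clean up every line.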

What I'd suggest instead is to locate the = in your string and then keep everything up to the next whitespace, something like:

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = var[var.find("=") + 1:].strip()  # clean out everything before the equal sign
var = var[:var.find(" ") + 1].strip()  # clean out everything after the first whitespace
# -107.450926197

Or, slightly less safely (it raises an IndexError if the line contains no =), by splitting on = and then on whitespace:

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = var.split("=", 1)[1].split(None, 1)[0]
# -107.450926197

Or to do it with a simple regex:

import re

find_numbers = re.compile(r"-?[0-9]\d*(\.\d+)?")  # find any number

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = find_numbers.search(var).group()
# -107.450926197
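One caveat with the generic pattern: `search` returns the first number on the line, so a method label that contains a digit (for instance a hypothetical `E(RB3LYP)` line) would yield `3` rather than the energy. A hedged sketch that anchors on the `=` sign instead follows; the pattern and the name `extract_energy` are illustrative assumptions, not part of the original answer.

```python
import re

# Capture the signed decimal that follows "SCF Done: ... =".
SCF_ENERGY = re.compile(r"SCF Done:.*?=\s*(-?\d+(?:\.\d+)?)")

def extract_energy(line):
    """Return the energy as a string, or None if the line does not match."""
    match = SCF_ENERGY.search(line)
    return match.group(1) if match else None

print(extract_energy("SCF Done: E(RB3LYP) = -107.450926197 A.U. after 5 cycles"))
# -107.450926197
print(extract_energy("some unrelated line"))
# None
```

Returning None for non-matching lines lets the caller skip them without a separate `"SCF Done:" in line` check.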

You should also consider loading your current results first, then appending to the same list as you iterate over your *.out files, sorting that list, and overwriting results.csv.
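A minimal sketch of that load, append, sort, overwrite flow, under a few assumptions: the function names `open_csv`, `extract_energy` and `update_results` are invented here, the file names are purely numeric (like 1036.out) so they can be sorted as numbers, and the extraction reuses the split-based approach from above. The small `open_csv` helper is only there because the csv module expects binary files on Python 2 and `newline=''` on Python 3.

```python
import csv
import os
import sys

def open_csv(path, mode):
    # The csv module wants binary files on Python 2 and newline='' on Python 3.
    if sys.version_info[0] < 3:
        return open(path, mode + 'b')
    return open(path, mode, newline='')

def extract_energy(line):
    # Split-based extraction: text after '=', up to the next whitespace.
    return line.split("=", 1)[1].split(None, 1)[0]

def update_results(out_dir, results_path):
    rows = []
    # Load rows from an earlier run, if there are any.
    if os.path.exists(results_path):
        with open_csv(results_path, 'r') as handle:
            rows = [row for row in csv.reader(handle, delimiter=';') if row]
    # Add one row per "SCF Done:" line; the file name minus ".out" is the identifier.
    for filename in os.listdir(out_dir):
        if not filename.endswith('.out'):
            continue
        with open(os.path.join(out_dir, filename)) as handle:
            for line in handle:
                if "SCF Done:" in line:
                    rows.append([extract_energy(line), filename[:-4]])
    # Sort numerically by identifier (assumes names like 1036.out), then overwrite.
    rows.sort(key=lambda row: float(row[1]))
    with open_csv(results_path, 'w') as handle:
        csv.writer(handle, delimiter=';').writerows(rows)
    return rows
```

Sorting on `float(row[1])` rather than on the raw string keeps 998 ahead of 1036, which a plain string sort would not. Note that running this twice over the same .out files adds each row again, so it fits a workflow where processed files are moved away, as the question's script does.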

Answer 1 | 0 | accepted | 2017-12-20 10:51:39
