Extra characters ('.') appended to data from multiple files while writing to .csv file with python2 script

I am trying to write a relatively simple Python 2 script that crawls through multiple .out files in a directory and extracts some data. The data is then written to a .csv file along with an identifier.

My issue is that a seemingly random '.' or '..' is appended to the end of the data string.

Here is my code (I know this is horrible to look at, sorry in advance):

import os
import string
import time
import sys
import csv

input = raw_input
location = input('Set directory path: ')
os.makedirs(location+'/outputs/')
print "Created output directory."
print "Waiting for archiving to finish..."
forCall = "cd "+location+" && mv *.out outputs/"
os.system(forCall)
time.sleep(1)
print "Archived output files."

newLocation = location+"/outputs/"


def checker(filein, bondlength):
    o = open("results.csv", "a")
    with open(filein) as curFile:
        for line in curFile:
            if "SCF Done:" in line:
                var = line
                var = filter(lambda x: x.isdigit() or x == '-' or x == '.', var)
                var = var[1:-2] # slices the first '-' and two trailing '.'

                bondlength = ''.join(bondlength.split())
                bondlength = bondlength[:-4] # slices .out from 'bondlength.out'
                o.write(var+';'+bondlength+'\n')
    o.close()

for filename in os.listdir(newLocation):
    fileLocation = newLocation+filename
    checker(fileLocation, filename)

datacsv = csv.reader(open('results.csv'), delimiter=";")
sortedData = sorted(datacsv, key=lambda row: row[1], reverse=False)

with open('sortedData.csv', 'wb') as csv_file:
    wr = csv.writer(csv_file, delimiter=";")
    wr.writerows(sortedData)

The line in the .out file that I'm interested in looks like this:

SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles

Now I need to get the value of E (for whatever computational method was used) from each .out file and append it to a .csv file with 2 columns: one for the energy and one for the bond length (multiplied by 10^3, but that doesn't really matter now), which is the name of the .out file (e.g. 1036.out).

Any help is greatly appreciated.

The problem is in your approach to extracting the data. Filtering out every character that is not a digit, dash or dot from your example line leaves --107.450926197..5: the first dash comes from the HF-LYP part, the two dots come from A.U., and the trailing 5 comes from 5 cycles. The slice var[1:-2] then removes the first character and the last two, which leaves -107.450926197. for this line. When the cycle count has two digits, the same slice leaves -107.450926197.. instead, which is why the number of stray dots looks random.
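To see the effect in isolation, here is a minimal sketch that mimics the question's filter and slice on a single line. The helper name filter_and_slice and the 12-cycle line are made up for illustration; only the 5-cycle line comes from the question.

```python
def filter_and_slice(line):
    """Mimic the question's filter() call and var[1:-2] slice on one line."""
    # Keep only digits, dashes and dots, as the lambda in the question does.
    kept = "".join(ch for ch in line if ch.isdigit() or ch in "-.")
    # Drop the first character and the last two, as var[1:-2] does.
    return kept, kept[1:-2]


for cycles in (5, 12):  # the 12-cycle line is a hypothetical example
    line = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after %d cycles" % cycles
    print(filter_and_slice(line))
# ('--107.450926197..5', '-107.450926197.')
# ('--107.450926197..12', '-107.450926197..')
```

The slice assumes a fixed number of unwanted characters on each side, but the number of digits in the cycle count varies from file to file.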

What I'd suggest instead is to locate the = in the string, drop everything before it, and then keep only the text up to the next whitespace, something like:

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = var[var.find("=") + 1:].strip()  # clean out everything before the equal sign
var = var[:var.find(" ") + 1].strip()  # clean out everything after the first whitespace
# -107.450926197

Or, slightly less safely (the first index raises IndexError if the line contains no =), split on = and then on whitespace:

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = var.split("=", 1)[1].split(None, 1)[0]
# -107.450926197

Or do it with a simple regex:

import re

find_numbers = re.compile(r"-?[0-9]\d*(\.\d+)?")  # find any number

var = "SCF Done: E(RB+HF-LYP) = -107.450926197 A.U. after 5 cycles"
var = find_numbers.search(var).group()
# -107.450926197
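One caveat with a "find any number" pattern is that it returns the first number on the line, so a method name that contains a digit would be matched before the energy. A possible variant, sketched here under the assumption that the energy always follows the first = after "SCF Done:", anchors the pattern on that text. The names SCF_ENERGY and extract_energy and the RB3LYP sample line are illustrative, not taken from the question.

```python
import re

# Capture the number that follows the first "=" after "SCF Done:".
SCF_ENERGY = re.compile(r"SCF Done:.*?=\s*(-?\d+(?:\.\d+)?)")


def extract_energy(line):
    """Return the energy as a string, or None if the line does not match."""
    match = SCF_ENERGY.search(line)
    return match.group(1) if match else None


# Hypothetical line with a digit in the method name.
print(extract_energy("SCF Done: E(RB3LYP) = -107.450926197 A.U. after 12 cycles"))
# -107.450926197
```

Returning None for non-matching lines lets the caller skip them without catching an exception.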

You should also consider loading your current results into a list first, appending to that list as you iterate over your *.out files, then sorting the list and overwriting results.csv.
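A rough sketch of that flow follows. It is written for Python 3 (in Python 2 the csv files would be opened in 'rb'/'wb' mode without the newline argument), the function names load_rows, collect_rows and update_results are invented for this example, and it assumes every .out file name is an integer such as 1036.out so that rows can be sorted numerically.

```python
import csv
import os


def load_rows(csv_path):
    """Return existing (energy, bond_length) rows, or [] if the file is missing."""
    if not os.path.exists(csv_path):
        return []
    with open(csv_path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter=";") if row]


def collect_rows(out_dir):
    """Read every *.out file in out_dir and return (energy, bond_length) rows."""
    rows = []
    for filename in os.listdir(out_dir):
        name, extension = os.path.splitext(filename)
        if extension != ".out":
            continue
        with open(os.path.join(out_dir, filename)) as handle:
            for line in handle:
                if "SCF Done:" in line and "=" in line:
                    parts = line.split("=", 1)[1].split()
                    if parts:  # skip lines with nothing after the "="
                        rows.append((parts[0], name))
    return rows


def update_results(out_dir, csv_path):
    """Merge old and new rows, sort by bond length, and overwrite the csv."""
    rows = load_rows(csv_path) + collect_rows(out_dir)
    # Assumption: file names are integers, so sort numerically, not as text.
    rows.sort(key=lambda row: int(row[1]))
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle, delimiter=";").writerows(rows)
    return rows
```

Calling update_results(newLocation, 'results.csv') would replace both the append-mode writes and the separate sorting pass. The sketch does not remove duplicates, so running it twice over the same files would record each energy twice.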
