Rearranging the lines of a file using regular expressions in Python

Question

I am creating a script that goes through a file in one format and rearranges it to match the format of another file. Here is a sample of the unformatted file:

, 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \par
, 0x40a849, jmp $+001775cbh (0x581e14), E9 C6 75 17 00, , , , \par
, 0x40a84e, int3, CC, , , , \par
, 0x40a84f, int3, CC, , , , \par
, 0x40a850, push esi, 56, , , , \par
, 0x40a851, mov esi,ecx, 8B F1, , , , \par

The end goal is for each line of the file to look like this:

0x40a846, 0x 88 41 2B ,"mov [ecx+2bh],al",,,

My main issue is that some lines of the file have only one section of source code while others have two, which makes it difficult to write a regular expression that grabs both without accidentally grabbing the code bytes. I wanted to use capture groups to rearrange the information on each line. Below is my script as of now:

import csv
import string
import re, sys
file_to_change = 'testingthecodexlconverter.csv'
    # = raw_input("Please specify what codexl file you would like to convert: ")
file1 = open(file_to_change, 'r+')

with file1  as f:
    for line in f:
        line = line[2:-12]
        line = line.rstrip('\n') + ',,'
       # mo = re.search(r'(.*?),.*?.*?,.*?(.*?),.*?.*?,.*?(.*?),.*?.*?,.*?(.*?)', line)
       #mo = re.search(r'(.*?),.*?(.*?,.*?.*?,).*?.*?,.*?(.*?),.*?.*?,.*?(.*?)', line)
        mo = re.search(r'(.*?),.*?(.*?.*?,\S*?,).*?.*?.*?,.*?(.*?),', line)  
        if mo:
            print(mo.group(2))

Can anyone lend me a hand?

Answer 1

As others have suggested, you can tokenize each line by splitting it at the commas and then add the commas back when you print:

file_to_change = 'testingthecodexlconverter.csv'

with open(file_to_change, 'r') as f:
    for line in f:
        # Drop the leading ', ' and the trailing ' , , , \par' plus newline.
        line = line[2:-12]

        tokens = line.split(',')

        # if column index 3 is empty then print without formatting for
        # unnecessary space.
        if not tokens[3]:
            print(tokens[0] + ", " + tokens[2].strip(" ") + ", " + tokens[1] + ",,,")
        else:
            print(tokens[0] + "," + tokens[3] +  ", " + tokens[2].strip(" ") + ", " + tokens[1] + ",,,")

This prints in the following format:

0x40a846, 88 41 2B, al,  mov [ecx+2bh],,,
0x40a849, E9 C6 75 17 00,  jmp $+001775cbh (0x581e14),,,
0x40a84e, CC,  int3,,,
0x40a84f, CC,  int3,,,
0x40a850, 56,  push esi,,,
0x40a851, 8B F1, ecx,  mov esi,,,
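
The output above leaves the two operand halves as separate, reordered fields. Here is a minimal sketch of one way to keep the instruction together and match the format requested in the question. It assumes the address is always the first non-empty field and the code bytes are always the last non-empty field before `\par`; the helper name `rearrange` is made up for illustration.

```python
def rearrange(line):
    """Return 'address, 0x bytes ,"instruction",,,' for one raw line."""
    # Split on commas, trim whitespace, and drop empty fields and the
    # trailing '\par' marker seen in the sample input.
    fields = [f.strip() for f in line.split(',')]
    fields = [f for f in fields if f and f != r'\par']
    # Assumption: first field is the address, last is the code bytes,
    # and everything in between belongs to the instruction text.
    address, *instruction, code_bytes = fields
    return '{}, 0x {} ,"{}",,,'.format(
        address, code_bytes, ','.join(instruction))


sample = r", 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \par"
print(rearrange(sample))
```

Because the fields are found by position rather than by a fixed index, the same function handles lines with one operand section and lines with two.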

Answer 2

I'd use pandas and just rearrange the columns as needed, since the data seems to be in a reasonable CSV format. This approach also lets you inspect the data as you manipulate it:

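import pandas as pd

# Read without a header row and replace missing cells with empty strings.
df = pd.read_csv('inputCSV.csv', header=None).fillna('')
df = df.astype(str)
# With no path given, to_csv returns the CSV text as a string.
out = df[[4, 1, 2]].to_csv(index=False, header=False, lineterminator='\r\n')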

Your question is a little unclear about what data format you are expecting in each individual column.

I believe you might have missing commas in your input CSV file. My suggestion is to search for these missing commas and add them so that the input file is properly formatted.
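
As a rough illustration of that clean-up step, the sketch below uses a regular expression with capture groups to pull out the address, the instruction, and the code bytes, then quotes the instruction so that every row ends up with the same number of columns. The pattern and the name `normalize` are assumptions based only on the six sample lines in the question (lowercase hex address, uppercase two-digit hex bytes as the last non-empty field), not a general parser.

```python
import re

# Group 1: address, group 2: instruction text (may contain commas),
# group 3: space-separated two-digit hex bytes. The tail of the pattern
# consumes the empty fields and the optional '\par' marker.
LINE_RE = re.compile(
    r'^,\s*(0x[0-9a-f]+),\s*(.+?),\s*([0-9A-F]{2}(?: [0-9A-F]{2})*)'
    r'\s*(?:,\s*)*(?:\\par)?\s*$'
)


def normalize(line):
    """Return a cleaned CSV row, or None if the line does not match."""
    mo = LINE_RE.match(line)
    if mo is None:
        return None
    address, instruction, code_bytes = mo.groups()
    return '{},{},"{}"'.format(address, code_bytes, instruction)


sample = r", 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \par"
print(normalize(sample))
```

Anchoring the byte group to the end of the line is what keeps the lazy instruction group from stopping at the comma inside `mov [ecx+2bh],al`.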

The fastest way, of course, is to split the string with .split() as mentioned above, but since you seem unsure how to approach the parsing, I suggest pandas for it.

Answer 3

You can use the csv module, which you have already imported but aren't currently using.

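import csv

file_path = 'test.csv'

# Open both files in one with-statement so the output file is closed too.
with open(file_path, newline='') as csvfile, \
        open('tempfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(csvfile)
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        # Drop empty fields and the trailing '\par' marker from the sample input.
        new_row = [e.strip() for e in row if e.strip() and e.strip() != r'\par']
        # The new row should have the first element, then the last,
        # followed by everything else that wasn't empty.
        new_row = [new_row[0], new_row[-1]] + new_row[1:-1]
        writer.writerow(new_row)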

The new CSV file looks like this:

0x40a846,88 41 2B,mov [ecx+2bh],al 
0x40a849,E9 C6 75 17 00,jmp $+001775cbh (0x581e14) 
0x40a84e,CC,int3
0x40a84f,CC,int3
0x40a850,56,push esi
0x40a851,8B F1,mov esi,ecx
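
In the output above, an instruction with two operands still spills into two columns. If the instruction should stay in a single quoted column, as in the format the question asks for, one option is to rejoin the middle fields before writing; `csv.writer` then quotes any field that contains a comma. This is only a sketch, assuming the first non-empty field is the address and the last one (ignoring `\par`) is the code bytes. The name `convert` is illustrative, and it takes file-like objects so it can be tried on in-memory data.

```python
import csv
import io


def convert(infile, outfile):
    """Copy rows from infile to outfile as address, bytes, instruction."""
    writer = csv.writer(outfile, lineterminator='\n')
    for row in csv.reader(infile):
        fields = [e.strip() for e in row]
        fields = [e for e in fields if e and e != r'\par']
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address, code_bytes = fields[0], fields[-1]
        # Rejoin operands that the reader split at commas.
        instruction = ','.join(fields[1:-1])
        writer.writerow([address, code_bytes, instruction])


raw = io.StringIO(
    ", 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \\par\n"
    ", 0x40a84e, int3, CC, , , , \\par\n"
)
out = io.StringIO()
convert(raw, out)
print(out.getvalue())
```

With real files, the same function can be called with two handles opened with `newline=''`, which is what the csv module documentation recommends.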


Answer 1
1 2015-06-09 23:01:43

Answer 2
0 2015-06-09 22:56:03

Answer 3
0 2015-06-09 23:13:18
