
Combining lines using regex in Python

I want to combine six lines (each containing three elements) into a single line with three elements: the first is the sum of all the first elements, the second is the sum of all the second elements, and the third is the concatenation of all the third elements.

For example, we have:

12.34  -79   x
-3.5    23      y
32.2E2   2   z
4.23e-10   +45  x
62E+2    -4     y
0.0    0        z

and we need:

9428.84 -13 xyzxyz

Here is my current code:

import re

# Open the file and read all of its lines
with open('data.txt', 'r') as f:
    lines = list(f)

p = re.compile(r'\s*^([-]?([1-9]\d|\d)[E|e]?[+\d]?(.)(\d+(E|e)[-]?\d+|\d+))\s*([-,+]?([1-9]\d+|\d))\s*([x|y|z])$')

# Print every line that matches the pattern
for x in lines:
    m = p.match(x)
    if m:
        print(x)

You can do this by zipping the contents of the file so that all the numbers in the first column end up in one list, all the numbers in the second column in a second list, and all the characters in a third list. Then you simply sum the first two lists and join the third list, which contains the characters:

with open("data.txt", "r") as infile:
    # Split each non-empty line into fields, then transpose rows into columns
    lines = list(zip(*[line.split() for line in infile if line.strip()]))

sum1 = sum(map(float, lines[0]))
sum2 = sum(map(float, lines[1]))
finalStr = "".join(lines[2])

# Format the sums: two decimals for the first, none for the second
print("{:.2f}".format(sum1), end=" ")
print("{:.0f}".format(sum2), end=" ")

print(finalStr)

Output:

9428.84 -13 xyzxyz
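To make the transposition step easier to follow, here is a minimal sketch that runs the same idea on the six sample rows held in an in-memory list instead of data.txt. The names rows, columns, total1, total2 and combined are made up for illustration, and the second column is parsed with int rather than float on the assumption that it only ever holds integers.

```python
# Sample rows from the question, held in memory instead of data.txt
rows = [
    "12.34  -79   x",
    "-3.5    23      y",
    "32.2E2   2   z",
    "4.23e-10   +45  x",
    "62E+2    -4     y",
    "0.0    0        z",
]

# Split each row into its three fields, then transpose rows into columns:
# zip(*...) pairs up the first fields, the second fields and the third fields
columns = list(zip(*(row.split() for row in rows)))

print(columns[0])  # all first elements, still as strings
print(columns[1])  # all second elements, still as strings
print(columns[2])  # all third elements

total1 = sum(map(float, columns[0]))  # float() understands exponents like 32.2E2
total2 = sum(map(int, columns[1]))    # int() accepts a leading sign such as +45
combined = "".join(columns[2])

print(f"{total1:.2f} {total2} {combined}")
```

Printing the three tuples before summing shows that each one holds a whole column, which is why a plain sum and join are enough afterwards.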

There is no need for a regex in your case. Regular expressions are used to deconstruct strings, not to combine them. If you do not mind using pandas, the solution takes two lines:

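import pandas as pd

# Read the whitespace-separated columns; the file has no header row
df = pd.read_table("data.txt", sep=r'\s+', header=None)

# Column-wise sum: numbers are added, strings are concatenated
df.sum().values.tolist()
#[9428.840000000422, -13, 'xyzxyz']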
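If the input still needs to be validated line by line, a regex can do the deconstructing while ordinary Python does the combining. The sketch below is only one possible way to do that: the pattern LINE_RE and the helper combine are hypothetical names, and the pattern encodes an assumed format (a float with optional exponent, a signed integer, then one of x, y or z), not the pattern from the question.

```python
import re

# Assumed line format: <float> <signed int> <x|y|z>, separated by whitespace
LINE_RE = re.compile(
    r"""^\s*
    ([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)  # float, optional exponent
    \s+
    ([-+]?\d+)                                   # signed integer
    \s+
    ([xyz])                                      # a single letter: x, y or z
    \s*$""",
    re.VERBOSE,
)


def combine(lines):
    """Sum the first two fields and concatenate the third, skipping bad lines."""
    total_float = 0.0
    total_int = 0
    letters = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # the line does not fit the assumed format
        total_float += float(m.group(1))
        total_int += int(m.group(2))
        letters.append(m.group(3))
    return total_float, total_int, "".join(letters)


sample = [
    "12.34  -79   x",
    "-3.5    23      y",
    "32.2E2   2   z",
    "not a valid line",
    "4.23e-10   +45  x",
    "62E+2    -4     y",
    "0.0    0        z",
]

first, second, third = combine(sample)
print(f"{first:.2f} {second} {third}")
```

Here the regex only decides whether a line is well formed and pulls out its three fields; the summing and joining happen outside the pattern.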
