简体   繁体   中英

How to split a string at line breaks in python?

I want to copy some tabular data from Excel into a python array. That is, user willselect a range in an Excel table, press "Copy" (CTRL+C) so that the range will be copied to clipboard. Then I will get this clipboard data into a python array (list). I use win32clipboard from pywin32 to get clipboard data into an array:

import win32clipboard

def getClip():
    win32clipboard.OpenClipboard()
    data = win32clipboard.GetClipboardData()
    win32clipboard.CloseClipboard()
    return data

I copy the following range A1:B5 from Excel:

在此输入图像描述

When I use the function above, I get a string like:

'365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'

How to split this string into a list, so that the list will look like:

[(365,179), (96, -90), (48, -138), (12, -174), (30, -156)]

I use split method, but it doesn't give me what I want.

data.split("\n")

['365\t179\r', '96\t-90\r', '48\t-138\r', '12\t-174\r', '30\t-156\r', '']
>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'
>>> [map(int, x.split('\t')) for x in s.rstrip().split('\r\n')]
[[365, 179], [96, -90], [48, -138], [12, -174], [30, -156]]

Using the code from my other answer , you can also handle other types as well:

from ast import literal_eval
def solve(x):
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError):
        return x

s = '365\tFoo\r\nBar\t-90.01\r\n48\tspam\r\n12e10\t-174\r\n30\t-156\r\n'
print [map(solve, x.split('\t')) for x in s.rstrip().split('\r\n')]
#[[365, 'Foo'], ['Bar', -90.01], [48, 'spam'], [120000000000.0, -174], [30, -156]]

There's actually a str.splitlines method which will split the string by line breaks, regardless of which line breaks are used. So this works on Unix systems with just an \\n , on Windows with \\r\\n and even on old Mac systems where the line break was just an \\r .

>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'
>>> s.splitlines()
['365\t179', '96\t-90', '48\t-138', '12\t-174', '30\t-156']

Once you have this result, you can split by tabs to get the individual cells. So you essentially have to call cell.split('\\t') on each cell. This is best done with a list comprehension:

>>> [row.split('\t') for row in s.splitlines()]
[['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As an alternative, you could also use map to apply the splitting operation on each cell:

>>> list(map(lambda cell: cell.split('\t'), s.splitlines()))
[['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As the copied data in the clipboard will always have the rows separated by newlines, and the columns separated by tabs, this solution is also safe to use for any range of cells you copied.

If you further want to convert integers or float to its correct datatypes in Python, I guess you could add some more conversion logic by calling int() on all cells that only have digits in them, float() on all cells that have digits and the dot in them . , leaving the rest as strings:

>>> def convert (cell):
        try:
            return int(cell)
        except ValueError:
            try:
                return float(cell)
            except ValueError:
                return cell
>>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()]
[(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

For a different string:

>>> s = 'Foo\tbar\r\n123.45\t42\r\n-85\t3.14'
>>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()]
[('Foo', 'bar'), (123.45, 42), (-85, 3.14)]
d = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'
print [tuple(map(int,item.split(","))) for item in d.replace("\t", ",").split()]

Output

[(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]
In [85]: zip(*[iter(map(int, data.split()))]*2)
Out[85]: [(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

Here is a breakdown of how it works:

First, split on whitespaces:

In [86]: data.split()
Out[86]: ['365', '179', '96', '-90', '48', '-138', '12', '-174', '30', '-156']

Convert the strings into ints:

In [87]: map(int, data.split())
Out[87]: [365, 179, 96, -90, 48, -138, 12, -174, 30, -156]

Use the grouper recipe to group every 2 items:

In [88]: zip(*[iter(map(int, data.split()))]*2)
Out[88]: [(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]
[line.split() for line in my_str.split("\n")]

This just breaks the data into lines and then splits it according to white space. Check it out and modify according to your data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM