How can I split elements by space and quotation within a list of lists?

I have a list of data, df1, that consists of three lists, each holding a single string:

df1 = [
    ['1 "P040" 68.13 "P040_1" 2.55 8'],
    ['2 "P040" 46.82 "P040_2" 2.53 8'],
    ['3 "P040" 46.82 "P040_3" 2.51 8']
]

I want to convert it to the following list of lists, df2, without the double quotation marks ("):

df2 = [
    ['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
    ['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
    ['3', 'P040', '46.82', 'P040_3', '2.51', '8']
]

I tried the following, but it does not work:

for row in df1:
    for elem in row:
        elem.strip().split('"')
        elem.strip().split('"')

Here is a simple way to do this: remove the unwanted quotes, then split on spaces inside a list comprehension to get one list per row:

df2 = [''.join(row).replace('"', '').split(" ") for row in df1]

print(df2)

Output:

[['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
['3', 'P040', '46.82', 'P040_3', '2.51', '8']]

The same approach with explicit loops:

df2 = []
for row in df1:
    for elem in row:
        df2.append(elem.replace('"', '').split(' '))
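Compared with the attempt in the question, this loop keeps the result of each call. String methods such as strip() and split() return new objects and leave elem untouched, so a bare elem.strip().split('"') has no lasting effect. A minimal sketch of the difference, using one row from the question (the variable names pieces and fields are only illustrative):

```python
elem = '1 "P040" 68.13 "P040_1" 2.55 8'

# str methods return new objects; elem itself is never modified
pieces = elem.strip().split('"')
print(elem)    # still the original string
print(pieces)  # splitting on quotes alone leaves spaces inside the pieces

# Keep the result: drop the quotes first, then split on spaces
fields = elem.replace('"', '').split(' ')
print(fields)
```

Splitting on the quote character alone is also not enough here, because the unquoted fields are separated by spaces, not quotes.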

Since every inner list consists of only one element, you don't really need two for loops. This can be solved with a one-liner list comprehension:

df1 = [
['1 "P040" 68.13 "P040_1" 2.55 8'],
['2 "P040" 46.82 "P040_2" 2.53 8'],
['3 "P040" 46.82 "P040_3" 2.51 8']
]

df2 = [row[0].replace('"', '').split(' ') for row in df1]

print(df2)

>>> [['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
     ['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
     ['3', 'P040', '46.82', 'P040_3', '2.51', '8']]
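One caveat with split(' '): it treats every single space as a separator, so irregular spacing produces empty strings, while split() with no argument splits on any run of whitespace. The sketch below uses a hypothetical row (not from the question) that contains a double space and a tab to show the difference:

```python
# Hypothetical row with a double space and a tab between fields
line = '1  "P040"\t68.13 "P040_1" 2.55 8'
cleaned = line.replace('"', '')

by_space = cleaned.split(' ')   # every single space is a separator
by_whitespace = cleaned.split() # any run of whitespace is a separator

print(by_space)       # contains an empty string and an unsplit 'P040\t68.13'
print(by_whitespace)  # six clean fields
```

If the input is guaranteed to use exactly one space between fields, as in the question, both variants give the same result.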

You can achieve it with a nested list comprehension:

df2 = [[item.strip('"') for item in elem.split()] for row in df1 for elem in row]

Or with a nested loop and a list comprehension:

df2 = []
for row in df1:
    for elem in row:
        # Split on whitespace, then strip the surrounding quotes from each field
        df2.append([item.strip('"') for item in elem.split()])

Output:

['1', 'P040', '68.13', 'P040_1', '2.55', '8']
['2', 'P040', '46.82', 'P040_2', '2.53', '8']
['3', 'P040', '46.82', 'P040_3', '2.51', '8']

df1 = [['1 "P040" 68.13 "P040_1" 2.55 8'],
       ['2 "P040" 46.82 "P040_2" 2.53 8'],
       ['3 "P040" 46.82 "P040_3" 2.51 8']]

# Remove the quotes from each string, then split it on whitespace
df2 = [v.replace('"', '').split() for l in df1 for v in l]
print(df2)

You can use a combination of the split() and strip() methods to split on both quotes and spaces.

This example splits the elements in df1 and builds a new list, df2:

df2 = []
for row in df1:
    new_row = []
    for elem in row[0].split('"'):
        new_row.extend(elem.strip().split())
    df2.append(new_row)

print(df2)

You can use shlex.split() to split on 'words', where a word may be a quoted string, as follows:

import shlex

for i in range(len(df1)):
    df1[i] = shlex.split(df1[i][0])

This assumes that each item of the outer list is always a list containing a single string. The loop above modifies df1 in place; to create a new df2 instead:

import shlex

df2 = []
for row in df1:
    df2.append(shlex.split(row[0]))

The output will be:

[
  ['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
  ['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
  ['3', 'P040', '46.82', 'P040_3', '2.51', '8']
]

The advantage of shlex.split() is that quoted strings may contain spaces and still come out as a single element, which a plain split() cannot do.
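A short sketch of that difference, using a hypothetical row (not from the question) whose first quoted field contains a space:

```python
import shlex

# Hypothetical row: the quoted field "P040 A" contains a space
line = '1 "P040 A" 68.13 "P040_1" 2.55 8'

with_shlex = shlex.split(line)                 # quotes group words together
with_replace = line.replace('"', '').split()   # quotes are simply discarded

print(with_shlex)    # 'P040 A' stays one element
print(with_replace)  # 'P040' and 'A' become separate elements
```

For data like the question's, where no quoted field contains a space, both approaches return the same six fields.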
