How can I split elements by space and quotation mark within a list of lists?
I have a list, df1, that consists of three lists, each holding a single string:
df1 = [
['1 "P040" 68.13 "P040_1" 2.55 8'],
['2 "P040" 46.82 "P040_2" 2.53 8'],
['3 "P040" 46.82 "P040_3" 2.51 8']
]
I want to convert it to the following list of lists, df2, without the double quotation marks ("):
df2 = [
['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
['3', 'P040', '46.82', 'P040_3', '2.51', '8']
]
I tried the following, but it does not work well:
for row in df1:
    for elem in row:
        elem.strip().split('"')
Here is a simple way to do this: remove the unwanted quotes, then split on a space, using a list comprehension to build the result:
df2 = [''.join(row).replace('"', '').split(" ") for row in df1]
print(df2)
Output:
[['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
['3', 'P040', '46.82', 'P040_3', '2.51', '8']]
df2 = []
for row in df1:
    for elem in row:
        df2.append(elem.replace('"', '').split(' '))
Since every list in the outer list consists of only one element, you don't really need two for loops. This can be solved with a one-line list comprehension:
df1 = [
['1 "P040" 68.13 "P040_1" 2.55 8'],
['2 "P040" 46.82 "P040_2" 2.53 8'],
['3 "P040" 46.82 "P040_3" 2.51 8']
]
df2 = [row[0].replace('"', '').split(' ') for row in df1]
print(df2)
>>> [['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
['3', 'P040', '46.82', 'P040_3', '2.51', '8']]
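The one-liner relies on each inner list holding exactly one string. As a minimal sketch of how that assumption could be made explicit, the helper below (the name split_single_element_rows is hypothetical, not from the original answer) raises an error instead of silently ignoring extra elements:

```python
def split_single_element_rows(rows):
    """Split the single string in each row into fields, dropping double quotes."""
    result = []
    for index, row in enumerate(rows):
        # row[0] is only safe if the row holds exactly one string.
        if len(row) != 1:
            raise ValueError(f"row {index} has {len(row)} elements, expected 1")
        result.append(row[0].replace('"', '').split(' '))
    return result


df1 = [
    ['1 "P040" 68.13 "P040_1" 2.55 8'],
    ['2 "P040" 46.82 "P040_2" 2.53 8'],
]
df2 = split_single_element_rows(df1)
print(df2)
```

If rows may legitimately hold several strings, the two-loop versions in the other answers are the safer choice.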
You can achieve it with a nested list comprehension:
df2 = [[item.strip('"') for item in elem.split()] for row in df1 for elem in row]
Or with a nested loop and a list comprehension:
df2 = []
for row in df1:
    for elem in row:
        df2.append([item.strip('"') for item in elem.split()])
Output:
['1', 'P040', '68.13', 'P040_1', '2.55', '8']
['2', 'P040', '46.82', 'P040_2', '2.53', '8']
['3', 'P040', '46.82', 'P040_3', '2.51', '8']
df1 = [['1 "P040" 68.13 "P040_1" 2.55 8'],
['2 "P040" 46.82 "P040_2" 2.53 8'],
['3 "P040" 46.82 "P040_3" 2.51 8']]
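# Remove the quotes first, then split on spaces; iterating over both
# levels in one comprehension keeps each row as a flat list of fields.
df2 = [v.replace('"', '').split(' ') for l in df1 for v in l]
print(df2)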
You can combine the split() and strip() string methods to split on both quotes and spaces.
This example splits the elements in df1 and creates a new df2 list:
df2 = []
for row in df1:
    new_row = []
    for elem in row[0].split('"'):
        new_row.extend(elem.strip().split())
    df2.append(new_row)
print(df2)
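To see why splitting on the quote and then on whitespace works, here is a small sketch that prints the intermediate values for a single row (the variable names chunks and new_row are only for illustration):

```python
row = ['1 "P040" 68.13 "P040_1" 2.55 8']

# Step 1: splitting on the double quote separates quoted and unquoted runs.
chunks = row[0].split('"')
print(chunks)
# ['1 ', 'P040', ' 68.13 ', 'P040_1', ' 2.55 8']

# Step 2: split each chunk on whitespace and collect the pieces in order.
new_row = []
for chunk in chunks:
    new_row.extend(chunk.split())
print(new_row)
# ['1', 'P040', '68.13', 'P040_1', '2.55', '8']
```

Note that step 2 also splits on whitespace inside the quoted chunks, so this approach assumes the quoted values contain no spaces.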
You can use shlex.split() to split on "words", where a word may be a quoted string, as follows:
import shlex
for i in range(len(df1)):
    df1[i] = shlex.split(df1[i][0])
This assumes that each item of df1 is always a list containing exactly one string. The loop above modifies df1 in place; to create a new df2 instead, it would be:
import shlex
df2 = []
for row in df1:
    # Use the loop variable; there is no index i in this loop.
    df2.append(shlex.split(row[0]))
The output will be:
[
['1', 'P040', '68.13', 'P040_1', '2.55', '8'],
['2', 'P040', '46.82', 'P040_2', '2.53', '8'],
['3', 'P040', '46.82', 'P040_3', '2.51', '8']
]
The advantage of shlex.split() is that a quoted string may contain spaces and is still returned as a single item, which a plain split() cannot do.
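As a short illustration of that difference, the sketch below uses a made-up row (not from the question's data) whose quoted value contains a space:

```python
import shlex

# Hypothetical row: the quoted value contains a space.
line = '1 "P040 A" 68.13'

# Removing quotes and splitting on spaces breaks the quoted value apart.
naive = line.replace('"', '').split(' ')
print(naive)   # ['1', 'P040', 'A', '68.13']

# shlex.split() treats the quoted text as one word and drops the quotes.
quoted = shlex.split(line)
print(quoted)  # ['1', 'P040 A', '68.13']
```

shlex follows shell-like quoting rules, so other characters such as backslashes or single quotes in the data are also interpreted specially.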