Slicing a column in a dataframe when each row has a different number of characters (Python)

Question

The column I want to slice looks like this:

{'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']}

I want to pull the x and y values out into their own columns, like this:

{'name':['A', 'B', 'C'], 'x':[31.33, 9.33, -12.67], 'y':[19.98,6.98,-30.02]}

I assume I need to do some slicing, but I'm not sure how to go about it. Thanks.

Answer 1

You can use a regular expression for this:

import re

d = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']} 

# Capture an optional minus sign, the integer part, and an optional decimal part.
x = [re.search(r'x=(-?\d+(?:\.\d+)?)', s).group(1) for s in d['location']]
y = [re.search(r'y=(-?\d+(?:\.\d+)?)', s).group(1) for s in d['location']]

res = {
    'name': d['name'],
    'x': list(map(float, x)),
    'y': list(map(float, y))
}

print(res)
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}

If you are quite sure your data always follows this pattern, you can simplify the regular expressions above to:

x = [re.search(r'x=(.*) ', x).group(1) for x in d['location']]
y = [re.search(r'y=(.*)\)', x).group(1) for x in d['location']]
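To illustrate why the simplified patterns depend on uniform data, here is a small sketch comparing a strict numeric pattern with the loose `.*` pattern. The `irregular` string, with an extra `z` field, is a hypothetical input that does not appear in the question's data.

```python
import re

strict = r'x=(-?\d+(?:\.\d+)?)'   # matches only a signed decimal number
loose = r'x=(.*) '                # greedy: runs up to the LAST space

regular = '(x=31.33 y=19.98)'
# Hypothetical row with a third field, assumed for illustration only.
irregular = '(x=31.33 y=19.98 z=1.0)'

strict_regular = re.search(strict, regular).group(1)
loose_regular = re.search(loose, regular).group(1)
strict_irregular = re.search(strict, irregular).group(1)
# The greedy ".*" swallows the y field too, so float() would fail on this.
loose_irregular = re.search(loose, irregular).group(1)
```

On the regular string both patterns capture `'31.33'`. On the irregular one, only the strict pattern still does.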

Answer 2

Here is one solution:

start = {
    'name':['A', 'B', 'C'],
    'location':['(x=31.33 y=19.98)',
    '(x=9.33 y=6.98)',
    '(x=-12.67 y=-30.02)']
    } 

xList = []
yList =  []
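for string in start['location']:
    # Drop the surrounding parentheses, then split "x=..." from "y=...".
    splitted = string[1:-1].split(" ")
    # Take the part after "=" and convert it to a float.
    x = float(splitted[0].split("=")[1])
    y = float(splitted[1].split("=")[1])
    xList.append(x)
    yList.append(y)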

end = {
    'name' : start['name'],
    'x' : xList,
    'y' : yList
}

print(end)

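You can also use a regular expression to match the pattern in the string (documentation, regex testing site).

Edit:

Here is a solution using a regular expression, which is more elegant: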


import re
start = {
    'name':['A', 'B', 'C'],
    'location':['(x=31.33 y=19.98)',
    '(x=9.33 y=6.98)',
    '(x=-12.67 y=-30.02)']
    } 

end = {
    'name' : start['name'],
    'x' : [],
    'y' : []
}

# Compile once, outside the loop; "-?" keeps the sign of negative values.
checkNumber = re.compile(r"-?\d+(?:\.\d+)?")
for string in start['location']:
    numbers = checkNumber.findall(string)
    end['x'].append(float(numbers[0]))
    end['y'].append(float(numbers[1]))


print(end)

You can test the regular expression here.
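As a variation on the `findall` approach above, here is a minimal sketch that wraps the parsing in a reusable function and rejects strings that do not contain exactly two numbers. The names `NUMBER` and `parse_location` are hypothetical, and the pattern includes an optional leading minus sign so that negative coordinates keep their sign.

```python
import re

# Signed decimal number: optional "-", digits, optional fractional part.
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

def parse_location(text):
    """Return (x, y) as floats from a string like '(x=31.33 y=19.98)'."""
    numbers = NUMBER.findall(text)
    if len(numbers) != 2:
        raise ValueError(f'expected two numbers in {text!r}, found {len(numbers)}')
    return float(numbers[0]), float(numbers[1])

points = [parse_location(s) for s in ['(x=31.33 y=19.98)', '(x=-12.67 y=-30.02)']]
```

Raising on unexpected input is a design choice here: it surfaces malformed rows early instead of silently producing misaligned x and y lists.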

Answer 3

You can do this more elegantly with the re library (and list comprehensions).

import re 

data = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']}

# Splitting on "=", " " or ")" gives ['(x', '31.33', 'y', '19.98', ''].
data['x'] = [float(re.split(r"=| |\)", i)[1]) for i in data['location']]
data['y'] = [float(re.split(r"=| |\)", i)[3]) for i in data['location']]

del data['location']

print(data)
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}

Answer 4

You need to parse the strings:

import pandas as pd
import re

t = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']} 

# Capture the text between "(x=" and " y", and between "y=" and ")".
res = pd.DataFrame({
    'name': t['name'],
    'x': [float(re.search(r"\(x=(.*) y", i).group(1)) for i in t['location']],
    'y': [float(re.search(r"y=(.*)\)", i).group(1)) for i in t['location']],
})

Answer 5

The simplest way is to use `pandas.Series.str.extract()` to create the new columns, i.e.:

import pandas as pd

# Build the dataframe from the dict itself, not from its string representation.
df = pd.DataFrame({'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']})
df.location.str.extract(r'x=(?P<x>[0-9.-]+) y=(?P<y>[0-9.-]+)', expand=True)

Output:

        x       y
0   31.33   19.98
1    9.33    6.98
2  -12.67  -30.02

If you need to save the new columns in the existing dataframe, you can use pd.concat(), i.e.:

df = pd.concat([df, df.location.str.extract(r'x=(?P<x>[0-9.-]+) y=(?P<y>[0-9.-]+)', expand=True)], axis=1)
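Putting the pieces together, here is a minimal end-to-end sketch that starts from the question's dictionary and ends with the requested `name`, `x` and `y` columns. Converting with `astype(float)` and leaving out the `location` column are additions that are not part of the answer above, and the stricter numeric pattern is an assumption about the data format.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'location': ['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)'],
})

# Named groups become the column names of the extracted frame.
coords = df['location'].str.extract(
    r'x=(?P<x>-?\d+(?:\.\d+)?) y=(?P<y>-?\d+(?:\.\d+)?)'
)

# str.extract returns strings, so convert to float before joining.
result = pd.concat([df[['name']], coords.astype(float)], axis=1)
```

Rows that do not match the pattern would come back as NaN from `str.extract`, so it may be worth checking `coords.isna().any()` on real data.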
