
Slicing a column in a dataframe with varied number of characters in each row (Python)

The column I would like to slice looks like this:

{'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']} 

I would like to pull the x and y values into their own columns, like this:

{'name':['A', 'B', 'C'], 'x':[31.33, 9.33, -12.67], 'y':[19.98,6.98,-30.02]} 

I assume I need to do some slicing, but I am unsure how to go about it. Thanks.

You can use regex for this:

import re

d = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']} 

# Capture an optional minus sign, digits, and an optional fractional part
x = [re.search(r'x=(-?\d+(?:\.\d+)?)', s).group(1) for s in d['location']]
y = [re.search(r'y=(-?\d+(?:\.\d+)?)', s).group(1) for s in d['location']]

res = {
    'name': d['name'],
    'x': list(map(float, x)),
    'y': list(map(float, y))
}

print(res)
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}

If you are sure that your data always follows this pattern, you can simplify the regex above to:

x = [re.search(r'x=(.*) ', s).group(1) for s in d['location']]
y = [re.search(r'y=(.*)\)', s).group(1) for s in d['location']]
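
Both variants above call .group(1) directly, so a row that does not match raises AttributeError. The following is a minimal sketch of one way to guard against that, using a single pattern with named groups. The names LOCATION_RE and parse_location are hypothetical (they are not part of the answer above), and the 'n/a' input is an illustrative value, not data from the question.

```python
import re

# Hypothetical pattern: both coordinates in one regex, with named groups.
LOCATION_RE = re.compile(r'x=(?P<x>-?\d+(?:\.\d+)?)\s+y=(?P<y>-?\d+(?:\.\d+)?)')

def parse_location(text):
    """Return (x, y) as floats, or (None, None) if the text does not match."""
    match = LOCATION_RE.search(text)
    if match is None:
        return (None, None)
    return (float(match.group('x')), float(match.group('y')))

d = {'name': ['A', 'B', 'C'],
     'location': ['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']}

pairs = [parse_location(s) for s in d['location']]
res = {
    'name': d['name'],
    'x': [p[0] for p in pairs],
    'y': [p[1] for p in pairs],
}

print(res)
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}

# Illustrative non-matching input: no exception, just a pair of None values
print(parse_location('n/a'))
# (None, None)
```

Returning None for bad rows is only one possible design; raising a descriptive ValueError instead may be preferable if malformed rows should never occur.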

Here's a solution:

start = {
    'name':['A', 'B', 'C'],
    'location':['(x=31.33 y=19.98)',
    '(x=9.33 y=6.98)',
    '(x=-12.67 y=-30.02)']
    } 

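xList = []
yList = []
for string in start['location']:
    # Drop the surrounding parentheses, then split "x=... y=..." on the space
    splitted = string[1:-1].split(" ")
    # Take the text after "=" and convert it to a float
    x = float(splitted[0].split("=")[1])
    y = float(splitted[1].split("=")[1])
    xList.append(x)
    yList.append(y)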

end = {
    'name' : start['name'],
    'x' : xList,
    'y' : yList
}

print(end)

You can also use regexes to match patterns in strings (see the documentation and a regex testing website).

EDIT:

Here's a solution with a regex, much more elegant:


import re
start = {
    'name':['A', 'B', 'C'],
    'location':['(x=31.33 y=19.98)',
    '(x=9.33 y=6.98)',
    '(x=-12.67 y=-30.02)']
    } 

end = {
    'name' : start['name'],
    'x' : [],
    'y' : []
}

# Optional minus sign, digits, optional fractional part (compiled once)
checkNumber = re.compile(r"-?\d+(?:\.\d+)?")

for string in start['location']:
    numbers = checkNumber.findall(string)
    end['x'].append(float(numbers[0]))
    end['y'].append(float(numbers[1]))


print(end)

You can test the regex here.

You can do this a little more elegantly with the re library (and list comprehensions).

import re 

data = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']}

# Split on "=", " " or ")"; the numbers land at indices 1 and 3
data['x'] = [float(re.split(r"=| |\)", i)[1]) for i in data['location']]
data['y'] = [float(re.split(r"=| |\)", i)[3]) for i in data['location']]

del data['location']

print(data)
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}
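
To see why indices 1 and 3 are the ones picked out, it may help to look at what re.split returns for a single row. This is a small illustration using the first location string from the question:

```python
import re

# Split one location string on "=", " " or ")"
parts = re.split(r"=| |\)", "(x=31.33 y=19.98)")
print(parts)
# ['(x', '31.33', 'y', '19.98', '']

# The numeric pieces sit at positions 1 and 3
x, y = float(parts[1]), float(parts[3])
print(x, y)
# 31.33 19.98
```

Note that this relies on every row having exactly the "(x=... y=...)" layout; an extra space or a missing field would shift the indices.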

You need to parse the string:

import pandas as pd
import re

t = {'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']} 

# Raw strings keep the backslashes in the patterns from being read as string escapes
res = pd.DataFrame({
    'name': t['name'],
    'x': [float(re.search(r"\(x=(.*) y", i).group(1)) for i in t['location']],
    'y': [float(re.search(r"y=(.*)\)", i).group(1)) for i in t['location']],
})


The easiest way is to create new columns using `pandas.Series.str.extract()`, i.e.:

import pandas as pd

df = pd.DataFrame({'name':['A', 'B', 'C'], 'location':['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)']})
df.location.str.extract(r'x=(?P<x>[0-9.-]+) y=(?P<y>[0-9.-]+)', expand=True)

Output:

        x       y
0   31.33   19.98
1    9.33    6.98
2  -12.67  -30.02

And if you need to save the new columns in the existing dataframe, you can use pd.concat(), i.e.:

df = pd.concat([df, df.location.str.extract(r'x=(?P<x>[0-9.-]+) y=(?P<y>[0-9.-]+)', expand=True)], axis=1)
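
Note that str.extract() returns the captured text as strings, while the question asks for numeric values. The following is a minimal sketch of how the conversion and the removal of the original column might be added. The variable names coords and result are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'location': ['(x=31.33 y=19.98)', '(x=9.33 y=6.98)', '(x=-12.67 y=-30.02)'],
})

# Extract both coordinates as named columns, then convert the strings to floats
coords = (
    df['location']
    .str.extract(r'x=(?P<x>[0-9.-]+) y=(?P<y>[0-9.-]+)', expand=True)
    .astype(float)
)

# Replace the original 'location' column with the two numeric columns
result = pd.concat([df.drop(columns='location'), coords], axis=1)

print(result.to_dict(orient='list'))
# {'name': ['A', 'B', 'C'], 'x': [31.33, 9.33, -12.67], 'y': [19.98, 6.98, -30.02]}
```

Rows that do not match the pattern come back as NaN from str.extract(), so astype(float) still works on them.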

