
How to use variables (strings) with quotes and trailing whitespace inside query in Pandas?

Let's say I have the following code:

comm='This is/a string with (some single quotes, inside like 'F' this)'
print(df.query('Column1==@comm')['Column2'].values[0])

This raises an error instead of returning the value of Column2 for the row where Column1 equals comm.

I also tried:

df.query("Column1=='{0}'".format(comm))['Column2'].values[0]

That did not work either.

If the variable is a string without single (') or double (") quotes, it works just fine.

In the actual code, comm is a dynamic variable whose values are strings that contain single (') and double (") quotes.

Thanks in advance.

EDIT: It seems that pandas queries run into other problems as well when the string contains symbols.

As advised, I replaced the single quotes with comm.replace("'","\\'"), and that worked for strings containing single quotes.

Now I am facing another problem: the query fails to find the string in the dataframe (even though the string exists) if the string has whitespace at the end, for example:

comm='This is a test. string '
comm='This is a test string/ '

As far as I can see, your string contains both kinds of quotes.

That is not a problem:

comm = "string with \" and ' in it!"

You cannot write an unescaped single quote inside a single-quoted string: the quote ends the string early, so the literal is split into two strings with stray tokens between them, which is a syntax error. The closest valid reading of your line is:

comm='This is/a string with (some single quotes, inside like ' +  F  + ' this)'

Here F is just a variable, not part of the string.
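As a small illustration of the point above, here are three ways to write one literal that contains both quote characters. The sample text is made up for the example; all three spellings produce the same string.

```python
# Three equivalent ways to write a literal that contains both quote characters.
escaped = 'inside like \'F\' and "G" this'    # backslash-escape the delimiter
swapped = "inside like 'F' and \"G\" this"    # use the other quote as delimiter
triple = '''inside like 'F' and "G" this'''   # triple quotes need no escapes

# The delimiters and backslashes are source syntax only; the values are identical.
assert escaped == swapped == triple
print(escaped)
```

None of this matters when the string arrives at runtime (from a file, a database, user input): the value already contains its quotes, and only code that re-parses it as source text can break.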

These lines of code work fine:

import pandas as pd

df = pd.DataFrame({'Column1': ["string with 'single' quotes,inside like 'F' this", 'Data'],
                   'Column2': ['Done', 'Data2']})
comm = "string with 'single' quotes,inside like 'F' this"
# @comm passes the Python variable itself, so its quotes are never re-parsed
print(df.query('Column1==@comm')['Column2'].values[0])

Edit: You can also use single quotes inside a single-quoted string by escaping each one with a backslash, as in \'F\':

df = pd.DataFrame({'Column1': ['string with \'single\' quotes,inside like \'F\' this', 'Data'],
'Column2':['Done','Data2']})
comm='string with \'single\' quotes,inside like \'F\' this'
print(df.query('Column1==@comm')['Column2'].values[0])
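To tie this back to the trailing-whitespace case from the question, here is a minimal sketch with hypothetical sample data (the rows and the helper names lookup_query and lookup_mask are not from the original code). It assumes the value stored in the column really does end with the same whitespace as the lookup string; equality is exact, so a stored value without the trailing space will not match.

```python
import pandas as pd

# Hypothetical sample data: both quote styles and a value ending in a space.
df = pd.DataFrame({
    "Column1": ["like 'F' and \"G\" this", "This is a test string/ ", "Data"],
    "Column2": ["Done", "Trailing", "Data2"],
})

def lookup_query(frame, comm):
    # @comm is resolved as a Python object, so quotes and spaces are kept as-is.
    return frame.query("Column1 == @comm")["Column2"].tolist()

def lookup_mask(frame, comm):
    # Boolean-mask form of the same comparison; no expression string is parsed.
    return frame.loc[frame["Column1"] == comm, "Column2"].tolist()

print(lookup_query(df, "like 'F' and \"G\" this"))
print(lookup_mask(df, "This is a test string/ "))
# Without the trailing space there is no exact match, so the list is empty.
print(lookup_mask(df, "This is a test string/"))
```

If a lookup like this comes back empty, it may be worth printing repr() of both the stored value and the search string to see whether they differ in invisible whitespace.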

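If you build the query with format instead, this trick makes it work: json.dumps wraps the string in double quotes and escapes any double quotes inside it, so the result can be inserted into the expression as a literal.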

import json

def convert_string(string):
    # Return the string as a double-quoted literal with inner quotes escaped
    return json.dumps(string)

df = pd.DataFrame({'Column1': ['here', 'Data'],
'Column2':['Done','Data2']})
comm="here"
converted = convert_string(comm)
print(df.query('Column1=={}'.format(converted))['Column2'].values[0])
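To see what actually ends up in the expression, the sketch below builds the query text for a string with both quote styles and a trailing space, then parses the literal back with ast.literal_eval as a stand-in for how a Python-style parser would read it. Using repr() as an alternative is my suggestion, not part of the original answer; the variable names are illustrative.

```python
import ast
import json

comm = 'He said "F" and \'G\' '   # both quote styles plus a trailing space

# Two ways to turn the value into a quoted literal for the expression text.
expr_json = "Column1 == {}".format(json.dumps(comm))
expr_repr = "Column1 == {}".format(repr(comm))
print(expr_json)
print(expr_repr)

# Parse the literal part back to check that nothing was lost or altered.
literal_json = expr_json.split(" == ", 1)[1]
literal_repr = expr_repr.split(" == ", 1)[1]
assert ast.literal_eval(literal_json) == comm
assert ast.literal_eval(literal_repr) == comm
```

One caveat: json.dumps produces a JSON literal, not a Python one. With its default settings, characters outside the Basic Multilingual Plane are written as surrogate-pair escapes, which a Python parser does not turn back into the original character. repr() always yields a Python literal, so it may be the safer choice when the expression is parsed as Python; passing the variable with @comm avoids the question entirely.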

A better solution is to use exception handling:

df = pd.DataFrame({'Column1': ['here', 'Data'],
'Column2':['Done','Data2']})
comm="here"
try:
    print(df.query('Column1==@comm')['Column2'].values[0])
except Exception:
    print(df.query("Column1==@comm")['Column2'].values[0])
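Note that .values[0] raises an IndexError when no row matches, so a lookup that may miss needs some handling. As an alternative to catching the exception, here is a minimal sketch of a helper that checks for an empty result explicitly. The name first_match and its default and strip parameters are hypothetical, and the strip option assumes that surrounding whitespace is not meaningful in your data.

```python
import pandas as pd

# Hypothetical sample data with a value that ends in a space.
df = pd.DataFrame({
    "Column1": ["This is a test string/ ", "Data"],
    "Column2": ["Done", "Data2"],
})

def first_match(frame, comm, default=None, strip=False):
    """Return the first Column2 value whose Column1 equals comm, else default."""
    column = frame["Column1"]
    if strip:
        # Assumption: leading/trailing whitespace carries no meaning here,
        # so both sides are stripped before comparing.
        column = column.str.strip()
        comm = comm.strip()
    matches = frame.loc[column == comm, "Column2"]
    return matches.iloc[0] if not matches.empty else default

print(first_match(df, "This is a test string/ "))             # exact match
print(first_match(df, "This is a test string/"))              # no exact match
print(first_match(df, "This is a test string/", strip=True))  # whitespace ignored
```

Returning a default keeps the caller free of try/except blocks, at the cost of having to choose a default that cannot be confused with real data.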

Edit 2:

This script removes all symbols from a data frame column very quickly.

#Create random dataframe
import pandas as pd
import numpy as np
import random
import string

random.seed(0)

def random_String(Length=20):
    letters = string.ascii_lowercase + string.punctuation
    return ''.join(random.choice(letters) for i in range(Length))
data_shape = 100000
data = {'A':[random_String() for i in range(data_shape)],'B':['Here string {}'.format(i) for i in range(data_shape)]}
df = pd.DataFrame(data)

df.head()
Out[1]: 
                      A              B
0  {y[}!cq'&z]`t%w,~n'i  Here string 0
1  si[g.^q)>^-~jtg?e~{<  Here string 1
2  v%*gw"u./n*%#|(qd^*a  Here string 2
3  f?`z>_];/a.&_|vp?u>|  Here string 3
4  em+op^j^)#ffu}'&gt*s  Here string 4
def remove_symbols(s):
    # Delete every ASCII punctuation character from the string
    return s.translate(str.maketrans('', '', string.punctuation))
def convert_data(pandas_series):
    # Apply remove_symbols to each value of the given column
    return pandas_series.apply(remove_symbols)
df['A'] = convert_data(df['A'])
df.head()
Out[2]: 
             A              B
0     ycqztwni  Here string 0
1     sigqjtge  Here string 1
2     vgwunqda  Here string 2
3       fzavpu  Here string 3
4  emopjffugts  Here string 4
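As a quick check of the translation step on its own, the sketch below applies the same str.translate call to two of the rows shown above, without pandas. Building the table once (here under the illustrative name PUNCT_TABLE) instead of on every call is a small change from the script above.

```python
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_symbols(s):
    return s.translate(PUNCT_TABLE)

# Rows 0 and 4 from the sample output above.
rows = ["{y[}!cq'&z]`t%w,~n'i", "em+op^j^)#ffu}'&gt*s"]
cleaned = [remove_symbols(r) for r in rows]
print(cleaned)

# Whitespace is not punctuation, so a trailing space survives the cleaning.
with_space = remove_symbols("This is a test string/ ")
print(repr(with_space))
```

Keep in mind that this changes the stored data: any later lookup has to run the search string through the same remove_symbols function, and distinct values that differ only in punctuation become identical.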


 