Equivalent of R's paste in Python

I am a new Python aficionado. R users have a function, paste, that concatenates two or more variables in a dataframe. It's very useful. For example, suppose that I have this dataframe:

   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0 

I want to create a new column that combines another column of the dataframe with some text. With R, I do:

> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"C  partir de",y$tarifMin,"€uros"),
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"sans prix indique"))

And the result is:

> y
   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0           
                                                thecolThatIWant
1  Evenement permanent  --> zoo,  Aquar C  partir de  0.0 €uros
2  Evenement permanent  --> zoo,  Aquar C  partir de  4.5 €uros
6  Evenement permanent  --> lieu  Jardi C  partir de  0.0 €uros
7  Evenement permanent  --> lieu  Bois  C  partir de  0.0 €uros
13 Evenement permanent  --> espac Canal C  partir de  0.0 €uros
14 Evenement permanent  --> espac Canal C  partir de -1.0 €uros
15 Evenement permanent  --> parc  Le Ma C  partir de 20.0 €uros

My question is: How can I do the same thing in Python pandas or some other module?

What I've tried so far: Well, I'm a very new user, so sorry for my mistakes. I tried to replicate the example in Python. Suppose that I get something like this:

table=pd.read_csv("y.csv",sep=",")
tt= table.loc[:,['categorie','titre','tarifMin','long','lat','lieu']]
table
    categorie   titre   tarifMin    long    lat     lieu
0   zoo,    Aquar   0.0     2.385   48.89   Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois
4   espac   Canal   0.0     2.366   48.87   Canal
5   espac   Canal   -1.0    2.384   48.89   Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma

Basically, I tried this:

sc="Even permanent -->" + " "+ tt.titre+" "+tt.lieu
tt['theColThatIWant'] = sc
tt

And I got this:

    categorie   titre   tarifMin    long    lat     lieu    theColThatIWant
0   zoo,    Aquar   0.0     2.385   48.89   Aquar   Even permanent --> Aquar Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar   Even permanent --> Aquar Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi   Even permanent --> Jardi Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois    Even permanent --> Bois Bois
4   espac   Canal   0.0     2.366   48.87   Canal   Even permanent --> Canal Canal
5   espac   Canal   -1.0    2.384   48.89   Canal   Even permanent --> Canal Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma   Even permanent --> Le Ma Le Ma

Now, I suppose that I have to loop with a condition if there is no vectorized equivalent like in R?

Python's str.join works very much like the paste command with collapse in R. R code:

 words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
 paste(words,collapse="|")

[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"

Python:

words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)

Result:

'Here|I|want|to|concatenate|words|using|pipe|delimeter'

Here's a simple implementation that works on lists, and probably other iterables. Warning: it's only been lightly tested, and only in Python 3.5+:

from functools import reduce

def _reduce_concat(x, sep=""):
    return reduce(lambda x, y: str(x) + sep + str(y), x)
        
def paste(*lists, sep=" ", collapse=None):
    result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return _reduce_concat(result, sep=collapse)
    return list(result)

assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'

You can also have some more fun and replicate other functions like paste0:

from functools import partial

paste0 = partial(paste, sep="")

Edit: here's a Repl.it project with type-annotated versions of this code.

For this particular case, the paste function in R is closest to Python's str.format, which was added in Python 2.6. It's newer and somewhat more flexible than the older % operator.
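
As a rough illustration of that comparison, the sketch below builds one sentence with str.format and again with the % operator. The sample values come from the first row of the question's data; the variable names and the simplified ASCII wording of the message are assumptions for illustration. Unlike paste, neither approach inserts separators for you, so the spaces are written into the template.

```python
# Sample values from the first row of the question's data (illustrative).
categorie, titre, tarif_min = "zoo,", "Aquar", 0.0

# str.format: each {} is filled in order from the arguments.
with_format = "Evenement permanent --> {} {} a partir de {} euros".format(
    categorie, titre, tarif_min
)

# The older % operator builds the same string.
with_percent = "Evenement permanent --> %s %s a partir de %s euros" % (
    categorie, titre, tarif_min
)

print(with_format)
```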

For a purely Pythonic answer without using numpy or pandas, here is one way to do it using your original data in the form of a list of lists (this could also have been done as a list of dicts, but that seemed more cluttered to me).

# -*- coding: utf-8 -*-
names=['categorie','titre','tarifMin','lieu','long','lat','img','dateSortie']

records=[[
    'zoo',   'Aquar',     0.0,'Aquar',2.385,48.89,'ilo',0],[
    'zoo',   'Aquar',     4.5,'Aquar',2.408,48.83,'ilo',0],[
    'lieu',  'Jardi',     0.0,'Jardi',2.320,48.86,'ilo',0],[
    'lieu',  'Bois',      0.0,'Bois', 2.455,48.82,'ilo',0],[
    'espac', 'Canal',     0.0,'Canal',2.366,48.87,'ilo',0],[
    'espac', 'Canal',    -1.0,'Canal',2.384,48.89,'ilo',0],[
    'parc',  'Le Ma',    20.0,'Le Ma', 2.353,48.87,'ilo',0] ]

def prix(p):
    if (p != -1):
        return 'C  partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent  --> {}, {} {}'.format(a[0],a[1],prix(a[2]))

[m.append(msg(m)) for m in records]

from pprint import pprint

pprint(records)

The result is this:

[['zoo',
  'Aquar',
  0.0,
  'Aquar',
  2.385,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 0.0 \xe2\x82\xacuros'],
 ['zoo',
  'Aquar',
  4.5,
  'Aquar',
  2.408,
  48.83,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 4.5 \xe2\x82\xacuros'],
 ['lieu',
  'Jardi',
  0.0,
  'Jardi',
  2.32,
  48.86,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Jardi C  partir de 0.0 \xe2\x82\xacuros'],
 ['lieu',
  'Bois',
  0.0,
  'Bois',
  2.455,
  48.82,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Bois C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  0.0,
  'Canal',
  2.366,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  -1.0,
  'Canal',
  2.384,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal sans prix indique'],
 ['parc',
  'Le Ma',
  20.0,
  'Le Ma',
  2.353,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> parc, Le Ma C  partir de 20.0 \xe2\x82\xacuros']]

Note that although I've defined a list names, it isn't actually used. One could define a dictionary with the column names as keys and the field numbers (starting from 0) as values, but I didn't bother, to keep the example simple.

The functions prix and msg are fairly simple. The only tricky portion is the list comprehension [m.append(msg(m)) for m in records], which iterates through all of the records and modifies each one by appending your new field, created via a call to msg.
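
Because that comprehension is used only for its side effect, the same step can also be written as a plain for loop, which avoids building a throwaway list of None values. A minimal sketch, using a two-row records list and a simplified stand-in for msg (both are assumptions for illustration, not the full code above):

```python
# Two shortened records (illustrative subset of the data above).
records = [["zoo", "Aquar", 0.0], ["espac", "Canal", -1.0]]

def msg(a):
    # Simplified, hypothetical stand-in for the msg function above.
    return "{}, {}".format(a[0], a[1])

# Append the new field to each record in place.
for m in records:
    m.append(msg(m))

print(records)
```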

My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate these points:

  • paste roughly corresponds to the % operator in Python
  • using apply, you can build a new value for each row and assign the result to a new column

For R folks: there is no direct equivalent of ifelse (but there are ways to replace it nicely).

import numpy as np
import pandas as pd

dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']

def apply_to_row(x):
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

df['theColumnIWant'] = df.apply(apply_to_row, axis = 1)
print(df)
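
One such replacement for ifelse is numpy.where, which picks between two values element-wise. A minimal sketch, assuming a small frame with three rows copied from the question and a simplified ASCII wording of the message (the frame and the variable prefix are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Three rows with the columns the question uses (values copied from it).
df = pd.DataFrame({
    "categorie": ["zoo,", "espac", "parc"],
    "titre": ["Aquar", "Canal", "Le Ma"],
    "tarifMin": [0.0, -1.0, 20.0],
})

# String columns concatenate element-wise with +.
prefix = "Evenement permanent --> " + df["categorie"] + " " + df["titre"]

# np.where(condition, value_if_true, value_if_false) works element-wise,
# which is roughly what R's ifelse does.
df["theColThatIWant"] = np.where(
    df["tarifMin"] != -1,
    prefix + " a partir de " + df["tarifMin"].astype(str) + " euros",
    prefix + " sans prix indique",
)

print(df["theColThatIWant"].tolist())
```

The numeric column has to be converted with astype(str) before it can be concatenated with text.
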
  1. You can try pandas.Series.str.cat

     import pandas as pd

     def paste0(ss, sep=None, na_rep=None):
         '''Analogy to R paste0'''
         ss = [pd.Series(s) for s in ss]
         ss = [s.astype(str) for s in ss]
         s = ss[0]
         res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
         return res

     pasteA = paste0

  2. Or just sep.join()

     def paste0(ss, sep=None, na_rep=None,
                castF=str,  # the original used Python 2's unicode; Python 3's str is Unicode
                ):
         if sep is None:
             sep = ''
         res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
         return res

     pasteB = paste0

     %timeit pasteA([range(1000), range(1000, 0, -1)], sep='_')
     # 100 loops, best of 3: 7.11 ms per loop
     %timeit pasteB([range(1000), range(1000, 0, -1)], sep='_')
     # 100 loops, best of 3: 2.24 ms per loop

  3. I have used itertools to mimic recycling

     import itertools

     def paste0(ss, sep=None, na_rep=None, castF=str):
         '''Analogy to R paste0'''
         if sep is None:
             sep = ''
         L = max([len(e) for e in ss])
         # cycle() repeats shorter inputs, like R's recycling rule
         it = zip(*[itertools.cycle(e) for e in ss])
         res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
         # res = pd.Series(res)
         return res

  4. patsy might be relevant (I am not an experienced user myself).

Let's try things with apply.

df.apply( lambda x: str( x.loc[ desired_col ] ) + "pasting?" , axis = 1 )

You will get a result similar to paste.
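
A concrete, runnable instance of that call might look like the sketch below. The frame, the choice of desired_col, and the suffix text are assumptions for illustration:

```python
import pandas as pd

# Hypothetical two-row frame; desired_col names the column to paste from.
df = pd.DataFrame({"titre": ["Aquar", "Bois"], "tarifMin": [0.0, 4.5]})
desired_col = "tarifMin"

# axis=1 passes each row to the lambda as a Series indexed by column name.
pasted = df.apply(lambda x: str(x.loc[desired_col]) + " pasting?", axis=1)

print(pasted.tolist())
```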

If you just want to paste two string columns together, you can simplify @shouldsee's answer, because you don't need to create a function. E.g., in my case:

df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')

Both Series may need to be of dtype object for this to work (I haven't verified).
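
One way to sidestep the dtype question is to convert explicitly before concatenating. A minimal sketch, assuming a hypothetical frame in which the second column holds integers; as far as I know, str.cat expects string values, so the numeric column is cast with astype(str) first:

```python
import pandas as pd

# Hypothetical frame: one string column and one integer column.
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# Cast the integer column to strings, then join the two with an underscore.
df["newcol"] = df["id_part_one"].str.cat(df["id_part_two"].astype(str), sep="_")

print(df["newcol"].tolist())
```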

This is a simple example of how to achieve that (if I'm not wrong about what you want to do):

import numpy as np
import pandas as pd

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent  --> %s , next data %s' % (index, A, B))

Output:

>>>df
                   A         B         C         D
2013-01-01 -0.400550 -0.204032 -0.954237  0.019025
2013-01-02  0.509040 -0.611699  1.065862  0.034486
2013-01-03  0.366230  0.805068 -0.144129 -0.912942
2013-01-04  1.381278 -1.783794  0.835435 -0.140371
2013-01-05  1.140866  2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952  0.111293 -0.108521

This is what the loop prints:

2013-01-01 00:00:00 Evenement permanent  --> -0.400550121168 , next data -0.204032344442

2013-01-02 00:00:00 Evenement permanent  --> 0.509040318928 , next data -0.611698560541

2013-01-03 00:00:00 Evenement permanent  --> 0.366230438863 , next data 0.805067758304

2013-01-04 00:00:00 Evenement permanent  --> 1.38127775713 , next data -1.78379439485

2013-01-05 00:00:00 Evenement permanent  --> 1.14086631509 , next data 2.75500268167

2013-01-06 00:00:00 Evenement permanent  --> -0.610568516983 , next data -0.282952162792

There is actually a very easy way. You just convert your variable to a string. For instance, try to run this:

a = 1; b = "you are number " + str(a); b
