Equivalent of R's paste in Python

Question

I am a new Python aficionado. For R users, there is a function, paste, that helps to concatenate two or more variables in a dataframe. It's very useful. For example, suppose that I have this dataframe:

   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0

I want to create a new column which uses another column in the dataframe and some text. With R, I do:

> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"C  partir de",y$tarifMin,"€uros"),
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"sans prix indique"))

And the result is:

> y
   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0           
                                                thecolThatIWant
1  Evenement permanent  --> zoo,  Aquar C  partir de  0.0 €uros
2  Evenement permanent  --> zoo,  Aquar C  partir de  4.5 €uros
6  Evenement permanent  --> lieu  Jardi C  partir de  0.0 €uros
7  Evenement permanent  --> lieu  Bois  C  partir de  0.0 €uros
13 Evenement permanent  --> espac Canal C  partir de  0.0 €uros
14 Evenement permanent  --> espac Canal C  partir de -1.0 €uros
15 Evenement permanent  --> parc  Le Ma C  partir de 20.0 €uros

My question is: How can I do the same thing in Python Pandas or some other module?

What I've tried so far: well, I'm a very new user, so sorry for my mistakes. I tried to replicate the example in Python; suppose that I get something like this:

table=pd.read_csv("y.csv",sep=",")
tt= table.loc[:,['categorie','titre','tarifMin','long','lat','lieu']]
table
    categorie   titre   tarifMin    long    lat     lieu
0   zoo,    Aquar   0.0     2.385   48.89   Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois
4   espac   Canal   0.0     2.366   48.87   Canal
5   espac   Canal   -1.0    2.384   48.89   Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma

Basically, I tried this:

sc="Even permanent -->" + " "+ tt.titre+" "+tt.lieu
tt['theColThatIWant'] = sc
tt

And I got this:

    categorie   titre   tarifMin    long    lat     lieu    theColThatIWant
0   zoo,    Aquar   0.0     2.385   48.89   Aquar   Even permanent --> Aquar Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar   Even permanent --> Aquar Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi   Even permanent --> Jardi Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois    Even permanent --> Bois Bois
4   espac   Canal   0.0     2.366   48.87   Canal   Even permanent --> Canal Canal
5   espac   Canal   -1.0    2.384   48.89   Canal   Even permanent --> Canal Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma   Even permanent --> Le Ma Le Ma

Now, I suppose that I have to loop with a condition, if there is no vectorization like in R?

Answer 1

This works very much like the paste command in R. R code:

 words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
 paste(words,collapse="|")

[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"

Python:

words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)

Result:

'Here|I|want|to|concatenate|words|using|pipe|delimeter'
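One caveat that may be worth sketching: unlike R's paste, str.join only accepts strings, so mixed values have to be converted first. A minimal sketch, where the list `values` is made up for illustration:

```python
# str.join raises TypeError on non-string items, so convert each item first.
values = ["tarif", 4.5, "euros", 2014]  # hypothetical mixed-type data

joined = "|".join(str(v) for v in values)
print(joined)
```

The generator expression plays the role of the implicit as.character conversion that paste does in R.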

Answer 2

Here's a simple implementation that works on lists, and probably other iterables. Warning: it has only been lightly tested, and only in Python 3.5+:

from functools import reduce

def _reduce_concat(x, sep=""):
    return reduce(lambda x, y: str(x) + sep + str(y), x)
        
def paste(*lists, sep=" ", collapse=None):
    result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return _reduce_concat(result, sep=collapse)
    return list(result)

assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'

You can also have some more fun and replicate other functions like paste0:

from functools import partial

paste0 = partial(paste, sep="")

Edit: here's a Repl.it project with type-annotated versions of this code.

Answer 3

For this particular case, the paste function in R is closest to Python's format, which was added in Python 2.6. It's newer and somewhat more flexible than the older % operator.
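To make the comparison concrete, here is a small sketch of the same message built with % and with format. The sample values come from the question's data; the variable names and the English wording of the message are only for illustration:

```python
# Sample values taken from the second row of the question's dataframe
categorie, titre, tarif = "zoo", "Aquar", 4.5

# Older style: % operator with a tuple of values
with_percent = "%s %s from %s euros" % (categorie, titre, tarif)

# format with positional fields gives the same text
with_format = "{} {} from {} euros".format(categorie, titre, tarif)

# format also accepts named fields and per-field format specs
with_names = "{c} {t} from {p:.2f} euros".format(c=categorie, t=titre, p=tarif)

print(with_percent)
print(with_names)
```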

For a purely Pythonic answer without using numpy or pandas, here is one way to do it using your original data in the form of a list of lists (this could also have been done as a list of dicts, but that seemed more cluttered to me).

# -*- coding: utf-8 -*-
names=['categorie','titre','tarifMin','lieu','long','lat','img','dateSortie']

records=[[
    'zoo',   'Aquar',     0.0,'Aquar',2.385,48.89,'ilo',0],[
    'zoo',   'Aquar',     4.5,'Aquar',2.408,48.83,'ilo',0],[
    'lieu',  'Jardi',     0.0,'Jardi',2.320,48.86,'ilo',0],[
    'lieu',  'Bois',      0.0,'Bois', 2.455,48.82,'ilo',0],[
    'espac', 'Canal',     0.0,'Canal',2.366,48.87,'ilo',0],[
    'espac', 'Canal',    -1.0,'Canal',2.384,48.89,'ilo',0],[
    'parc',  'Le Ma',    20.0,'Le Ma', 2.353,48.87,'ilo',0] ]

def prix(p):
    if (p != -1):
        return 'C  partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent  --> {}, {} {}'.format(a[0],a[1],prix(a[2]))

[m.append(msg(m)) for m in records]

from pprint import pprint

pprint(records)

The result is this:

[['zoo',
  'Aquar',
  0.0,
  'Aquar',
  2.385,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 0.0 \xe2\x82\xacuros'],
 ['zoo',
  'Aquar',
  4.5,
  'Aquar',
  2.408,
  48.83,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 4.5 \xe2\x82\xacuros'],
 ['lieu',
  'Jardi',
  0.0,
  'Jardi',
  2.32,
  48.86,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Jardi C  partir de 0.0 \xe2\x82\xacuros'],
 ['lieu',
  'Bois',
  0.0,
  'Bois',
  2.455,
  48.82,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Bois C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  0.0,
  'Canal',
  2.366,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  -1.0,
  'Canal',
  2.384,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal sans prix indique'],
 ['parc',
  'Le Ma',
  20.0,
  'Le Ma',
  2.353,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> parc, Le Ma C  partir de 20.0 \xe2\x82\xacuros']]

Note that although I've defined a list names, it isn't actually used. One could define a dictionary with the column names as the keys and the field numbers (starting from 0) as the values, but I didn't bother with this in order to keep the example simple.

The functions prix and msg are fairly simple. The only tricky portion is the list comprehension [m.append(msg(m)) for m in records], which iterates through all of the records and modifies each one to append your new field, created via a call to msg.
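For what it's worth, the dictionary idea mentioned above could look roughly like this. It is only a sketch: the name `col` is my own, only two of the records are repeated here, and a plain for loop replaces the list comprehension so that the in-place modification is explicit.

```python
names = ['categorie', 'titre', 'tarifMin', 'lieu', 'long', 'lat', 'img', 'dateSortie']

# Map each column name to its position: {'categorie': 0, 'titre': 1, ...}
col = {name: i for i, name in enumerate(names)}

records = [
    ['zoo', 'Aquar', 4.5, 'Aquar', 2.408, 48.83, 'ilo', 0],
    ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0],
]

def prix(p):
    # -1 marks a missing price in the question's data
    if p != -1:
        return 'C  partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(rec):
    # Look fields up by name instead of by hard-coded index
    return 'Evenement permanent  --> {}, {} {}'.format(
        rec[col['categorie']], rec[col['titre']], prix(rec[col['tarifMin']]))

# Append the new field to every record in place
for rec in records:
    rec.append(msg(rec))

print(records[1][-1])
```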

Answer 4

My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate these points:

paste corresponds to the % operator in Python
using apply, you can build a new value and assign it to a new column

For R folks: there is no ifelse in direct form (but there are ways to nicely replace it).

import numpy as np
import pandas as pd

dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']

def apply_to_row(x):
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

df['theColumnIWant'] = df.apply(apply_to_row, axis = 1)
print(df)
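One of the "ways to nicely replace" ifelse is numpy.where, which picks element-wise between two alternatives. The following is a sketch on a cut-down copy of the question's data, not the code from this answer; the column values and the English wording "from ... euros" are chosen for illustration:

```python
import numpy as np
import pandas as pd

# Three rows borrowed from the question's dataframe
df = pd.DataFrame({
    "categorie": ["zoo", "espac", "parc"],
    "titre": ["Aquar", "Canal", "Le Ma"],
    "tarifMin": [4.5, -1.0, 20.0],
})

# String concatenation with + is vectorized over whole columns
prefix = "Evenement permanent --> " + df["categorie"] + " " + df["titre"]
with_price = prefix + " from " + df["tarifMin"].astype(str) + " euros"
without_price = prefix + " sans prix indique"

# np.where(condition, a, b) behaves like R's ifelse(condition, a, b)
df["theColumnIWant"] = np.where(df["tarifMin"] != -1, with_price, without_price)
print(df["theColumnIWant"].tolist())
```

Compared with apply(axis=1), this avoids calling a Python function once per row.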

Answer 5

You can try pandas.Series.str.cat:

import pandas as pd

def paste0(ss, sep=None, na_rep=None):
    '''Analogy to R paste0'''
    ss = [pd.Series(s) for s in ss]
    ss = [s.astype(str) for s in ss]
    s = ss[0]
    res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
    return res

pasteA = paste0

Or just sep.join():

def paste0(ss, sep=None, na_rep=None,
           castF=unicode,  # many languages don't work well with str
           ):
    if sep is None:
        sep = ''
    res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
    return res

pasteB = paste0

%timeit pasteA([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 7.11 ms per loop
%timeit pasteB([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 2.24 ms per loop

I have used itertools to mimic recycling:

import itertools

def paste0(ss, sep=None, na_rep=None, castF=unicode):
    '''Analogy to R paste0'''
    if sep is None:
        sep = u''
    L = max([len(e) for e in ss])
    it = itertools.izip(*[itertools.cycle(e) for e in ss])
    res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
    # res = pd.Series(res)
    return res

patsy might be relevant (I'm not an experienced user myself).
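The snippets above rely on unicode and itertools.izip, which exist only in Python 2. A rough Python 3 take on the recycling idea might look like the following; `paste_recycled` is a hypothetical name, and this is a sketch rather than a drop-in replacement for the functions above:

```python
from itertools import cycle, islice

def paste_recycled(*seqs, sep=" "):
    """Hypothetical helper: paste sequences element-wise, recycling shorter ones."""
    longest = max(len(s) for s in seqs)
    # cycle() repeats each sequence forever; islice() stops after `longest` rows
    rows = islice(zip(*(cycle(s) for s in seqs)), longest)
    return [sep.join(str(v) for v in row) for row in rows]

result = paste_recycled(["a", "b", "c", "d"], [1, 2], sep="_")
print(result)
```

As in R, the shorter input is reused from its start once it runs out, so [1, 2] is paired with the four letters as 1, 2, 1, 2.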

Answer 6

Let's try things with apply.

df.apply( lambda x: str( x.loc[ desired_col ] ) + "pasting?" , axis = 1 )

You will get something similar to paste.
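Here is the same one-liner on a tiny dataframe so that it can be run as-is. The data and the choice of `desired_col` are assumptions made for the example:

```python
import pandas as pd

# Two rows in the spirit of the question's data
df = pd.DataFrame({"titre": ["Aquar", "Jardi"], "tarifMin": [0.0, 4.5]})
desired_col = "tarifMin"  # hypothetical choice of column

# axis=1 passes each row to the lambda as a Series indexed by column name
df["pasted"] = df.apply(lambda x: str(x.loc[desired_col]) + " pasting?", axis=1)
print(df["pasted"].tolist())
```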

Answer 7

If you just want to paste two string columns together, you can simplify @shouldsee's answer because you don't need to create the function. E.g., in my case:

df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')

Both Series might need to be of dtype object for this to work (I haven't verified).
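On the dtype question: as far as I know, the .str accessor and str.cat only work with string data, so a numeric column has to be cast first. A minimal sketch with made-up values for the two columns:

```python
import pandas as pd

# id_part_two is numeric here on purpose
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# Cast the numeric column to str before handing it to str.cat
df["newcol"] = df["id_part_one"].str.cat(df["id_part_two"].astype(str), sep="_")
print(df["newcol"].tolist())
```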

Answer 8

This is a simple example of how to achieve that (if I'm not wrong about what you want to do):

import numpy as np
import pandas as pd

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent  --> %s , next data %s' % (index, A, B))

Output:

>>>df
                   A         B         C         D
2013-01-01 -0.400550 -0.204032 -0.954237  0.019025
2013-01-02  0.509040 -0.611699  1.065862  0.034486
2013-01-03  0.366230  0.805068 -0.144129 -0.912942
2013-01-04  1.381278 -1.783794  0.835435 -0.140371
2013-01-05  1.140866  2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952  0.111293 -0.108521

This is what the loop prints:

2013-01-01 00:00:00 Evenement permanent  --> -0.400550121168 , next data -0.204032344442

2013-01-02 00:00:00 Evenement permanent  --> 0.509040318928 , next data -0.611698560541

2013-01-03 00:00:00 Evenement permanent  --> 0.366230438863 , next data 0.805067758304

2013-01-04 00:00:00 Evenement permanent  --> 1.38127775713 , next data -1.78379439485

2013-01-05 00:00:00 Evenement permanent  --> 1.14086631509 , next data 2.75500268167

2013-01-06 00:00:00 Evenement permanent  --> -0.610568516983 , next data -0.282952162792

Answer 9

There is actually a very easy way. You just convert your variable to a string. For instance, try to run this:

a = 1; b = "you are number " + str(a); b

9 answers

Answer 1  40  2017-11-26 06:38:12
Answer 2  18  2016-03-02 19:17:57
Answer 3  5  2014-01-22 21:36:49
Answer 4  2  2014-01-22 21:05:23
Answer 5  2  2018-05-31 09:16:42
Answer 6  1  2017-12-13 05:43:27
Answer 7  1  2018-12-21 18:57:12
Answer 8  0  2014-01-22 20:47:55
Answer 9  0  2021-10-08 10:55:15
