Duplicate some rows and change some values in pandas

Question

I have a pandas DataFrame like this:

From    To    Val
GE      VD    1000
GE      VS    1600
VS      VD    1500
VS      GE     600
VD      GE    1200
VD      VS    1300

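I want to replace every row that has "GE" in neither the "from" nor the "to" column with two rows: one with "GE" in the "from" column and the other with "GE" in the "to" column. In the example above, I would replace the third row with the following two rows: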
GE VD 1500
VS GE 1500

I tried using "apply", but I can't figure out how to return the correct DataFrame. For example,

def myfun(row):
    if "GE" not in (row["from"], row["to"]):
        row1=pd.DataFrame(row).T
        row2=row1.copy()
        row1["from"]="GE"
        row2["to"]="GE"
        return pd.concat([row1, row2])
    else:
        return pd.DataFrame(row).T

gives a strange result:

>> df.apply(myfun, axis=1)
   Val  from  to
0  Val  from  to
1  Val  from  to
2  Val  from  to
3  Val  from  to
4  Val  from  to
5  Val  from  to

even though my function seems correct:

>> myfun(df.loc[5])
  Val from  to
5  13   GE  VD
5  13   VS  GE

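I can think of a way to do it by splitting my DataFrame into two sub-DataFrames, one with the rows that need duplicating and one with the rest, then duplicating the first one, making the changes, and concatenating all three DataFrames together. But it's ugly. Can anyone suggest a more elegant way?

In other words, can the applied function return a DataFrame, as we would do with ddply in R?

Thanks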

Answer 1

Filter:

In [153]: sub = df[(~df[['From', 'To']].isin(['GE'])).all(1)]

In [154]: sub
Out[154]: 
  From  To   Val
2   VS  VD  1500
5   VD  VS  1300

[2 rows x 3 columns]


In [179]: good = df.loc[df.index.difference(sub.index)]  # .ix and Index subtraction were removed in later pandas

In [180]: good
Out[180]: 
  From  To   Val
0   GE  VD  1000
1   GE  VS  1600
3   VS  GE   600
4   VD  GE  1200

[4 rows x 3 columns]

Define a function that returns the required values as a DataFrame:

def new_df(row):
    return pd.DataFrame({"From": ["GE", row["From"]],
                         "To": [row["To"], "GE"],
                         "Val": [row["Val"], row["Val"]]})
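As a quick check of what this function returns, here is a minimal sketch that calls new_df on a single row. The row is built by hand as a pandas Series holding the values of row 2 from the question; that stand-in is an assumption made for illustration and is not part of the original answer.

```python
import pandas as pd

def new_df(row):
    # Same function as above: one row with "GE" as origin, one with "GE" as destination.
    return pd.DataFrame({"From": ["GE", row["From"]],
                         "To": [row["To"], "GE"],
                         "Val": [row["Val"], row["Val"]]})

# Hand-built stand-in for row 2 of the question's DataFrame (illustrative only).
row = pd.Series({"From": "VS", "To": "VD", "Val": 1500})
pair = new_df(row)
print(pair)
```

Each call produces a two-row DataFrame with its own 0..1 index, which is why the later concatenation step needs ignore_index=True.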

Apply the function to the rows of sub:

In [181]: new = pd.concat([new_df(y) for _, y in sub.iterrows()], axis=0, ignore_index=True)

In [182]: new
Out[182]: 
  From  To   Val
0   GE  VD  1500
1   VS  GE  1500
2   GE  VS  1300
3   VD  GE  1300

[4 rows x 3 columns]

And concatenate them:

In [183]: pd.concat([good, new], axis=0, ignore_index=True)
Out[183]: 
  From  To   Val
0   GE  VD  1000
1   GE  VS  1600
2   VS  GE   600
3   VD  GE  1200
4   GE  VD  1500
5   VS  GE  1500
6   GE  VS  1300
7   VD  GE  1300

[8 rows x 3 columns]
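For comparison, the same filter, duplicate, and concatenate steps can be sketched without a per-row function. This is not the answerer's code: it swaps the new_df helper for DataFrame.assign, and the names no_ge, as_from, and as_to are made up for this example. It assumes the column names From, To, and Val as in the session above.

```python
import pandas as pd

df = pd.DataFrame({"From": ["GE", "GE", "VS", "VS", "VD", "VD"],
                   "To":   ["VD", "VS", "VD", "GE", "GE", "VS"],
                   "Val":  [1000, 1600, 1500, 600, 1200, 1300]})

# True for rows where neither From nor To is "GE".
no_ge = ~df[["From", "To"]].isin(["GE"]).any(axis=1)

good = df[~no_ge]                       # rows kept unchanged
as_from = df[no_ge].assign(From="GE")   # copy with "GE" as origin
as_to = df[no_ge].assign(To="GE")       # copy with "GE" as destination

# A stable sort on the original index keeps each pair of new rows together.
new = pd.concat([as_from, as_to]).sort_index(kind="mergesort")
result = pd.concat([good, new], ignore_index=True)
print(result)
```

Because assign works on whole columns, this avoids building one small DataFrame per row, which may matter when many rows need duplicating.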

Answer 2

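This uses two passes. It could be shortened by adding an else clause that concatenates the rows that stay unchanged. However, I find this version more readable, and because we iterate over the rows with itertuples, the cost here is linear and each tuple is formed only as needed (rather than building one big list of tuples for all rows).

Similarly, you could pop a row inside the if statement and concatenate the two new rows back into the original data object df in its place, which would avoid the memory cost of creating keeper_rows. Unless the DataFrame is huge, optimizations like these are usually not worth it for a task like this.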

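import pandas as pd

# First pass: keep the rows that already have "GE" in From or To.
# index=False leaves the index out of each tuple, so row[0:2] is (From, To).
keeper_rows = df[["GE" in row[0:2] for row in df.itertuples(index=False)]]
# Second pass: replace every other row with two "GE" rows.
for from_other, to_other, val in df.itertuples(index=False):
    if "GE" not in (from_other, to_other):
        new_rows = {"From": ["GE", from_other],
                    "To":   [to_other, "GE"],
                    "Val":  [val, val]}
        keeper_rows = pd.concat([keeper_rows, pd.DataFrame(new_rows)],
                                ignore_index=True)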
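The shorter single-pass variant mentioned above, with an else branch for the rows that stay unchanged, might look like the sketch below. It is an illustration rather than the answerer's code: it collects plain tuples in a list named rows (a name chosen here) and builds the DataFrame once at the end instead of concatenating inside the loop.

```python
import pandas as pd

df = pd.DataFrame({"From": ["GE", "GE", "VS", "VS", "VD", "VD"],
                   "To":   ["VD", "VS", "VD", "GE", "GE", "VS"],
                   "Val":  [1000, 1600, 1500, 600, 1200, 1300]})

rows = []  # output rows, collected as plain tuples
for from_other, to_other, val in df.itertuples(index=False):
    if "GE" not in (from_other, to_other):
        # Replace the row with two rows: "GE" as origin, then "GE" as destination.
        rows.append(("GE", to_other, val))
        rows.append((from_other, "GE", val))
    else:
        # Row already involves "GE": keep it unchanged.
        rows.append((from_other, to_other, val))

result = pd.DataFrame(rows, columns=df.columns)
print(result)
```

Unlike the two-pass version, this puts each pair of new rows at the position of the row it replaces, so the row order of the result differs.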

Answer 1: 5, 2014-01-13 15:00:07

Answer 2: 1, accepted, 2014-01-13 14:45:17
