How to read two rows and two columns of a pandas DataFrame at a time and apply a condition on those row/column values?

Question

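I want to read two rows and two columns of a pandas DataFrame at a time, and then combine the strings in the two rows with either zip or product, depending on a condition on the matching index columns.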

import pandas as pd
import itertools as it
from itertools import product

cond_mcve = pd.read_csv('condition01.mcve.txt', sep='\t')

  alfa  alfa_index beta  beta_index delta  delta_index
0  a,b          23  c,d          36   a,c           32
1  a,c          23  b,e          37   c,d           32
2  g,h          28  d,f          37   e,g           32
3  a,b          28  c,d          39   a,c           34
4  c,e          28  b,g          39   d,k           34

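Here alfa, beta and delta are string values, each with its own corresponding index column.
I want to zip two adjacent strings (row-wise) if they have the same index value. So, for the first two rows of the alfa column, the output should be aa,cb because alfa_index is 23 for both rows.
But for the 2nd and 3rd rows of the alfa column the two index values differ (23 and 28), so the output should be the product of the strings, i.e. ga,gc,ha,hc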
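
The rule can be tried on plain strings before touching the DataFrame. This is only a minimal sketch: `combine` is a hypothetical helper name, and it assumes the current row's letters come first, as in the two examples above.

```python
from itertools import product

def combine(current, previous, same_index):
    """Pair the comma-separated letters of two adjacent rows."""
    cur = current.split(',')
    prev = previous.split(',')
    # zip pairs letters position by position; product pairs every combination
    pairs = zip(cur, prev) if same_index else product(cur, prev)
    return ','.join(a + b for a, b in pairs)

print(combine('a,c', 'a,b', same_index=True))   # aa,cb
print(combine('g,h', 'a,c', same_index=False))  # ga,gc,ha,hc
```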

This is what I have come up with so far. I hope I have explained the problem clearly.

# write a function
def some_function():
    read_two columns at once (based on prefix similarity)

    then:
    if two integer_index are same:
        zip(of strings belonging to that index)

    if two integer index are different:
        product(of strings belonging to that index)

# take this function and apply it to pandas dataframe:
cond_mcve_updated = cond_mcve+cond_mcve.shift(1).dropna(how='all').applymap(some_function)

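Here shift lets me read two rows at a time, so that part of the problem is solved. However, I still have trouble with reading two columns at a time and implementing the condition: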

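Reading two columns of the pandas DataFrame at a time (based on prefix similarity).
Separating those columns to compare the index values (integers).
Using zip or product depending on the condition.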
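
The first two points could be sketched roughly as follows. This assumes every value column `name` has a companion column called `name_index`, as in the sample data; the small frame here is just a cut-down copy of that sample.

```python
import pandas as pd

df = pd.DataFrame({
    'alfa': ['a,b', 'a,c', 'g,h'],
    'alfa_index': [23, 23, 28],
    'beta': ['c,d', 'b,e', 'd,f'],
    'beta_index': [36, 37, 37],
})

# Pair each value column with its "<name>_index" companion.
pairs = [(c, c + '_index') for c in df.columns
         if not c.endswith('_index') and c + '_index' in df.columns]

for value_col, index_col in pairs:
    # True where a row's index equals the index of the row above it.
    same = df[index_col] == df[index_col].shift(1)
    print(value_col, same.tolist())
```

The boolean list for each pair then tells you, row by row, whether zip or product applies. The first row is always False because there is no row above it.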

The expected final output would be:

   alfa         alfa_index  beta         beta_index  delta  delta_index
1  aa,cb        23          bc,bd,ec,ed  37          ca,dc  32
2  ga,gc,ha,hc  28          db,fe        37          ec,gd  32
same for the other rows.....

# the first index(i.e 0 is lost) but that's ok. I can work it out using `head/tail` method in pandas.

Answer 1

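Here is one way to get the result. It uses shift, concat and apply to run the data through a function that performs the product/sum (zip) operation depending on whether the _index values match.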

Code:

import itertools as it

def crazy_prod_sum_thing(frame):
    # get the labels which do not end with _index
    labels = [(l, l + '_index')
              for l in frame.columns.values if not l.endswith('_index')]

    def func(row):
        # get row n and row n-1
        front = row[:len(row) >> 1]
        back = row[len(row) >> 1:]

        # loop through the labels
        results = []
        for l, i in labels:
            x = front[l].split(',')
            y = back[l].split(',')
            if front[i] == back[i]:
                # same index: pair the letters position by position (zip)
                results.append(x[0] + y[0] + ',' + x[1] + y[1])
            else:
                results.append(
                    ','.join([x1 + y1 for x1, y1 in it.product(x, y)]))

        return pd.Series(results)

    # take this function and apply it to pandas dataframe:
    df = pd.concat([frame, frame.shift(1)], axis=1)[1:].apply(
        func, axis=1)

    df.rename(columns={i: x[0] + '_cpst' for i, x in enumerate(labels)},
              inplace=True)
    return pd.concat([frame, df], axis=1)

Test code:

import pandas as pd
from io import StringIO
data = [x.strip() for x in """
      alfa  alfa_index beta  beta_index delta  delta_index
    0  a,b          23  c,d          36   a,c           32
    1  a,c          23  b,e          37   c,d           32
    2  g,h          28  d,f          37   e,g           32
    3  a,b          28  c,d          39   a,c           34
    4  c,e          28  b,g          39   d,k           34
""".split('\n')[1:-1]]
df = pd.read_csv(StringIO(u'\n'.join(data)), sep=r'\s+')
print(df)

print(crazy_prod_sum_thing(df))

Results (the input frame, then the combined strings for rows 1 to 4, shown here as one list per row):

  alfa  alfa_index beta  beta_index delta  delta_index
0  a,b          23  c,d          36   a,c           32
1  a,c          23  b,e          37   c,d           32
2  g,h          28  d,f          37   e,g           32
3  a,b          28  c,d          39   a,c           34
4  c,e          28  b,g          39   d,k           34

1          [aa,cb, bc,bd,ec,ed, ca,dc]
2          [ga,gc,ha,hc, db,fe, ec,gd]
3    [ag,bh, cd,cf,dd,df, ae,ag,ce,cg]
4                [ca,eb, bc,gd, da,kc]

Note:

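This does not marshal the results back into a DataFrame as shown in the question, because I was not sure which index value to use when the index values do not match.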
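
One possible way to get the question's layout is to keep the current row's index value in every case, which is what the expected output in the question appears to do. That is an assumption about the asker's intent, not something the answer confirms. `combine_adjacent` is a hypothetical name, and the sketch handles any number of comma-separated letters per cell.

```python
import pandas as pd
from itertools import product

def combine_adjacent(frame):
    """Combine each row with the row above it, one column pair at a time."""
    out = frame.iloc[1:].copy()       # rows 1..n keep their own *_index values
    prev = frame.shift(1).iloc[1:]    # the row above each of those rows
    for col in [c for c in frame.columns if not c.endswith('_index')]:
        idx = col + '_index'
        combined = []
        for cur_s, prev_s, cur_i, prev_i in zip(out[col], prev[col],
                                                out[idx], prev[idx]):
            cur, prv = cur_s.split(','), prev_s.split(',')
            # same index: zip position by position; otherwise all combinations
            pairs = zip(cur, prv) if cur_i == prev_i else product(cur, prv)
            combined.append(','.join(a + b for a, b in pairs))
        out[col] = combined
    return out

df = pd.DataFrame({
    'alfa': ['a,b', 'a,c', 'g,h', 'a,b', 'c,e'],
    'alfa_index': [23, 23, 28, 28, 28],
    'beta': ['c,d', 'b,e', 'd,f', 'c,d', 'b,g'],
    'beta_index': [36, 37, 37, 39, 39],
    'delta': ['a,c', 'c,d', 'e,g', 'a,c', 'd,k'],
    'delta_index': [32, 32, 32, 34, 34],
})
result = combine_adjacent(df)
print(result)
```

Row 0 is dropped because it has no row above it, which matches the expected output in the question.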


Solution 1
1 Accepted 2017-03-15 20:26:01
