Case-sensitive pandas Series matching and clean pandas Series logic

Question

I have a pandas DataFrame of fruits:

import numpy as np
import pandas as pd

df = pd.read_csv(newfile, header=None)
df
            0        1         2           3       4           5        6      7
0       Apple  Bananas       Fig  Elderberry  Cherry    Honeydew      NaN    NaN
1     Bananas   Cherry    Dragon  Elderberry     NaN         NaN      NaN    NaN
2      Cherry    Grape       NaN         NaN     NaN         NaN      NaN    NaN
3      Dragon      NaN     Apple     Bananas  Cherry  Elderberry      NaN    NaN
4  Elderberry    Apple   Bananas         Fig   Grape         NaN      NaN    NaN
5         Fig   Cherry  Honeydew       Apple     NaN         NaN      NaN    NaN
6       Grape      NaN       NaN         NaN     NaN         NaN      NaN    NaN
7    Honeydew    Grape       Fig  Elderberry  Dragon      Cherry  Bananas  Apple

I'm trying to find "fruit pairings". For example, Fig is in the first row (which starts with Apple) and Apple is in the sixth row (which starts with Fig), so Apple and Fig are a pair. The same goes for Apple-Elderberry and Elderberry-Apple, but not for Apple and Bananas (there is no Apple in the row starting with Bananas).
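To pin the rule down, here is a minimal sketch in plain Python. It assumes the rows have already been read into a dict that maps each key fruit to the set of fruits listed after it (NaNs left out); the names rows and is_pair are hypothetical, not from the post.

```python
# Hypothetical dict: key fruit -> fruits listed in its row (copied from the table above)
rows = {
    "Apple": {"Bananas", "Fig", "Elderberry", "Cherry", "Honeydew"},
    "Bananas": {"Cherry", "Dragon", "Elderberry"},
    "Cherry": {"Grape"},
    "Dragon": {"Apple", "Bananas", "Cherry", "Elderberry"},
    "Elderberry": {"Apple", "Bananas", "Fig", "Grape"},
    "Fig": {"Cherry", "Honeydew", "Apple"},
    "Grape": set(),
    "Honeydew": {"Grape", "Fig", "Elderberry", "Dragon", "Cherry", "Bananas", "Apple"},
}


def is_pair(a: str, b: str) -> bool:
    """True when b is listed in a's row AND a is listed in b's row."""
    return b in rows.get(a, set()) and a in rows.get(b, set())


print(is_pair("Apple", "Fig"))         # True
print(is_pair("Apple", "Elderberry"))  # True
print(is_pair("Apple", "Bananas"))     # False: no Apple in the Bananas row
```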

I've got the following code, which does this:

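fruits = df[0]                  # first column: the key fruit of each row
stock  = df.drop(0, axis=1)     # remaining columns: the fruits listed with it

for i in range(len(fruits)):
    string1 = str(fruits[i])
    full_line = stock.iloc[i]
    line = np.array(full_line.dropna(axis=0))
    if len(line) > 0:
        for j in range(len(line)):
            # row whose key fruit is line[j]; raises IndexError if there is no exact (case-sensitive) match
            iind = fruits[fruits == line[j]].index[0]
            this_line = stock.iloc[iind]
            logic_out = this_line.str.match(string1)
            print(logic_out)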

BUT!! (1) It breaks at fruits == line[j] because the pandas Series comparison is case sensitive, and (2) the boolean output is a mixture of Trues, Falses and NaNs. Ideally, I just want to count the Trues. Any and all help very much appreciated!
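Both problems can be addressed inside the loop-based approach: compare lower-cased strings, treat NaN as False, and skip companions that have no row of their own instead of indexing into an empty result. The sketch below assumes a small hypothetical DataFrame with mixed capitalisation (not the asker's CSV); count_mentions and mutual_pairs are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with inconsistent capitalisation (not the asker's file)
df = pd.DataFrame([
    ["Apple", "bananas", "Fig", np.nan],
    ["Bananas", "Cherry", np.nan, np.nan],
    ["fig", "APPLE", "Honeydew", np.nan],
])

fruits = df[0].str.lower()      # key fruit of each row, lower-cased
stock = df.drop(columns=0)      # companion fruits of each row


def count_mentions(row_label, target):
    """Count cells in stock row `row_label` equal to `target`, ignoring case.

    NaN cells compare as False, so summing the mask counts only the Trues.
    """
    row = stock.loc[row_label]
    return int(row.str.lower().eq(target.lower()).sum())


def mutual_pairs():
    """Return (key, companion) pairs where each fruit lists the other."""
    pairs = []
    for i, key in fruits.items():
        for companion in stock.loc[i].dropna():
            companion = companion.lower()
            # Row(s) whose key fruit is this companion; may be empty
            matches = fruits[fruits == companion].index
            if len(matches) == 0:
                continue  # no row for this fruit: skip instead of raising IndexError
            if count_mentions(matches[0], key) > 0:
                pairs.append((key, companion))
    return pairs


print(count_mentions(2, "Apple"))  # 1
print(mutual_pairs())              # [('apple', 'fig'), ('fig', 'apple')]
```

If you would rather keep str.match, its case=False and na=False arguments target the same two problems, but it is a regex match anchored only at the start, so "Fig" would also match "Figs"; the eq comparison above is an exact match.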

Answer 1

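I'm going to use set logic, pandas stacking, and NumPy broadcasting. Title-case every string, collect each row's fruits into a set keyed by the first column, broadcast a set intersection to get a membership matrix, and keep the cells where both that matrix and its transpose are True: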
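The broadcasting step is the least obvious part, so here is a toy version of it in isolation. The three-fruit data and the names keys, rows and singletons are hypothetical; the point is that & on an object array of sets performs element-wise set intersection.

```python
import numpy as np

# Hypothetical toy data: the companions listed in each key fruit's row
keys = ["Apple", "Fig", "Grape"]
rows = np.array([{"Fig", "Grape"}, {"Apple"}, set()], dtype=object)

# rows[:, None] has shape (3, 1) and the list of singleton sets broadcasts along the last axis,
# so cell (i, j) holds rows[i] & {keys[j]}: non-empty exactly when keys[j] is in rows[i]
singletons = [{k} for k in keys]
one_way = (rows[:, None] & singletons).astype(bool)

# A pair is mutual when cell (i, j) and cell (j, i) are both True
mutual = one_way & one_way.T
pairs = [(keys[i], keys[j]) for i, j in zip(*np.where(mutual))]

print(one_way)
print(pairs)  # [('Apple', 'Fig'), ('Fig', 'Apple')]
```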

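import numpy as np
import pandas as pd

# Title-case every string so "apple", "APPLE" and "Apple" compare equal; NaN passes through
norm = lambda x: x.title() if isinstance(x, str) else x

# One set of companion fruits per key fruit (column 0); on pandas >= 2.1 df.map replaces applymap
s = df.applymap(norm).set_index(0).rename_axis(None).stack().dropna().groupby(level=0).apply(set)

f = s.index   # key fruits (sorted by groupby)
p = s.values  # object array of sets

# one_way[i, j] is True when f[j] appears in the row of f[i]
one_way = (p[:, None] & [{x} for x in f]).astype(bool)
# keep only mutual pairs: f[j] in row f[i] and f[i] in row f[j]
[(f[i], f[j]) for i, j in zip(*np.where(one_way & one_way.T))]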

[('Apple', 'Elderberry'),
 ('Apple', 'Fig'),
 ('Apple', 'Honeydew'),
 ('Bananas', 'Dragon'),
 ('Bananas', 'Elderberry'),
 ('Dragon', 'Bananas'),
 ('Elderberry', 'Apple'),
 ('Elderberry', 'Bananas'),
 ('Fig', 'Apple'),
 ('Fig', 'Honeydew'),
 ('Honeydew', 'Apple'),
 ('Honeydew', 'Fig')]
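If the object array of sets feels too clever, the same pairs can be found with a self-merge in plain pandas. This is a different technique from the answer's (a join instead of set broadcasting), sketched under the assumption that the data looks like the question's table; the names long, swapped and mutual are hypothetical.

```python
import pandas as pd

# The question's table rebuilt inline; None marks an empty cell
df = pd.DataFrame([
    ["Apple", "Bananas", "Fig", "Elderberry", "Cherry", "Honeydew", None, None],
    ["Bananas", "Cherry", "Dragon", "Elderberry", None, None, None, None],
    ["Cherry", "Grape", None, None, None, None, None, None],
    ["Dragon", None, "Apple", "Bananas", "Cherry", "Elderberry", None, None],
    ["Elderberry", "Apple", "Bananas", "Fig", "Grape", None, None, None],
    ["Fig", "Cherry", "Honeydew", "Apple", None, None, None, None],
    ["Grape", None, None, None, None, None, None, None],
    ["Honeydew", "Grape", "Fig", "Elderberry", "Dragon", "Cherry", "Bananas", "Apple"],
])

# Long format: one (key, companion) row per non-empty cell, title-cased
long = (
    df.rename(columns={0: "key"})
      .melt(id_vars="key", value_name="companion")
      .dropna(subset=["companion"])
      .loc[:, ["key", "companion"]]
      .apply(lambda col: col.str.title())
)

# Swap the column names: a row (a, b) in long reads as (b, a) here
swapped = long.rename(columns={"key": "companion", "companion": "key"})

# Inner join on both columns keeps (a, b) only if (b, a) also exists in long
mutual = (
    long.merge(swapped, on=["key", "companion"])
        .drop_duplicates()
        .sort_values(["key", "companion"], ignore_index=True)
)
pairs = list(mutual.itertuples(index=False, name=None))
print(pairs)
```

On the question's table this yields the same twelve pairs as the answer's output; melt and merge avoid relying on how a particular pandas version's stack treats NaN.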

Answer 1: 2017-10-09 19:07:29