Case-sensitive pandas Series matching and cleaning up pandas Series logic
I have a pandas DataFrame of fruits:
df = pd.read_csv(newfile, header=None)
df
            0        1         2           3       4           5        6      7
0       Apple  Bananas       Fig  Elderberry  Cherry    Honeydew      NaN    NaN
1     Bananas   Cherry    Dragon  Elderberry     NaN         NaN      NaN    NaN
2      Cherry    Grape       NaN         NaN     NaN         NaN      NaN    NaN
3      Dragon      NaN     Apple     Bananas  Cherry  Elderberry      NaN    NaN
4  Elderberry    Apple   Bananas         Fig   Grape         NaN      NaN    NaN
5         Fig   Cherry  Honeydew       Apple     NaN         NaN      NaN    NaN
6       Grape      NaN       NaN         NaN     NaN         NaN      NaN    NaN
7    Honeydew    Grape       Fig  Elderberry  Dragon      Cherry  Bananas  Apple
I'm trying to find "fruit pairings": two fruits form a pair when each one appears in the row that starts with the other. For example, Apple and Fig are a pair, because Fig appears in the first row (the Apple row) and Apple appears in the 6th row (the Fig row). Likewise for Apple-Elderberry and Elderberry-Apple, but not Apple and Bananas (there are no Apples in the row starting with Bananas).
I've got the following code, which attempts to do this:
fruits = df[0]
stock = df.drop(0, axis=1)
for i in range(len(fruits)):
    string1 = str(fruits[i])
    full_line = (stock.iloc[i])
    line = np.array(full_line.dropna(axis=0))
    if len(line) > 0:
        for j in range(len(stock)):
            iind = (fruits[fruits == line[j]].index[0])
            this_line = stock.iloc[iind]
            logic_out = this_line.str.match(string1)
            print(logic_out)
BUT!! (1) It breaks at fruits == line[j] because the pandas Series comparison is case sensitive, and (2) the boolean output is a mixture of True, False and NaN values. Ideally, I just want to count the Trues. Any and all help very much appreciated!!
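The two symptoms described above can be reproduced on a small Series. This is a minimal sketch, assuming the real data mixes capitalisations; the sample values are made up for illustration and are not the asker's file. Lower-casing both sides makes the equality test case insensitive, and passing na=False to str.match turns the missing entries into False so that sum() counts only the Trues.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's file): mixed case plus missing values.
fruits = pd.Series(["Apple", "bananas", "FIG"], dtype=object)
row = pd.Series(["apple", np.nan, "Fig", np.nan], dtype=object)

# (1) Case-insensitive equality: normalise both sides before comparing.
target = "Bananas"
matches = fruits[fruits.str.lower() == target.lower()]
first_index = matches.index[0]

# (2) On an object-dtype Series, str.match leaves missing entries as NaN
#     unless told otherwise, which gives the True/False/NaN mixture.
raw = row.str.match("apple", case=False)

# With na=False the missing entries become False, so the result is a
# clean boolean Series and sum() counts the Trues.
clean = row.str.match("apple", case=False, na=False)
true_count = int(clean.sum())
```

Here first_index locates "bananas" even though the search term is capitalised differently, and true_count is the number of cells in row that match.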
I'm going to use set logic, pandas stacking, and numpy broadcasting:
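import numpy as np
# Title-case every string so matching ignores case; NaN is left untouched
# (DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map)
f = lambda x: x.title() if isinstance(x, str) else x
# header=None gives integer column labels, so the first column is 0, not '0'
s = df.applymap(f).set_index(0).rename_axis(None).stack().groupby(level=0).apply(set)
f = s.index
p = s.values
# one_way[i, j] is True when fruit j appears in the row of fruit i
one_way = (p[:, None] & [{x} for x in f]).astype(bool)
# Keep only the pairs that hold in both directions
[(f[i], f[j]) for i, j in zip(*np.where(one_way & one_way.T))]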
[('Apple', 'Elderberry'),
('Apple', 'Fig'),
('Apple', 'Honeydew'),
('Bananas', 'Dragon'),
('Bananas', 'Elderberry'),
('Dragon', 'Bananas'),
('Elderberry', 'Apple'),
('Elderberry', 'Bananas'),
('Fig', 'Apple'),
('Fig', 'Honeydew'),
('Honeydew', 'Apple'),
('Honeydew', 'Fig')]
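For comparison, the same mutual-pair test can be written with plain Python sets, which may be easier to follow than the broadcasting step. This is a sketch rather than the answerer's method: the rows dict is typed in by hand from the table in the question instead of being read from the CSV, and the names rows, listed, mutual_pairs and pair_counts are illustrative only.

```python
# Rows keyed by the first column; values are the remaining non-NaN entries.
# Typed in by hand from the question's table (an assumption, not read from the CSV).
rows = {
    "Apple": ["Bananas", "Fig", "Elderberry", "Cherry", "Honeydew"],
    "Bananas": ["Cherry", "Dragon", "Elderberry"],
    "Cherry": ["Grape"],
    "Dragon": ["Apple", "Bananas", "Cherry", "Elderberry"],
    "Elderberry": ["Apple", "Bananas", "Fig", "Grape"],
    "Fig": ["Cherry", "Honeydew", "Apple"],
    "Grape": [],
    "Honeydew": ["Grape", "Fig", "Elderberry", "Dragon", "Cherry", "Bananas", "Apple"],
}

# Normalise case once so that "apple" and "Apple" compare equal.
listed = {key.title(): {v.title() for v in values} for key, values in rows.items()}

# (a, b) is a pair when b appears in a's row and a appears in b's row.
mutual_pairs = sorted(
    (a, b)
    for a, partners in listed.items()
    for b in partners
    if a in listed.get(b, set())
)

# Number of mutual partners per fruit, i.e. the count of Trues per row.
pair_counts = {a: sum(1 for x, _ in mutual_pairs if x == a) for a in listed}
```

pair_counts gives the per-fruit count of Trues that the question asks for, and fruits with no mutual partner (such as Grape, whose row is empty) get a count of zero.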