Pandas Dataframe Encoding Vector from ManyToMany Join Table in Sqlite3
I have three tables (A, A_to_B, B), one of which is a join table for a many-to-many relationship. I need to create a dataframe that, for each row in A, contains an array of flags with one flag per id in B, indicating whether a row linking the two exists in the join table. It is hard to explain in words, so the example tables are below.
     A_to_B                 A                   B
+------+------+     +------+------+     +------+------+
| id_a | id_b |     | id   | val  |     | id   | val  |
+------+------+     +------+------+     +------+------+
| 1    | 2    |     | 1    | foo  |     | 1    | foob |
+------+------+     +------+------+     +------+------+
| 1    | 3    |     | 2    | bar  |     | 2    | barb |
+------+------+     +------+------+     +------+------+
| 2    | 3    |     | 3    | baz  |     | 3    | bazb |
+------+------+     +------+------+     +------+------+
And I want to end up with a dataframe that looks like this:
     1   2   3
_______________________
1    0   1   1   # id 1 from table A contains entries for ids 2/3 in B
2    0   0   1   # id 2 from table A contains entry for id 3 in B
3    0   0   0   # id 3 contains no transactions in the join table
Hopefully that makes sense. Also, keep in mind that this has to be an efficient sqlite query, as I am dealing with potentially tens of thousands of rows in each table.
I have each table loaded into a separate dataframe, as follows (though that is by no means a constraint on the solution).
import pandas as pd
import sqlite3
conn = sqlite3.connect('database.sqlite3')
cur = conn.cursor()
df_A = pd.read_sql_query('SELECT * FROM A', conn)
df_B = pd.read_sql_query('SELECT * FROM B', conn)
df_A_to_B = pd.read_sql_query('SELECT * FROM A_to_B', conn)
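To experiment without the real database.sqlite3, the example tables can be recreated in an in-memory database. This is only a minimal sketch: the schema (column types, primary keys) is an assumption inferred from the example tables, and the names ids_a, ids_b and pairs are made up for illustration. It selects just the id columns, since the val columns are not needed to build the flags.

```python
import sqlite3

import pandas as pd

# Assumption: schema inferred from the example tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE A_to_B (id_a INTEGER, id_b INTEGER);
    INSERT INTO A VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO B VALUES (1, 'foob'), (2, 'barb'), (3, 'bazb');
    INSERT INTO A_to_B VALUES (1, 2), (1, 3), (2, 3);
""")

# Load only what the flag matrix needs: the ids and the distinct pairs.
ids_a = pd.read_sql_query("SELECT id FROM A ORDER BY id", conn)["id"]
ids_b = pd.read_sql_query("SELECT id FROM B ORDER BY id", conn)["id"]
pairs = pd.read_sql_query("SELECT DISTINCT id_a, id_b FROM A_to_B", conn)
conn.close()
```

Using SELECT DISTINCT on the join table means each (id_a, id_b) pair appears at most once, so any later count per pair is already a 0/1 flag.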
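import pandas as pd

# input: stand-ins for the join table (df) and for tables A (dfa) and B (dfb)
df = pd.DataFrame({'A':[1,1,2], 'B':[2,3,3]})
dfa = pd.DataFrame({'A':[1,2,3], 'tt':['f','b','z']})
dfb = pd.DataFrame({'B':[1,2,3], 'tt':['fb','bb','zb']})
# output: declare every id as a category so that ids absent from the
# join table still get an all-zero row or column
a = pd.Categorical(df['A'], categories=dfa['A'])
b = pd.Categorical(df['B'], categories=dfb['B'])
pd.crosstab(a, b, dropna=False, rownames=['A'], colnames=['B'])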
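Note that crosstab returns counts, so a pair that appears twice in the join table would show up as 2 rather than 1. If strict 0/1 flags and a smaller memory footprint matter at tens of thousands of ids per table, one possible alternative (not the approach used above) is to fill a uint8 array directly. The sketch below assumes ids are unique within A and within B and that every id in the join table exists in both; the names ids_a, ids_b, pairs, flags and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in data matching the question's tables (ids only).
ids_a = pd.Series([1, 2, 3], name="id")
ids_b = pd.Series([1, 2, 3], name="id")
pairs = pd.DataFrame({"id_a": [1, 1, 2], "id_b": [2, 3, 3]})

# Map each id in the join table to its row/column position.
rows = pd.Index(ids_a).get_indexer(pairs["id_a"])
cols = pd.Index(ids_b).get_indexer(pairs["id_b"])

# get_indexer returns -1 for unknown ids; guard against that assumption failing.
assert (rows >= 0).all() and (cols >= 0).all()

# One byte per cell; duplicate pairs simply set the same cell to 1 again.
flags = np.zeros((len(ids_a), len(ids_b)), dtype=np.uint8)
flags[rows, cols] = 1

result = pd.DataFrame(
    flags,
    index=pd.Index(ids_a, name="A"),
    columns=pd.Index(ids_b, name="B"),
)
print(result)
```

For the example data this yields the same 3x3 layout as the desired output in the question. A dense matrix still grows with len(A) * len(B), so for very large tables a sparse representation may be worth considering instead.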