Python - split dataframes that are contained within a list

I am extracting tables from a PDF, and a page can contain more than one table. I am using the Tika library for extraction. In this case, the output is 2 DataFrames (one per table) contained within a list. Could someone share how I can extract each DataFrame?

For context, each df has 2 columns and the same number of rows.

Example:

[0   data1  
1    data2
2    data3  
3    data4

0   data10
1   data12
2   data13
3   data14 ]

I want to extract the first df here, so:

0    data1  
1    data2
2    data3  
3    data4

I have tried to parse like this:

df[:3] or df[-1] 

Could someone share where I am going wrong?

If it really is a list, you should be able to index into it. Double-check the types and the hierarchy of how the data is stored. Without seeing the error, it's hard to tell what your problem is. However, you can do this, for example:

import pandas as pd

df = pd.DataFrame({'a': ['gg', 'bb'], 'h': ['ttt', 'sdf']})

list_of_dfs = [df, df]

# get the first dataframe:
list_of_dfs[0]

# If you are trying to combine them into one table (assuming they are of the same form):
df_all = pd.concat(list_of_dfs)
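
To tie this back to the question, here is a minimal sketch of checking the types and pulling each table out of the list. The frames and the column names `idx` and `value` are hypothetical stand-ins for the extracted tables, not what Tika actually returns. It also shows why `df[:3]` does not help if `df` is the list: slicing a list gives back another list.

```python
import pandas as pd

# Hypothetical stand-ins for the two extracted tables (2 columns, 4 rows each).
first = pd.DataFrame({"idx": [0, 1, 2, 3],
                      "value": ["data1", "data2", "data3", "data4"]})
second = pd.DataFrame({"idx": [0, 1, 2, 3],
                       "value": ["data10", "data12", "data13", "data14"]})
tables = [first, second]

# Confirm the container type and the element types before indexing.
print(type(tables))
print([type(t) for t in tables])

# Slicing a list returns a (shorter) list, not a DataFrame.
sliced = tables[:3]

# Integer indexing returns a single DataFrame.
df_first = tables[0]

# Unpacking works when the number of tables is known in advance.
df_first, df_second = tables

# Looping handles any number of tables.
for i, t in enumerate(tables):
    print(i, t.shape)
```

If the outer object turns out not to be a plain list (for example, a string or a nested list), the `type(...)` checks should make that visible before any indexing is attempted.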

df1 = df.head(4)

This returns the first 4 rows of df.
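
Note that `head` is a DataFrame method, so it has to be called on an element of the list rather than on the list itself. A small sketch, assuming a hypothetical list named `tables` with an illustrative `value` column:

```python
import pandas as pd

# Hypothetical list of extracted tables; the column name "value" is illustrative.
tables = [
    pd.DataFrame({"value": ["data1", "data2", "data3", "data4", "data5"]}),
    pd.DataFrame({"value": ["data10", "data12", "data13", "data14", "data15"]}),
]

# Pick one DataFrame out of the list first, then take its first 4 rows.
first_four = tables[0].head(4)

# Or take the first 4 rows of every table in the list.
heads = [t.head(4) for t in tables]
```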
