
Splitting a dataframe with python

What I want to do seems simple in other languages: I want to use a "for" loop to split a data frame every fifth row.

The idea is that I have a dataframe that gets a new row every so often, like answers to a form with several questions where each answer is added to a specific column (for example, Google Forms feeding a spreadsheet).

What I have tried is the following:

import pandas as pd
dp=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
df1=pd.DataFrame(data=dp)
for i in range(0, len(dp)):
    if i%5==0:
        df = df1.iloc[i,:]  # selects only row i, not a block of five rows
        print(df)
print(df)

I know it isn't much, but it's a start. What I can't figure out is how to create a new variable holding a new dataframe every time the loop reaches a row where i % 5 == 0.
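A minimal sketch of the loop described above, assuming the goal is one dataframe per block of five rows. Rather than creating a new variable for each block, it collects the blocks in a list (`chunks` and `chunk_size` are names chosen here for illustration). The key difference from the attempt is the slice: `iloc[i:i + 5]` selects a block of rows, whereas `iloc[i, :]` selects only row `i`.

```python
import pandas as pd

dp = list(range(16))
df1 = pd.DataFrame(data=dp)

chunk_size = 5
chunks = []  # one DataFrame per block of up to five rows

# Step through row positions 0, 5, 10, 15 and slice out each block
for i in range(0, len(df1), chunk_size):
    chunks.append(df1.iloc[i:i + chunk_size])

for chunk in chunks:
    print(chunk, end="\n\n")
```

Here `chunks[0]` holds rows 0 to 4, `chunks[1]` holds rows 5 to 9, and the last chunk holds only row 15, since 16 rows do not divide evenly into blocks of five.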

Use numpy.split, passing the row positions at which each new piece should start:

import numpy as np

# df1 is the DataFrame from the question; split before rows 1, 6 and 11
lod = np.split(df1, np.arange(1, 16, 5))

print(*lod, sep='\n\n')

   0
0  0

   0
1  1
2  2
3  3
4  4
5  5

     0
6    6
7    7
8    8
9    9
10  10

     0
11  11
12  12
13  13
14  14
15  15

# Splitting before rows 5, 10 and 15 instead gives blocks of five rows
lod = np.split(df1, np.arange(0, 16, 5)[1:])

print(*lod, sep='\n\n')

   0
0  0
1  1
2  2
3  3
4  4

   0
5  5
6  6
7  7
8  8
9  9

     0
10  10
11  11
12  12
13  13
14  14

     0
15  15
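The second split can also be written with pandas alone, keeping the whole operation inside pandas instead of passing a DataFrame to a NumPy function. A minimal sketch, assuming fixed blocks of five rows: integer-dividing each row position by 5 gives a block label per row, and `groupby` on those labels yields the same four pieces with the original index preserved.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=list(range(16)))

# Label each row with its block number: rows 0-4 -> 0, rows 5-9 -> 1, ...
block_ids = np.arange(len(df1)) // 5

# groupby yields (label, sub-DataFrame) pairs, sorted by label
lod = [group for _, group in df1.groupby(block_ids)]

print(*lod, sep='\n\n')
```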

I think you're trying to convert a flat list into rows and columns using a known number of fields.

I'd do something like this:

import numpy as np
import pandas as pd

numFields = 3   # this is five in your case
fieldNames = ['color', 'animal', 'amphibian'] # totally optional 

# this is your 'dp'
inputData = ['brown', 'dog','false','green', 'toad','true']

flatDataArray = np.asarray(inputData)

reshapedData = flatDataArray.reshape(-1, numFields)

df = pd.DataFrame(reshapedData, columns=fieldNames) # you only need 'columns' if you want to name fields

print(df)

which gives:

    color   animal  amphibian
0   brown   dog     false
1   green   toad    true
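Applying this reshape to the question's `dp` needs one extra step: `reshape(-1, 5)` only works when the number of values is a multiple of 5, and `dp` has 16 values, so NumPy raises a `ValueError`. A minimal sketch, assuming an incomplete trailing record should simply be dropped (padding it would be the other option):

```python
import numpy as np
import pandas as pd

dp = list(range(16))
numFields = 5

# reshape(-1, numFields) needs a length that is a multiple of numFields,
# so keep only the complete records (assumption: drop the partial one)
complete = len(dp) - len(dp) % numFields
reshapedData = np.asarray(dp[:complete]).reshape(-1, numFields)

df = pd.DataFrame(reshapedData)
print(df)
```

This gives three rows of five values each; the leftover value 15 is discarded.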

--UPDATE--

From your comment above, I see that you'd like an arbitrary number of dataframes, one for each five-row group. Why not create a list of dataframes, so that you have dfs[0], dfs[1], and so on?

# continuing from where the previous code left off...

dfs = []

for group in reshapedData:
    dfs.append(pd.DataFrame(group))

for df in dfs:
    print(df)

which prints:

   0
0  brown
1    dog
2  false

   0
0  green
1   toad
2   true
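The loop above turns each record into a single unnamed column. If each dataframe should instead be one row with the field names as column headers, wrapping the record in a list does that. A minimal sketch reusing the answer's sample data; the one-row layout is an assumption about what is wanted:

```python
import numpy as np
import pandas as pd

numFields = 3
fieldNames = ['color', 'animal', 'amphibian']
inputData = ['brown', 'dog', 'false', 'green', 'toad', 'true']

reshapedData = np.asarray(inputData).reshape(-1, numFields)

# Wrap each record in a list so it becomes one row labelled with fieldNames
dfs = [pd.DataFrame([record], columns=fieldNames) for record in reshapedData]

for df in dfs:
    print(df, end="\n\n")
```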
