Moving data from specific columns to a different column in a new row
I have a CSV containing details of documents located on Google Drive. I am trying to make it easier to read and work with, as this example has over 400 columns.
Each row in the CSV represents a file on Google Drive. There are multiple columns to denote who owns the file and who it is shared with.
Every time a file is shared, the details of the person it was shared with are appended to that row as new columns.
I have loaded the data into a pandas DataFrame, and I'm struggling to move the contents of certain columns to a new row.
Below is an example.
Input:
owner | id | title | permissions.0.name | permissions.0.email | permissions.1.name | permissions.1.email
value | 1  | doc1  | Tommy              | tommy@office.com    | Timmy              | timmy@office.com
value | 2  | doc2  | Tommy              | tommy@office.com    |                    |
value | 3  | doc3  | Timmy              | timmy@office.com    |                    |
value | 4  | doc4  | Tammy              | tammy@office.com    | Tommy              | tommy@office.com
Output:
owner | id | title | permissions.0.name | permissions.0.email
value | 1  | doc1  | Tommy              | tommy@office.com
value | 2  | doc2  | Tommy              | tommy@office.com
value | 3  | doc3  | Timmy              | timmy@office.com
value | 4  | doc4  | Tammy              | tammy@office.com
value | 5  | doc1  | Timmy              | timmy@office.com
value | 6  | doc4  | Tommy              | tommy@office.com
I began by creating a list of the column names and finding the maximum number in the column headings (it is 46 in the full data). The plan was then to loop from 1 to 46, build the column name to look at, and move the contents of that column to a different column on a new row. But I had no idea how to move the contents...
import pandas

df = pandas.read_csv('input.csv')
cols = list(df)  # list of column names
maxcol = []
for c in cols:
    if '.' in c:
        n = c.split('.')[1]
        maxcol.append(int(n))
maxval = max(maxcol)
for i in range(1, maxval + 1):
    colname = 'permissions.' + str(i) + '.name'
    # move contents from this column to permissions.0.name in new row somehow
There are many more columns (over 400), and they do not appear in an organised structure. For example, columns are created only when required, so we have columns like this:
permissions.5.email | permissions.1.withPhoto | permissions.6.name | permissions.6.email
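Because the headers arrive in no particular order, it may help to parse each name instead of building names from a counter. The following is only a sketch: the `columns` list is a made-up header sample in the style shown above, and the pattern assumes every variable column looks like `prefix.number.field`.

```python
import re
from collections import defaultdict

# Hypothetical header sample; in practice this would be list(df)
columns = [
    "owner", "id", "title",
    "permissions.5.email", "permissions.1.withPhoto",
    "permissions.6.name", "permissions.6.email",
]

# Assumed naming scheme: <prefix>.<number>.<field>
pattern = re.compile(r"^(?P<prefix>[^.]+)\.(?P<number>\d+)\.(?P<field>.+)$")

# Group the variable columns by their permission number
by_number = defaultdict(list)
for name in columns:
    match = pattern.match(name)
    if match:
        by_number[int(match.group("number"))].append(name)

numbers = sorted(by_number)   # numbers that actually occur
max_number = max(numbers)     # highest permission number seen
```

Iterating over `numbers` rather than `range(1, max_number + 1)` skips numbers that never occur in the headers, and columns that do not match the pattern (such as `title`) are left alone.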
You can use this code. I think there are easier ways to do this, but it seems to give the required result:
import pandas as pd

cols = list(df)
# Get the variable column names, e.g. permissions.3.name
per_cols = [c for c in cols if '.' in c]
# Get the constant column names, i.e. title
main_cols = [c for c in cols if '.' not in c]
# Get the set of numbers present in columns, in numeric order
col_no = sorted({c.split('.')[1] for c in per_cols}, key=int)
parts = []
for col in col_no:
    # Get the list of columns with the current number
    part = [c for c in per_cols if c.split('.')[1] == col]
    # Add the constant columns to get a complete df
    temp = df[main_cols + part].copy()
    # Change the number in the column names to '0' for a unified result
    temp.columns = [c.replace('.' + col + '.', '.0.') for c in temp.columns]
    # Drop the NaN rows (preferably use a subset you are certain wouldn't be null)
    temp = temp.dropna()
    # Collect the chunk; if it has a new column, that column is added to the result
    parts.append(temp)
# DataFrame.append was removed in pandas 2.0, so concatenate the chunks instead
result = pd.concat(parts)
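As for an easier way, one possible alternative is to melt the variable columns into long form and then spread the field names back out, which avoids the explicit loop. This is a sketch rather than a tested solution for the full 400-column file: the sample frame is rebuilt from the question's example, the helper names `column`, `value`, `number` and `field` are my own, and it assumes the constant columns (here `id`) identify each file uniquely.

```python
import pandas as pd

# Sample frame rebuilt from the question's input table (illustrative only)
df = pd.DataFrame({
    "owner": ["value"] * 4,
    "id": [1, 2, 3, 4],
    "title": ["doc1", "doc2", "doc3", "doc4"],
    "permissions.0.name": ["Tommy", "Tommy", "Timmy", "Tammy"],
    "permissions.0.email": ["tommy@office.com", "tommy@office.com",
                            "timmy@office.com", "tammy@office.com"],
    "permissions.1.name": ["Timmy", None, None, "Tommy"],
    "permissions.1.email": ["timmy@office.com", None, None, "tommy@office.com"],
})

main_cols = [c for c in df.columns if "." not in c]
per_cols = [c for c in df.columns if "." in c]

# One row per (file, variable column) pair; empty cells are dropped
long = df.melt(id_vars=main_cols, value_vars=per_cols,
               var_name="column", value_name="value")
long = long.dropna(subset=["value"]).copy()

# Split 'permissions.1.email' into prefix, number and field
parts = long["column"].str.split(".", n=2, expand=True)
long["number"] = parts[1].astype(int)
# Rebuild the column name with the number forced to 0
long["field"] = parts[0] + ".0." + parts[2]

# One row per (file, permission number), one column per field
result = (
    long.set_index(main_cols + ["number", "field"])["value"]
        .unstack("field")
        .reset_index()
        .rename_axis(columns=None)
        .sort_values(["number", "id"])
        .reset_index(drop=True)
)
print(result)
```

Unlike `dropna()` on each chunk, this keeps a permission row even when only some of its fields are filled in (for example a `withPhoto` value without a `name`); the missing fields simply come out as NaN. The `number` column can be dropped afterwards if it is not needed.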