How to turn a series of Pandas dataframe rows into one column with multiple values?
Right now I have an Excel sheet in the following format which I have converted into a Pandas data frame in Python:
  Name    Column2              Unnamed: 2  Datatype  Definition
0 Entity  Accounts Receivable
1 term1                                    char      term1
2 term2                                    numeric   term2
3 term3                                    char      term3
4 Entity  Accounts Payable
5 term4                                    char      term4
6 term5                                    char      term5
7 term6                                    varchar   term6
8 term7                                    numeric   term7
I'm attempting to write code that automatically fills the empty cells in Column2 with the value from the nearest 'Entity' row above each term. So term1, term2, and term3 should be 'Accounts Receivable', and term4, term5, term6, and term7 should be 'Accounts Payable'.
This is the code I've written so far:
import numpy as np
import pandas as pd

df = pd.read_excel('test.xlsx')
df = df.replace(np.nan, '')
values = df.values.tolist()

# Collect the value next to every 'Entity' row
ent_list = []
for values[0] in values:
    if values[0][0] == 'Entity':
        ent_list.append(values[0][1])

# Write an entity into column 2 of every row
for j in range(len(values)):
    for e in range(len(ent_list)):
        while values[j][1] != ent_list[e]:
            values[j][1] = ent_list[e]
            break
        e += 1
When I print out 'values' though, I get the following:
[['Entity', 'Accounts Payable', '', '', ''],
['term1', 'Accounts Payable', '', 'char', 'term1'],
['term2', 'Accounts Payable', '', 'numeric', 'term2'],
['term3', 'Accounts Payable', '', 'char', 'term3'],
['Entity', 'Accounts Payable', '', '', ''],
['term4', 'Accounts Payable', '', 'char', 'term4'],
['term5', 'Accounts Payable', '', 'char', 'term5'],
['term6', 'Accounts Payable', '', 'varchar', 'term6'],
['term7', 'Accounts Payable', '', 'numeric', 'term7']]
Ideally it should look like this:
[['Entity', 'Accounts Receivable', '', '', ''],
['term1', 'Accounts Receivable', '', 'char', 'term1'],
['term2', 'Accounts Receivable', '', 'numeric', 'term2'],
['term3', 'Accounts Receivable', '', 'char', 'term3'],
['Entity', 'Accounts Payable', '', '', ''],
['term4', 'Accounts Payable', '', 'char', 'term4'],
['term5', 'Accounts Payable', '', 'char', 'term5'],
['term6', 'Accounts Payable', '', 'varchar', 'term6'],
['term7', 'Accounts Payable', '', 'numeric', 'term7']]
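Two things in the loops of the code above are worth noting. The nested loop over ent_list writes every entity into each row in turn (the while/break pair behaves like a plain if), so the last entity, 'Accounts Payable', is what remains in every row. Separately, `for values[0] in values:` does not create a loop variable: it assigns each row to values[0] in turn, so once the loop finishes, the first element is the same list object as the last row. A minimal check on a hypothetical three-row list standing in for df.values.tolist():

```python
# Hypothetical three-row list standing in for df.values.tolist().
rows = [["Entity", "A"], ["term1", ""], ["Entity", "B"]]

# The loop target is rows[0], so each iteration does rows[0] = <next row>.
for rows[0] in rows:
    pass

print(rows[0] is rows[-1])  # True: the first slot now aliases the last row
print(rows[0])              # ['Entity', 'B']
```

Using an ordinary name such as `for row in values:` avoids the overwrite.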
Is there a way to achieve this using my current approach? I imagine this is possible with VBA, but honestly I'm more comfortable with Python. I'll keep revising this code, but I'm genuinely stumped, as I'm not very experienced.
I know I could do this manually, but it would take too long: these reports need to be generated every so often and usually contain between 40,000 and 70,000 rows, so I would much prefer to automate this.
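The list-based approach from the question can work if the loop walks the rows once and remembers the most recent 'Entity' value, instead of writing every entity into every row. This is a minimal sketch, assuming the rows look like df.values.tolist() after replace(np.nan, ''); the sample list is a hypothetical subset of the sheet, and current_entity is an illustrative name.

```python
# Hypothetical subset of the sheet, shaped like df.values.tolist()
# after df.replace(np.nan, '').
values = [
    ["Entity", "Accounts Receivable", "", "", ""],
    ["term1", "", "", "char", "term1"],
    ["term2", "", "", "numeric", "term2"],
    ["Entity", "Accounts Payable", "", "", ""],
    ["term4", "", "", "char", "term4"],
]

current_entity = ""  # hypothetical name: the entity of the current block
for row in values:   # a plain loop variable, so values[0] is not overwritten
    if row[0] == "Entity":
        current_entity = row[1]   # start of a new entity block
    else:
        row[1] = current_entity   # term row: copy the block's entity

for row in values:
    print(row)
```

The filled list can be turned back into a DataFrame with pd.DataFrame(values, columns=df.columns).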
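Answer: forward-fill Column2 right after read_excel, before the NaNs are replaced with '' (ffill only fills NaN, not empty strings). Filling only that column keeps the Datatype and Definition cells of the 'Entity' rows from being filled from the row above:
df['Column2'] = df['Column2'].ffill()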
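To check the forward-fill approach without the spreadsheet, here is a sketch on a hypothetical in-memory copy of the sample data; the DataFrame literal stands in for read_excel('test.xlsx'). It also shows a variant keyed on the Name column (by_name is an illustrative name), for sheets where term rows might not be blank in Column2. Both are vectorized pandas operations, so they should scale to tens of thousands of rows far better than nested Python loops.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the sheet. read_excel('test.xlsx') would
# return NaN for the blank cells, and ffill() only fills NaN, not ''.
df = pd.DataFrame({
    "Name": ["Entity", "term1", "term2", "term3",
             "Entity", "term4", "term5", "term6", "term7"],
    "Column2": ["Accounts Receivable", np.nan, np.nan, np.nan,
                "Accounts Payable", np.nan, np.nan, np.nan, np.nan],
    "Unnamed: 2": np.nan,
    "Datatype": [np.nan, "char", "numeric", "char",
                 np.nan, "char", "char", "varchar", "numeric"],
    "Definition": [np.nan, "term1", "term2", "term3",
                   np.nan, "term4", "term5", "term6", "term7"],
})

# Variant keyed on Name: keep Column2 only on 'Entity' rows,
# then carry each entity down to the rows below it.
by_name = df["Column2"].where(df["Name"] == "Entity").ffill()

# Forward-fill only Column2, so the other columns keep their blanks.
df["Column2"] = df["Column2"].ffill()

# Blank out the remaining NaNs afterwards, as the original code does.
df = df.replace(np.nan, "")
for row in df.values.tolist():
    print(row)
```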