Pandas pd.merge "TypeError: string indices must be integers, not str"
I have researched this simple problem extensively but can't find an answer. I am trying to merge two files using pandas' pd.merge, based on a common column named "JN". I believe it is treating my joined (os.path.join) filename as a string instead of a DataFrame/CSV file. After I call pd.merge, the error says "string indices must be integers, not str".
import pandas as pd
import os
path = r"C:/Users/St/Documents/House/m2"
dirs = os.listdir(path)
for file in dirs:
    if file.endswith("J.csv"):
        J = file
        if len(J) is 12: #some filenames are 12 chars others 11
            jroot = J[:7]
        else:
            jroot = J[:6]
        for file in dirs:
            if file.endswith("2.csv"):
                W = file
                if len(W) is 12:
                    root2 = W[:7]
                else:
                    root2 = W[:6]
                JJ = os.path.join(path, J)
                WW = os.path.join(path, W)
                if jroot == root2: # if the first 7 (or 6) characters match, then merge
                    JW = pd.merge(JJ, WW, on="JN")
The pd.merge call above raises this error:
TypeError: string indices must be integers, not str
I am wondering how to make pandas treat my filename string as an actual file or DataFrame. JJ and WW are full paths when printed out. I tried to turn these file names into DataFrames using pd.DataFrame but wasn't able to do so.
You cannot merge two strings. I think you're confused about what os.path.join returns: it returns a string. You have to actually read in the DataFrames from the files named by JJ and WW, then perform the merge.
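Applied to the question's case, that could look roughly like the sketch below. Only the "JN" column and the JJ/WW names come from the question; the file names, the extra columns x and y, and the temporary directory are made up so the snippet runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the asker's two files (names and data are invented).
tmpdir = tempfile.mkdtemp()
JJ = os.path.join(tmpdir, "ABC123J.csv")
WW = os.path.join(tmpdir, "ABC1232.csv")
pd.DataFrame({"JN": [1, 2, 3], "x": [10, 20, 30]}).to_csv(JJ, index=False)
pd.DataFrame({"JN": [2, 3, 4], "y": [200, 300, 400]}).to_csv(WW, index=False)

# JJ and WW are still just strings (paths); read_csv turns them into DataFrames.
j_df = pd.read_csv(JJ)
w_df = pd.read_csv(WW)

# merge defaults to an inner join, so only JN values found in both files remain.
JW = pd.merge(j_df, w_df, on="JN")
print(JW)
```

If rows that appear in only one file should be kept, passing how="outer" (or "left"/"right") to pd.merge is the usual adjustment.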
Here's a full example of writing two DataFrames to CSV, reading them back with read_csv, and then merging them on a column named group. (The session assumes DataFrame, read_csv and randn have already been imported, along with numpy as np and pandas as pd.)
In [49]: df1 = DataFrame(randn(10, 1), columns=['a'])
In [50]: df1['group'] = np.random.choice(['b', 'c'], size=len(df1))
In [51]: df2 = DataFrame(randn(10, 1), columns=['b'])
In [52]: df2['group'] = np.random.choice(['b', 'c'], size=len(df1))
In [53]: df1.to_csv('df1.csv', index=False)
In [54]: cat df1.csv
a,group
-1.590035935931282,b
0.5496398501891229,c
-0.6484689548035797,b
0.19162302248253205,b
-0.9852064283582675,c
0.5975155551821989,b
0.29443634291217047,b
-0.7929994157215382,b
-1.9546460886048795,b
0.19195457928475546,c
In [55]: df2.to_csv('df2.csv', index=False)
In [56]: cat df2.csv
b,group
-1.2874060006117918,c
1.1037959548210117,b
0.47172389260467507,c
0.12802538607490285,c
-0.8753708425917293,b
-0.09187827793091947,b
1.140204215271196,c
0.4862940170888638,b
-1.1080430563137758,b
-1.3698112665693232,c
In [57]: df1_csv = read_csv('df1.csv', index_col=None)
In [58]: df2_csv = read_csv('df2.csv', index_col=None)
In [59]: df1_csv
Out[59]:
a group
0 -1.590 b
1 0.550 c
2 -0.648 b
3 0.192 b
4 -0.985 c
5 0.598 b
6 0.294 b
7 -0.793 b
8 -1.955 b
9 0.192 c
In [60]: df2_csv
Out[60]:
b group
0 -1.287 c
1 1.104 b
2 0.472 c
3 0.128 c
4 -0.875 b
5 -0.092 b
6 1.140 c
7 0.486 b
8 -1.108 b
9 -1.370 c
In [61]: df3 = pd.merge(df1_csv, df2_csv, on='group')
In [62]: df3
Out[62]:
a group b
0 -1.590 b 1.104
1 -1.590 b -0.875
2 -1.590 b -0.092
3 -1.590 b 0.486
4 -1.590 b -1.108
5 -0.648 b 1.104
6 -0.648 b -0.875
7 -0.648 b -0.092
8 -0.648 b 0.486
9 -0.648 b -1.108
10 0.192 b 1.104
11 0.192 b -0.875
12 0.192 b -0.092
13 0.192 b 0.486
14 0.192 b -1.108
15 0.598 b 1.104
16 0.598 b -0.875
17 0.598 b -0.092
18 0.598 b 0.486
19 0.598 b -1.108
20 0.294 b 1.104
21 0.294 b -0.875
22 0.294 b -0.092
23 0.294 b 0.486
24 0.294 b -1.108
25 -0.793 b 1.104
26 -0.793 b -0.875
27 -0.793 b -0.092
28 -0.793 b 0.486
29 -0.793 b -1.108
30 -1.955 b 1.104
31 -1.955 b -0.875
32 -1.955 b -0.092
33 -1.955 b 0.486
34 -1.955 b -1.108
35 0.550 c -1.287
36 0.550 c 0.472
37 0.550 c 0.128
38 0.550 c 1.140
39 0.550 c -1.370
40 -0.985 c -1.287
41 -0.985 c 0.472
42 -0.985 c 0.128
43 -0.985 c 1.140
44 -0.985 c -1.370
45 0.192 c -1.287
46 0.192 c 0.472
47 0.192 c 0.128
48 0.192 c 1.140
49 0.192 c -1.370
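Note that df3 has 50 rows even though each input has only 10: group is not unique on either side, so every b row in df1_csv pairs with every b row in df2_csv (7 x 5), and likewise for c (3 x 5). A smaller sketch of the same effect, with made-up data, assuming nothing beyond pandas itself:

```python
import pandas as pd

# Made-up frames whose key column repeats on both sides.
left = pd.DataFrame({"group": ["b", "c", "b"], "a": [1, 2, 3]})
right = pd.DataFrame({"group": ["b", "b", "c"], "b": [10, 20, 30]})

merged = pd.merge(left, right, on="group")

# For each key, the merge yields (rows on the left) * (rows on the right).
left_counts = left["group"].value_counts()
right_counts = right["group"].value_counts()
expected_rows = int((left_counts * right_counts).sum())

print(len(merged), expected_rows)
```

If the key is meant to be unique, as a column like "JN" may well be, pd.merge accepts a validate argument (for example validate="one_to_one") that raises an error instead of silently multiplying rows.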
A couple of other things:
Don't use is to compare objects for equality; use == instead. The is operator tests whether two names refer to the same object, so it only happens to work for small integers, and even then you shouldn't rely on it because that's an implementation detail of CPython.
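A short illustration of the difference, using lists because their behaviour here does not depend on any interpreter caching (the file name is a made-up 12-character example):

```python
# Two distinct list objects with equal contents.
a = [1, 2, 3]
b = [1, 2, 3]

values_equal = (a == b)  # True: == compares contents
same_object = (a is b)   # False: is compares identity, and these are two objects

# For the length check in the question, == is the right operator.
name = "ABC1234J.csv"  # hypothetical 12-character file name
is_twelve = (len(name) == 12)

print(values_equal, same_object, is_twelve)
```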
Instead of checking the file name with str.endswith, just iterate over what you want by globbing first:
import glob
import os

for f in glob.glob(os.path.join(path, '*J.csv')):
    # glob returns full paths, so measure the bare file name
    if len(os.path.basename(f)) == 12:
        pass  # do all the thingz!
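Putting the pieces together, one possible shape for the whole job is sketched below. The merge_pairs helper, the demo file names and their contents are hypothetical. It also swaps the 11-versus-12 character check for a simpler rule, on the assumption that the root is always the file name minus its five-character "J.csv" or "2.csv" suffix, which gives the same 6- or 7-character roots as the question's slicing.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_pairs(path, on="JN"):
    """Merge each '<root>J.csv' with its '<root>2.csv' partner, keyed by root."""
    merged = {}
    for j_path in sorted(glob.glob(os.path.join(path, "*J.csv"))):
        # Strip the 5-character suffix to get the shared root (an assumption).
        root = os.path.basename(j_path)[:-len("J.csv")]
        w_path = os.path.join(path, root + "2.csv")
        if not os.path.exists(w_path):
            continue  # no partner file for this root
        # Read both files into DataFrames before merging.
        merged[root] = pd.merge(pd.read_csv(j_path), pd.read_csv(w_path), on=on)
    return merged


# Demo with made-up files in a temporary directory.
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"JN": [1, 2], "x": [10, 20]}).to_csv(
    os.path.join(demo_dir, "ABC123J.csv"), index=False)
pd.DataFrame({"JN": [2, 3], "y": [5, 6]}).to_csv(
    os.path.join(demo_dir, "ABC1232.csv"), index=False)
pd.DataFrame({"JN": [9]}).to_csv(
    os.path.join(demo_dir, "LONELY7J.csv"), index=False)

result = merge_pairs(demo_dir)
print(result["ABC123"])
```

Looking up the partner file by name avoids the nested loop over the directory listing; files without a partner are simply skipped.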